Sync without crawling

Nathan Hruby Sun, 02 Nov 2008 08:55:51 -0800

On Sun, Nov 2, 2008 at 9:28 AM, Yves Dorfsman <[EMAIL PROTECTED]> wrote:
> Edward Ned Harvey wrote:
>>>
>>> http://code.google.com/p/lsyncd/
>>
>> Yup, that one fits the description.  It looks really cool!  :-)
>
> Hmmm... rsync is so efficient that I have to wonder what kind of extreme
> case would make this attractive. I'd be so afraid that one transaction get
> missed, and then because "notification" has been done, it would never get
> sync'ed again... That here and there, and over a long enough time period,
> you have two different file system.


I have the same fear as you, which may be mitigated by using a more
robust notification and transmission system on top of inotify (e.g.
AMQP), though lsyncd looks like a winner too :)
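A minimal sketch of that mitigation, assuming a hypothetical `ChangeQueue` fed by inotify (or AMQP) events: a changed path stays pending until the sync step acknowledges it, so a failed transfer is retried instead of forgotten, and a periodic `reconcile` pass compares full listings to catch any event that was missed anyway. The dicts of path -> fingerprint stand in for a real directory scan; this is an illustration, not how lsyncd itself works.

```python
from collections import OrderedDict


class ChangeQueue:
    """Hypothetical at-least-once queue of changed paths."""

    def __init__(self):
        self._pending = OrderedDict()  # path -> None, keeps arrival order

    def notify(self, path):
        # Repeated events for the same path collapse into one pending entry.
        self._pending[path] = None

    def drain(self, sync_one):
        """Try to sync every pending path; keep the ones that fail."""
        for path in list(self._pending):
            if sync_one(path):
                del self._pending[path]  # acknowledged, safe to forget
        return list(self._pending)


def reconcile(source, replica):
    """Slow-path safety net: return every path whose fingerprint differs.

    `source` and `replica` map path -> fingerprint (e.g. mtime and size).
    A path present on only one side (new file or missed delete) also differs.
    """
    paths = set(source) | set(replica)
    return sorted(p for p in paths if source.get(p) != replica.get(p))
```

For example, after `notify("a.txt")`, `notify("b.txt")` and a `drain` whose sync step fails on `b.txt`, only `b.txt` remains pending for the next attempt. The event queue is the fast path; the occasional full comparison is what keeps "a miss here and there" from accumulating into two different file systems.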

I can, however, also see the utility of not using rsync when you're
rsync'ing a filesystem with several million files: rsync keeps the
file list in memory while building it, and you can hit some ungood
edge cases.  Additionally, rsync isn't exactly the right tool for
something closer to synchronous replication, because of the thrash
it can cause on a heavily used filesystem.
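To illustrate the memory point, here is a minimal sketch of walking a tree lazily with `os.scandir`, yielding one path at a time instead of building the whole file list first. `iter_files` is a hypothetical helper, not anything rsync does internally.

```python
import os


def iter_files(root):
    """Yield regular-file paths under `root` one at a time.

    Memory grows with the number of directories still waiting to be
    visited, not with the total number of files in the tree.
    """
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)  # visit later
                    elif entry.is_file(follow_symlinks=False):
                        yield entry.path
        except OSError:
            # Skip directories that vanish or can't be read, which is
            # common when walking a busy live filesystem.
            continue
```

Feeding these paths to a sync step as they arrive keeps memory flat. The trade-off is that you lose the global view of the tree that makes detecting deletions easy, so deletes have to be handled some other way.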

Other options that I can think of for a random "would like some kind
of replication thingy without thrashing my filesystem regularly"
situation:
- Your SAN probably has a replication engine, use that (for vast
quantities of random unstructured end-user data, this is probably the
best/easiest method)
- Chop up the rsync into smaller parts that can run in
parallel/different times/based on some other notifier
- Replicated backups (e.g., stick to your normal backup routine,
clone/dupe/copy from the backup system)
- Append/update to a tar file, then sync that tarfile
- OS native-ish replication:
  - csync2 -> http://oss.linbit.com/csync2/
  - drbd -> http://www.drbd.org/
  - On Windows 2003 R2 (and up), DFSR replicates based on actions
recorded in the NTFS journal and is much easier to use than FRS
  - FreeBSD has something called ggated (?)
- AFAIK, most of the various "cluster filesystems" aimed at the HPC
crowd also offer replication/duplication of data for additional
throughput/redundancy
- Don't do that: put your data in a more structured container that has
replication (RDBMS, Hadoop, Hypertable)
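For the "chop up the rsync" option, a minimal sketch of planning per-subdirectory rsync runs and grouping them into batches that can run in parallel or at different times. `plan_rsync_batches` is a hypothetical helper; it only builds argument lists using rsync's standard `-a` and `--delete` options and does not execute anything.

```python
def plan_rsync_batches(src, dest, subdirs, n_batches):
    """Split one big rsync into per-subdirectory rsyncs, grouped in batches.

    Returns a list of `n_batches` lists, each holding rsync argv lists.
    """
    batches = [[] for _ in range(n_batches)]
    for i, name in enumerate(sorted(subdirs)):
        cmd = [
            "rsync", "-a", "--delete",
            # Trailing slashes: sync the contents of src/name into dest/name.
            f"{src.rstrip('/')}/{name}/",
            f"{dest.rstrip('/')}/{name}/",
        ]
        # Round-robin keeps the batches similar in size (by directory count).
        batches[i % n_batches].append(cmd)
    return batches
```

Each batch could then be handed to `subprocess.run` from its own worker or cron slot. Note that `--delete` here only acts inside each subdirectory; a top-level directory removed at the source is not removed at the destination, so that case needs its own check.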

I suspect that there's probably a different way to efficiently do
replication for every combination of data/OS/need out there, and a lot
of what one would need to do is look at the specifics of the situation
to figure out the best method.  Though, I guess that's true for
everything, isn't it?

> I have been using those scripts on my laptop, but I think eventually (once
> I've got the deletes worked out) I'll put them on all my machines, because
> then everybody can work while the server is down, and it also
> means that I can suspend/hibernate desktops while not in use (hibernate and
> automount are NOT friends :-), etc...
>
> I've looked at CODA, but it was designed a long time ago, and does not work
> for today's sizes + their authentication mechanism is a headache.
>
> Has anybody been giving thought to this?

Heh.  CODA.  Have you looked at Unison?
http://www.cis.upenn.edu/~bcpierce/unison/  I suspect that its
multi-way merge is probably closer to what you want, if I
understand your use case correctly.
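The reason a merge tool helps with "getting the deletes worked out" is that it remembers the state from the last sync. Unison keeps such an archive; the sketch below shows only the general three-way idea under that assumption, not Unison's actual algorithm. `plan_sync` is hypothetical, and each argument maps path -> fingerprint, with a missing key meaning the file does not exist there.

```python
def plan_sync(archive, a, b):
    """Decide a per-path action from two replicas and the last-synced state."""
    actions = {}
    for path in sorted(set(archive) | set(a) | set(b)):
        old, va, vb = archive.get(path), a.get(path), b.get(path)
        if va == vb:
            continue  # replicas already agree (including deleted on both)
        if va == old:
            # Only B changed since the last sync: created, edited or deleted.
            actions[path] = "delete on A" if vb is None else "copy B -> A"
        elif vb == old:
            # Only A changed since the last sync.
            actions[path] = "delete on B" if va is None else "copy A -> B"
        else:
            actions[path] = "conflict"  # both sides changed differently
    return actions
```

Without the archive, a file present on A but missing on B is ambiguous: it could have been created on A or deleted on B. Comparing each side against the last-synced state is what tells those two cases apart.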

-n
-- 
-------------------------------------------
nathan hruby <[EMAIL PROTECTED]>
metaphysically wrinkle-free
-------------------------------------------
_______________________________________________
Tech mailing list
[email protected]
http://lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/
