2000-10-12-15:05:55 Richard Crane:
> What does rsync do when copying a file that may be modified while it is
> being read?

Nothing especially brilliant.

On the _writing_ side, there's something clever that ends up solving
many though not all problems: the receiving rsync is building the
new version of the file using content shipped from the sender plus,
possibly, chunks of the original file. So the new file is built with
a tmp name, and only once it's written is it (atomically) moved into
place, replacing the old file. That's groovy for most purposes.
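
For illustration only, here is a minimal Python sketch of that
write-to-a-tmp-name-then-rename pattern. It is not rsync's actual code;
the function name atomic_write and the ".tmp." prefix are made up for
the example, and it assumes the tmp file sits in the same directory as
the target so that the rename stays on one filesystem.

```python
import os
import tempfile

def atomic_write(path, data):
    """Hypothetical helper: build the new file under a tmp name, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the final rename never crosses filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp.")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        # Readers see either the complete old file or the complete new one.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the tmp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Note that this only protects readers of the destination file. It does
nothing about a source file that changes while it is being read.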

But if files are being modified while they are being read, rsync can
get badly confused, at least in principle. Best to avoid stressing
it.

rsync starts with some basic comparisons (timestamps and lengths,
and optionally whole-file checksums) to decide whether the full
rsync algorithm needs to be invoked at all. Once it's decided to
sync, if the dst
side has a copy of the file, then block checksums are sent to the
sending side. Then the sending side scans along the file with a
rolling checksum, the real brilliant heart of the rsync algorithm,
looking for blocks that can be reused from what's already on the
receiving side, and sends a stream of "assembly instructions", some
with chunks of the new src file, some with references to blocks to
take from the old file on the dst end, to piece together into the
new dst file. Each side is therefore making multiple passes, and the
integrity of the rsync algorithm depends on the files not changing
between those passes.
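
To make the multiple passes concrete, here is a toy Python sketch of the
idea. It is a simplification under stated assumptions, not rsync's
implementation: the function names (weak, roll, signature, delta, patch)
are invented for the example, SHA-256 stands in for whatever strong
checksum rsync really uses, and there is no network protocol at all. The
weak checksum is a simple two-part sum that can be slid along one byte
at a time.

```python
import hashlib

M = 1 << 16  # each half of the weak checksum is kept modulo 2**16

def weak(block):
    """Weak checksum halves (a, b) of a block, computed from scratch."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one byte to the right in constant time."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b

def signature(old, n):
    """dst side: map weak checksum -> [(block index, strong hash)]."""
    sig = {}
    for off in range(0, len(old) - n + 1, n):
        block = old[off:off + n]
        entry = (off // n, hashlib.sha256(block).digest())
        sig.setdefault(weak(block), []).append(entry)
    return sig

def delta(new, sig, n):
    """src side: emit ("copy", block_index) or ("data", bytes) instructions."""
    out, literal, i = [], bytearray(), 0
    ab = weak(new[:n]) if len(new) >= n else None
    while i + n <= len(new):
        match = None
        for idx, strong in sig.get(ab, ()):
            # Weak match found; confirm with the strong hash.
            if hashlib.sha256(new[i:i + n]).digest() == strong:
                match = idx
                break
        if match is not None:
            if literal:
                out.append(("data", bytes(literal)))
                literal = bytearray()
            out.append(("copy", match))
            i += n
            if i + n <= len(new):
                ab = weak(new[i:i + n])
        else:
            literal.append(new[i])
            if i + n < len(new):
                ab = roll(ab[0], ab[1], new[i], new[i + n], n)
            i += 1
    literal += new[i:]
    if literal:
        out.append(("data", bytes(literal)))
    return out

def patch(old, instructions, n):
    """dst side: piece the new file together from the instructions."""
    out = bytearray()
    for kind, arg in instructions:
        out += old[arg * n:(arg + 1) * n] if kind == "copy" else arg
    return bytes(out)

old = b"AAAABBBBCCCC"
new = b"AAAAxxBBBBCCCC"
ops = delta(new, signature(old, 4), 4)
assert patch(old, ops, 4) == new              # files held still: correct result
assert patch(b"AAAAZZZZCCCC", ops, 4) != new  # dst file changed between passes
```

The last line is the point: the "copy" instructions refer to blocks as
they were when the checksums were taken, so if the old file changes
before it is patched, this toy quietly assembles the wrong contents.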

Now whether rsync can be actually provoked into e.g. dumping core,
or erasing everything in sight, or seizing up your computer in a
spasm that makes it catch fire and burn your machine room down, I
don't know; it could be that all possible outcomes end up only
corrupting the file that's modified during the sync.

-Bennett
