A recent email from Phil Howard prompted me to think about getting rsync
to use less memory for its file list.  Here's an early idea on how to
modify the protocol to not generate the file list entirely in advance.
Please feel free to poke holes in this if I'm going astray.

I envision abbreviating the initial (setup) phase between the two rsync
processes to just exchanging whatever exclude information needs to be
transferred.  The receiver would then fork off a generator (just as it
does now), but the generator would depend on the receiver process to
feed it work items via a pipe.  The sending side would fork
off a file-listing process that would only send data via a pipe to the
sending process (since the sending process would maintain control of the
socket).  The sender would then prime the pump by sending some number of
items read from the file-list process to the generator (through the
receiver).  At that point the sender would start reading the output from
the generator (doing its normal thing for each file) while interleaving
new file items for the generator to keep it busy.  Much of the code
would continue to refer to files by number (for protocol brevity), but I
envision using a simple "ack" message for each completed file to allow
the file list to be kept weeded out of finished items (the ack would go
from the receiver to the generator and get forwarded on to the sender).
Redo items would be sent down the same path from the receiver to the
generator, similar to how it is done now.
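
To make the flow-control idea concrete, here is a rough sketch (in
Python rather than rsync's C, purely for brevity) of the sender keeping
only the in-flight slice of the file list and weeding it on each ack.
The names SenderWindow, prime, ack and simulate, and the fixed window
size, are all made up for illustration; none of this is existing rsync
code, and the real pipes and socket are replaced by plain function
calls.

```python
from collections import OrderedDict, deque
from typing import Iterable, Iterator, List, Optional, Tuple

Item = Tuple[int, str]  # (file number, file name)


class SenderWindow:
    """Hypothetical sender-side state: only un-acked items are kept."""

    def __init__(self, lister: Iterable[str], window: int = 4) -> None:
        # 'lister' stands in for the pipe from the file-listing process.
        self._lister: Iterator[str] = iter(lister)
        self._window = window
        self._next_num = 0
        self.in_flight: "OrderedDict[int, str]" = OrderedDict()
        self.peak = 0  # largest number of items held at once

    def _send_next(self) -> Optional[Item]:
        name = next(self._lister, None)
        if name is None:
            return None  # file-listing process has nothing more
        num = self._next_num
        self._next_num += 1
        self.in_flight[num] = name
        self.peak = max(self.peak, len(self.in_flight))
        return num, name

    def prime(self) -> List[Item]:
        """Prime the pump: send up to 'window' items before any ack."""
        sent: List[Item] = []
        while len(self.in_flight) < self._window:
            item = self._send_next()
            if item is None:
                break
            sent.append(item)
        return sent

    def ack(self, num: int) -> Optional[Item]:
        """Weed out a finished file, then send one more item if any."""
        del self.in_flight[num]
        return self._send_next()


def simulate(names: Iterable[str], window: int = 4) -> Tuple[List[str], int]:
    """Toy receiver/generator that finishes files in arrival order."""
    sender = SenderWindow(names, window)
    pending = deque(sender.prime())
    done: List[str] = []
    while pending:
        num, name = pending.popleft()
        done.append(name)        # pretend the transfer completed
        nxt = sender.ack(num)    # ack travels back to the sender
        if nxt is not None:
            pending.append(nxt)
    return done, sender.peak
```

With this arrangement the sender's memory use is bounded by the window
size rather than by the total number of files.  Redo items are left out
of the sketch; they would presumably have to stay in the list until the
redo itself is acked.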

The tricky part is file deletions on the receiver side.  This protocol
would not allow a pre-transfer delete phase, but we should be able to
interleave deletions on a per-directory basis (in addition to saving
them all up to the end).  This is the part I haven't looked at closely
enough yet to have a firm idea of how it would be handled, but I
imagine having the generator scan each local directory when it is first
encountered, weeding down the list of local files in that directory as
we encounter each item from the sender that goes in that directory, and
doing the incremental delete when each directory is done.
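
As a rough illustration of the per-directory weeding (again in Python
for brevity, not rsync's C), something like the following is what I
have in mind.  DirectoryWeeder, saw and finish_dir are made-up names,
and the sketch assumes that something tells the generator when the
sender is finished with a directory; how that signal would be sent is
exactly the part that still needs to be worked out.  It also only
removes plain files and symlinks, leaving leftover subdirectories
alone.

```python
import os
from typing import Dict, List, Set


class DirectoryWeeder:
    """Hypothetical generator-side helper for incremental deletes."""

    def __init__(self, root: str) -> None:
        self.root = root
        # directory (relative to root) -> local names not yet seen
        self._leftover: Dict[str, Set[str]] = {}

    def _scan(self, dirname: str) -> Set[str]:
        # Scan the local directory the first time it is encountered.
        if dirname not in self._leftover:
            try:
                names = set(os.listdir(os.path.join(self.root, dirname)))
            except FileNotFoundError:
                names = set()  # directory does not exist locally yet
            self._leftover[dirname] = names
        return self._leftover[dirname]

    def saw(self, rel_path: str) -> None:
        """Record an item from the sender; it must not be deleted."""
        dirname, base = os.path.split(rel_path)
        self._scan(dirname).discard(base)

    def finish_dir(self, dirname: str) -> List[str]:
        """Delete whatever the sender never mentioned in 'dirname'."""
        deleted: List[str] = []
        for name in sorted(self._scan(dirname)):
            path = os.path.join(self.root, dirname, name)
            if os.path.isfile(path) or os.path.islink(path):
                os.remove(path)
                deleted.append(name)
        del self._leftover[dirname]  # free this directory's list
        return deleted
```

The point is that only the name lists for directories currently in
progress need to be held in memory, and each one is dropped as soon as
its directory is done.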

Comments?  Does something like this sound like it would work for a
future (post-2.4.7) release?

..wayne..

