On 17 May 2002, Wayne Davison <[EMAIL PROTECTED]> wrote:
> On Fri, 17 May 2002, Allen, John L. wrote:
> > In my humble opinion, this problem with rsync growing a huge memory
> > footprint when large numbers of files are involved should be #1 on
> > the list of things to fix.
>
> I have certainly been interested in working on this issue.  I think it
> might be time to implement a new algorithm, one that would let us
> correct a number of flaws that have shown up in the current
> approach.

(Only my opinion, all of this is debatable, etc.  In particular, I have
deep reservations about proposing a rewrite, because I know rewrites
always seem attractive but rarely work out well.
<http://www.joelonsoftware.com/articles/fog0000000348.html>)

I've been thinking about this too.  I think the top-level question is

  Start from scratch with a new protocol, or try to work within the
  current one?

This largely determines whether we'll be able to implement a new
algorithm or codebase, or need to evolve the current one.  I think the
nature of the current protocol is that it will be hard to make really
fundamental improvements without rewriting it.

rsync3.txt in CVS contains some ideas and features people have proposed
for a reimplementation.

If we're going to change the protocol, I think it would be good to move
to one that allows us to experiment with changing the implementation
without breaking compatibility.  You can see the way people have written
very diverse implementations of HTTP or SMTP, but rsync doesn't really
encourage that.

Just one example of wanting flexibility in implementation is that having
two processes at one end of the pipe has caused several problems in

 - making a native W32 port
 - various hangs on Linux
 - porting to VMS and other potential non-Unix systems

I'm not saying that we should never decide that forking on one end is a
good solution, but rather that we shouldn't require it in the protocol.

It would seem to make sense to do the first version with the traditional
setup of one client and one daemon.

Beyond that, I think there are a couple of things about the protocol we
can be pretty sure about:

 - try to use constant memory regardless of tree size
 - try to use time & traffic proportional to deltas
 - no upfront tree traversal
 - pipelining

I wrote librsync.  There is some documentation and I can add more if
there's anything undocumented.  I haven't looked at pysync as much as it
deserves, but it could be a good foundation.
I think Tim said he'd written his own program, and there are also others
around from which we might scrounge ideas or even code.

--
Martin
