Re: [gentoo-dev] Re: metadata/md5-cache
On Sun, 3 Jun 2012 09:48:26 + Robin H. Johnson robb...@gentoo.org wrote:
> On Sun, Jun 03, 2012 at 11:34:07AM +0200, Michał Górny wrote:
> > I mean using a separate protocol for metadata, not necessarily git. In any case, when it comes to transferring a lot of frequently-changing files, rsync is not that efficient...
>
> It does NOT send any of the intermediate states.

But it does have to check all the files.

Did I mention I'm not necessarily talking about git? Rather anything which would just look up our timestamp, revision or whatever and send only what has changed, in a packed manner.

--
Best regards,
Michał Górny
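The "look up our revision and send only what has changed, in a packed manner" idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: CHANGELOG, pack_changes_since and apply_packed are hypothetical names, the file paths are made up, and JSON plus zlib merely stands in for whatever packing a real protocol would use. It also assumes the client tree is unmodified since the revision it reports.

```python
import json
import zlib

# Hypothetical server-side change log:
# revision number -> {path: new content, or None if the file was deleted}.
CHANGELOG = {
    1: {"metadata/md5-cache/app-misc/foo-1.0": "EAPI=4\n"},
    2: {"metadata/md5-cache/app-misc/bar-2.0": "EAPI=4\n"},
    3: {"metadata/md5-cache/app-misc/foo-1.0": None,
        "metadata/md5-cache/app-misc/foo-1.1": "EAPI=4\n"},
}


def pack_changes_since(client_rev, changelog=CHANGELOG):
    """Collapse every revision newer than client_rev into one compressed blob."""
    merged = {}
    for rev in sorted(changelog):
        if rev > client_rev:
            merged.update(changelog[rev])  # later revisions override earlier ones
    payload = {"to_rev": max(changelog), "files": merged}
    return zlib.compress(json.dumps(payload, sort_keys=True).encode("utf-8"))


def apply_packed(tree, blob):
    """Apply a packed blob to a client tree (dict of path -> content).

    Assumes the tree is exactly what the server expects at the client's
    revision; nothing here verifies that.
    """
    payload = json.loads(zlib.decompress(blob).decode("utf-8"))
    for path, content in payload["files"].items():
        if content is None:
            tree.pop(path, None)
        else:
            tree[path] = content
    return payload["to_rev"]
```

A client at revision 1 would call apply_packed(tree, pack_changes_since(1)) and receive only the net effect of revisions 2 and 3, never the intermediate states.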
Re: [gentoo-dev] Re: metadata/md5-cache
On Mon, Jun 04, 2012 at 09:27:10AM +0200, Michał Górny wrote:
> On Sun, 3 Jun 2012 09:48:26 + Robin H. Johnson robb...@gentoo.org wrote:
> > On Sun, Jun 03, 2012 at 11:34:07AM +0200, Michał Górny wrote:
> > > I mean using a separate protocol for metadata, not necessarily git. In any case, when it comes to transferring a lot of frequently-changing files, rsync is not that efficient...
> >
> > It does NOT send any of the intermediate states.
>
> But it does have to check all the files.

Which is a pretty minimal cost in the grand scheme of things. You also need to figure out which 'efficiency' you're talking about here: network I/O, disk I/O, CPU, etc. Most people in this case care about network I/O; rsync's not perfect, but for the reasons described below, it's the best of breed for this usage scenario.

> Did I mention I'm not necessarily talking about git?

Git would be sanest if you were after this; it already does point-to-point delta transformations sanely. No point in reinventing a VCS; if you can't force the tree back to a known good state (aka, distributed VCS), you can't apply deltas to it, in which case you need an rsync-like algorithm.

> Rather anything which would just look up our timestamp, revision or whatever and send only what has changed, in a packed manner.

This would be reinventing git/VCS, or more likely, pretending that a timestamp file automatically means the repository is *unmodified*, and trying to do a point-to-point transformation on it.

Where your notion breaks down is that fun little bit about unmodified. This is why rsync is used; it's not limited to a point-to-point transformation, it's able to work from any starting point *efficiently*.

Either way, I suggest you do some research into this, including the efficiencies of rsync, git, and the existing snapshot delta machinery (tarsync, diffball, etc.), and study the trade-offs inherent in each. Your initial email frankly reeks of NIH, hence my suggestion to go investigate what exists now.

~harring
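The point about "unmodified" can be illustrated with a small sketch. A point-to-point delta is only safe when the client tree is verifiably in the base state; otherwise the sync has to compare what is actually on disk. Everything below is hypothetical: tree_digest and sync are made-up names, trees are plain dicts, and the fallback compares whole-file checksums, which is a simplification and not rsync's actual rolling-checksum algorithm.

```python
import hashlib


def file_digest(content):
    """Checksum of a single file's content."""
    return hashlib.md5(content.encode("utf-8")).hexdigest()


def tree_digest(tree):
    """Digest over the whole tree; any local modification changes it."""
    h = hashlib.sha1()
    for path in sorted(tree):
        h.update(path.encode("utf-8") + b"\0")
        h.update(file_digest(tree[path]).encode("ascii") + b"\n")
    return h.hexdigest()


def sync(client, server, delta, delta_base_digest):
    """Bring client in line with server; return (method, paths transferred)."""
    if tree_digest(client) == delta_base_digest:
        # Known-good base state: the point-to-point delta applies cleanly.
        for path, content in delta.items():
            if content is None:
                client.pop(path, None)
            else:
                client[path] = content
        return "delta", sorted(delta)

    # Unknown starting point: compare per-file checksums instead
    # (simplified stand-in for an rsync-like algorithm).
    sent = []
    for path, content in server.items():
        if path not in client or file_digest(client[path]) != file_digest(content):
            client[path] = content
            sent.append(path)
    for path in set(client) - set(server):
        del client[path]
    return "checksum", sorted(sent)
```

A timestamp or revision file alone cannot play the role of tree_digest here, because it says nothing about local edits; that is the gap between the two approaches discussed above.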
Re: [gentoo-dev] Re: metadata/md5-cache
On Sun, Jun 03, 2012 at 09:25:43AM +, Robin H. Johnson wrote:
> On Sun, Jun 03, 2012 at 08:31:43AM +, Duncan wrote:
> > Michał Górny posted on Sun, 03 Jun 2012 09:22:04 +0200 as excerpted:
> > > Even if only the files' metadata changes, that still adds a significant cost to an rsync. I wonder when it will come to the point where git will be more efficient than rsync. Or maybe it would be already?
> >
> > Handwavey guess, but I've figured git to be more efficient client-side for some time. Server-side I don't know about, but I've presumed that's the reason the switch-to-git plans haven't included switching the default for user-syncs to git. I expect user/client side, git would be more efficient already, but as I said, that's handwavey guesses.
>
> No, the switch to git will NOT help users, it isn't more efficient. They will still be best served by rsync, for a couple of reasons:
> 1. metadata cache is NOT available in Git.

Sidenote, and this is mildly insane: I'd thought about submodules for this; basically, every rsync window we dump the metadata into the VCS, which devs can pull down and make use of.

I've also not experimented with this workflow, so it could be batshit insane. Anyone game to experiment?

~harring
[gentoo-dev] Re: metadata/md5-cache
Michał Górny posted on Sun, 03 Jun 2012 09:22:04 +0200 as excerpted:
> Even if only the files' metadata changes, that still adds a significant cost to an rsync. I wonder when it will come to the point where git will be more efficient than rsync. Or maybe it would be already?

Handwavey guess, but I've figured git to be more efficient client-side for some time. Server-side I don't know about, but I've presumed that's the reason the switch-to-git plans haven't included switching the default for user-syncs to git. I expect user/client side, git would be more efficient already, but as I said, that's handwavey guesses.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
Re: [gentoo-dev] Re: metadata/md5-cache
On Sun, 3 Jun 2012 09:25:43 + Robin H. Johnson robb...@gentoo.org wrote:
> On Sun, Jun 03, 2012 at 08:31:43AM +, Duncan wrote:
> > Michał Górny posted on Sun, 03 Jun 2012 09:22:04 +0200 as excerpted:
> > > Even if only the files' metadata changes, that still adds a significant cost to an rsync. I wonder when it will come to the point where git will be more efficient than rsync. Or maybe it would be already?
> >
> > Handwavey guess, but I've figured git to be more efficient client-side for some time. Server-side I don't know about, but I've presumed that's the reason the switch-to-git plans haven't included switching the default for user-syncs to git. I expect user/client side, git would be more efficient already, but as I said, that's handwavey guesses.
>
> No, the switch to git will NOT help users, it isn't more efficient. They will still be best served by rsync, for a couple of reasons:
> 1. metadata cache is NOT available in Git.

I mean using a separate protocol for metadata, not necessarily git. In any case, when it comes to transferring a lot of frequently-changing files, rsync is not that efficient...

--
Best regards,
Michał Górny
Re: [gentoo-dev] Re: metadata/md5-cache
On Sun, Jun 03, 2012 at 11:34:07AM +0200, Michał Górny wrote:
> I mean using a separate protocol for metadata, not necessarily git. In any case, when it comes to transferring a lot of frequently-changing files, rsync is not that efficient...

It does NOT send any of the intermediate states.

So the question is: Is the set of delta-compressed intermediate states A-B-C-D smaller than a compressed copy of just state D?

--
Robin Hugh Johnson
Gentoo Linux: Developer, Trustee Infrastructure Lead
E-Mail : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
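One rough way to explore the A-B-C-D question is to measure both sides on sample data. The sketch below is an assumption-laden illustration, not a benchmark of git or rsync: it uses unified diffs from Python's difflib as a stand-in for real delta encoding and zlib as a stand-in for the compression, and delta_chain_size and snapshot_size are hypothetical helper names.

```python
import difflib
import zlib


def compressed_size(text):
    """Size in bytes of the zlib-compressed text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def delta_chain_size(states):
    """Total compressed size of the diffs A->B, B->C, ... between consecutive states."""
    total = 0
    for old, new in zip(states, states[1:]):
        diff = "".join(difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
        ))
        total += compressed_size(diff)
    return total


def snapshot_size(states):
    """Compressed size of only the final state (D)."""
    return compressed_size(states[-1])
```

Given a list of file contents in order, delta_chain_size(states) < snapshot_size(states) tends to hold when each step touches a small part of a large file, and to fail when most of the content is rewritten at every step. Which case the metadata cache falls into would have to be measured on the real tree.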