On Fri, 2018-02-23 at 14:14 +0000, Michael Schroeder wrote:
> Hi Jonathan!
>
> On Fri, Feb 16, 2018 at 08:52:23PM +0200, Jonathan Dieter wrote:
> > So here's my proposed file format for the zchunk file.  Should I add
> > some flags to facilitate possible different compression formats?
> >
> > +-+-+-+-+-+-+-+-+-+-+-+-+==================+=================+
> > |  ID   |  Index size   | Compressed Index | Compressed Dict |
> > +-+-+-+-+-+-+-+-+-+-+-+-+==================+=================+
> >
> > +===========+===========+
> > |   Chunk   |   Chunk   | ==> More chunks
> > +===========+===========+
>
> [...]
>
> This may be an unfair question, but how does it compare to the
> 'gzip --rsyncable' + zsync approach that we (openSUSE) are
> using since almost eight years? I guess it's better, but how much?
>
> Cheers,
>   Michael.

I've run some tests with zsync (since it's not in Fedora, I rebuilt the
latest Tumbleweed source rpm), but ran into problems (which is probably
unsurprising, given that upstream hasn't released an update in eight
years).

When testing the difference between two consecutive gzip --rsyncable
primary.xml files, zsync worked perfectly and only downloaded the 20k
delta (plus the 192k zsync control file).  When testing between two
gzip --rsyncable primary.xml files that were about four weeks apart,
zsync was unable to build the new primary.xml, so I couldn't get
numbers for a larger delta.

I do see zchunk as a new compression format that allows for easy
deltas, as opposed to zsync, which is an add-on to existing files.
Zsync also doesn't seem to support https, and it uses CRCs and MD4
hashes to identify whether a block has changed, while I'd prefer
SHA-256 or better.

I do like the idea of using rsync's rolling sum to figure out where a
new chunk starts, and I'm going to see whether it might give us better
results than my current manual method.

Jonathan
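
To make the rolling-sum idea concrete, here is a minimal Python sketch of
content-defined chunking.  It is not zchunk's, zsync's, or gzip's actual
code: WINDOW, DIVISOR, and the function names are assumptions chosen for
illustration, and a real implementation would also enforce minimum and
maximum chunk sizes.  A chunk ends wherever the sum of the last WINDOW
bytes is divisible by DIVISOR, so boundaries depend only on nearby
content, and each chunk is identified by its SHA-256 digest.

```python
import hashlib

WINDOW = 48      # bytes covered by the rolling sum (assumed value)
DIVISOR = 1024   # a chunk ends when rolling_sum % DIVISOR == 0 (assumed value)


def split_chunks(data: bytes) -> list[bytes]:
    """Split data at boundaries chosen by a rolling sum of the last WINDOW bytes."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            # Drop the byte that just slid out of the window.
            rolling -= data[i - WINDOW]
        # Only test for a boundary once the window is full, so the decision
        # depends purely on the WINDOW bytes ending at position i.
        if i >= WINDOW - 1 and rolling % DIVISOR == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks


def chunk_hashes(data: bytes) -> list[str]:
    """SHA-256 hex digest of every chunk, in file order."""
    return [hashlib.sha256(chunk).hexdigest() for chunk in split_chunks(data)]


if __name__ == "__main__":
    # Deterministic pseudo-random stand-in for an old and a new primary.xml.
    old = b"".join(hashlib.sha256(str(n).encode()).digest() for n in range(4000))
    new = b"<new-package/>" + old  # new content inserted at the front

    old_hashes = chunk_hashes(old)
    new_hashes = set(chunk_hashes(new))
    reused = sum(1 for h in old_hashes if h in new_hashes)
    print(f"{reused} of {len(old_hashes)} old chunks can be reused")
```

Because the boundaries follow the content rather than fixed offsets,
inserting data near the start of the file only disturbs the chunks around
the insertion; every later chunk keeps the same hash and would not need
to be downloaded again.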