Re: [Rpm-ecosystem] Proposed zchunk file format

Jonathan Dieter Fri, 23 Feb 2018 13:16:12 -0800

On Fri, 2018-02-23 at 14:14 +0000, Michael Schroeder wrote:
> Hi Jonathan!
> 
> On Fri, Feb 16, 2018 at 08:52:23PM +0200, Jonathan Dieter wrote:
> > So here's my proposed file format for the zchunk file.  Should I add
> > some flags to facilitate possible different compression formats?
> > 
> > +-+-+-+-+-+-+-+-+-+-+-+-+==================+=================+
> > |  ID   |  Index size   | Compressed Index | Compressed Dict |
> > +-+-+-+-+-+-+-+-+-+-+-+-+==================+=================+
> > 
> > +===========+===========+
> > |   Chunk   |   Chunk   | ==> More chunks
> > +===========+===========+
> > [...]
> 
> This may be an unfair question, but how does it compare to the
> 'gzip --rsyncable' + zsync approach that we (openSUSE) have been
> using for almost eight years? I guess it's better, but by how much?
> 
> Cheers,
>   Michael.


I've run some tests with zsync. Since it's not packaged in Fedora, I
rebuilt the latest Tumbleweed source RPM, but I ran into problems, which
is probably unsurprising given that upstream hasn't released an update in
eight years.

When testing the difference between two consecutive gzip --rsyncable
primary.xml files, zsync worked perfectly and downloaded only the 20k
delta (plus the 192k zsync control file).

When testing two gzip --rsyncable primary.xml files that were about four
weeks apart, zsync was unable to rebuild the new primary.xml, so I don't
have numbers for that case.

I do see zchunk as a new compression format with easy deltas built in,
whereas zsync is an add-on to existing files.

zsync also doesn't seem to support HTTPS, and it uses CRCs and MD4
hashes to identify whether a block has changed, while I'd prefer SHA-256
or better.
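
As a rough illustration of identifying changed chunks by a strong hash,
here is a minimal Python sketch. It is not zchunk's or zsync's actual
code: the function names and the idea of passing chunks around as a list
of byte strings are assumptions made only for the example.

```python
import hashlib


def chunk_digests(chunks):
    """Return the SHA-256 hex digest of each chunk (a bytes object)."""
    return [hashlib.sha256(c).hexdigest() for c in chunks]


def chunks_to_download(local_chunks, remote_digests):
    """Return indices of remote chunks whose digest is not present locally."""
    have = set(chunk_digests(local_chunks))
    return [i for i, d in enumerate(remote_digests) if d not in have]


# Hypothetical data: the old file has three chunks, the new one changes
# the second chunk and appends a fourth.
old = [b"<package a/>", b"<package b/>", b"<package c/>"]
new = [b"<package a/>", b"<package b2/>", b"<package c/>", b"<package d/>"]

missing = chunks_to_download(old, chunk_digests(new))
print(missing)  # [1, 3]
```

Only the chunks at the listed indices would need to be fetched; the rest
can be copied from the local file.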

I do like the idea of using rsync's rolling sum to figure out where a
new chunk starts, and I'm going to see whether it might give us better
results than my current manual method.
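
The rolling-sum idea can be sketched roughly as follows. This is a
simplified illustration, not rsync's or gzip's actual implementation:
the function name, the window size, the divisor and the minimum chunk
size are all hypothetical values picked for the example. A boundary is
placed wherever the sum of the last few bytes is divisible by a fixed
number, so boundaries depend mostly on nearby content rather than on the
absolute offset in the file.

```python
def split_chunks(data, window=16, divisor=64, min_size=16):
    """Split data where the rolling sum of the last `window` bytes
    is divisible by `divisor` (all parameters are illustrative)."""
    chunks = []
    start = 0    # offset where the current chunk begins
    rolling = 0  # sum of the most recent `window` bytes
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            # Drop the byte that just slid out of the window.
            rolling -= data[i - window]
        # Cut here if the sum hits the target and the chunk is big enough.
        if rolling % divisor == 0 and i + 1 - start >= min_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        # Whatever is left over becomes the final chunk.
        chunks.append(data[start:])
    return chunks


data = bytes(range(256)) * 4
chunks = split_chunks(data)
print(len(chunks), "chunks")
```

A larger divisor gives larger chunks on average, and the minimum size
guards against runs of very small chunks.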

Jonathan
_______________________________________________
Rpm-ecosystem mailing list
Rpm-ecosystem@lists.rpm.org
http://lists.rpm.org/mailman/listinfo/rpm-ecosystem
