Re: A proof-of-concept for delta'ing repodata

2018-03-12 Thread Michal Domonkos
Hi Jonathan,

To me, the zchunk idea looks good. Incidentally, for the last couple of months, I have been trying to rethink the way we cache metadata on the clients, as part of the libdnf (re)design efforts. My goal was to de-duplicate the data between similar repos in the cache as well as
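Michal's message is cut off before it describes his design, so what follows is only a hypothetical sketch of one common way to de-duplicate cached metadata between similar repos: a content-addressed store where each chunk is keyed by its SHA-256 digest, so a chunk that appears in two repos is stored once. `ChunkStore` and its methods are illustrative names, not part of libdnf.

```python
import hashlib


class ChunkStore:
    """Hypothetical content-addressed cache: identical chunks are stored once."""

    def __init__(self) -> None:
        self._chunks: dict[str, bytes] = {}     # sha256 digest -> chunk bytes
        self._repos: dict[str, list[str]] = {}  # repo id -> ordered chunk digests

    def add_repo(self, repo_id: str, chunks: list[bytes]) -> None:
        digests = []
        for chunk in chunks:
            digest = hashlib.sha256(chunk).hexdigest()
            # setdefault keeps only the first copy of a chunk seen under this digest
            self._chunks.setdefault(digest, chunk)
            digests.append(digest)
        self._repos[repo_id] = digests

    def get_repo(self, repo_id: str) -> bytes:
        # Rebuild the repo's data by concatenating its chunks in order
        return b"".join(self._chunks[d] for d in self._repos[repo_id])

    def stored_bytes(self) -> int:
        # Total bytes actually held, after de-duplication
        return sum(len(c) for c in self._chunks.values())
```

Adding two repos that share a chunk stores that chunk only once, while `get_repo` still rebuilds each repo's full data from the shared pool.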

Re: A proof-of-concept for delta'ing repodata

2018-02-16 Thread Jonathan Dieter
On Tue, 2018-02-13 at 10:52 +0100, Igor Gnatenko wrote:
> What about zstd? Also in latest version of lz4 there is support for
> dictionaries too.

So I've investigated zstd, and here are my results:

Latest F27 primary.gz - 3.1MB
zlib zchunk (including custom dict) primary.zck - 4.2MB ~35%
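The results above include a zlib-based zchunk file built with a custom dictionary. Python's standard `zlib` module can illustrate why a preset dictionary matters when chunks are compressed independently: each chunk starts with an empty history, so strings that recur across chunks (XML tags, arch names) would otherwise be paid for in every chunk. This is a minimal sketch assuming a hand-written dictionary; the `ZDICT` contents and the sample package entries are made up, and neither zchunk's own dictionary handling nor zstd's dictionary API is shown.

```python
import zlib

# Hypothetical dictionary: strings that recur in every primary.xml package entry.
ZDICT = (
    b'<package type="rpm"><name></name><arch>x86_64</arch>'
    b'<version epoch="0" ver="" rel=""/></package>'
)

# Made-up sample entries; each one is compressed as an independent chunk.
CHUNKS = [
    b'<package type="rpm"><name>bash</name><arch>x86_64</arch>'
    b'<version epoch="0" ver="4.4.19" rel="1.fc27"/></package>',
    b'<package type="rpm"><name>zsh</name><arch>x86_64</arch>'
    b'<version epoch="0" ver="5.4.2" rel="1.fc27"/></package>',
]


def compress_chunk(data: bytes, zdict: bytes) -> bytes:
    # A fresh compressor per chunk keeps chunks independently decompressible;
    # the preset dictionary gives each one shared history to match against.
    comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()


def decompress_chunk(blob: bytes, zdict: bytes) -> bytes:
    # The decompressor must be primed with exactly the same dictionary.
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    for chunk in CHUNKS:
        plain = len(zlib.compress(chunk, 9))
        with_dict = len(compress_chunk(chunk, ZDICT))
        print(f"raw={len(chunk)} zlib={plain} zlib+dict={with_dict}")
```

The trade-off is that every client must hold the same dictionary to decompress any chunk, so the dictionary itself has to be distributed alongside (or inside) the chunked file.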

Re: A proof-of-concept for delta'ing repodata

2018-02-14 Thread Igor Gnatenko
On Thu, 2018-02-15 at 08:35 +0200, Jonathan Dieter wrote:
> On Wed, 2018-02-14 at 09:56 -0800, Kevin Fenzi wrote:
> > ...snip...
> >
> > I think it sounds interesting, but you should get buy-in from dnf folks
> > and/or PackageKit folks and see if

Re: A proof-of-concept for delta'ing repodata

2018-02-14 Thread Jonathan Dieter
On Wed, 2018-02-14 at 09:56 -0800, Kevin Fenzi wrote:
> ...snip...
>
> I think it sounds interesting, but you should get buy-in from dnf folks
> and/or PackageKit folks and see if they can agree to use this format.

Do you know if there's a dedicated list for dnf or PackageKit development (a quick

Re: A proof-of-concept for delta'ing repodata

2018-02-14 Thread Kevin Fenzi
...snip...

I think it sounds interesting, but you should get buy-in from dnf folks and/or PackageKit folks and see if they can agree to use this format. I also agree that just adding it as a new file while leaving the rest alone sounds good as a way to migrate only those things that know to look for
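Kevin's migration path relies on repomd.xml listing an extra file that older clients simply ignore. Below is a minimal, hypothetical client-side sketch of that fallback using the standard `xml.etree.ElementTree` parser. The `primary_zck` type name and the sample file paths are assumptions for illustration; the namespace is the one repomd.xml normally declares.

```python
import xml.etree.ElementTree as ET

# Default namespace declared by repomd.xml.
REPO_NS = "{http://linux.duke.edu/metadata/repo}"

# Made-up repomd.xml listing both the classic file and a new chunked one.
SAMPLE_REPOMD = """<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="primary"><location href="repodata/primary.xml.gz"/></data>
  <data type="primary_zck"><location href="repodata/primary.xml.zck"/></data>
</repomd>"""


def metadata_locations(repomd_xml: str) -> dict[str, str]:
    """Map each <data type="..."> entry to its <location href="...">."""
    root = ET.fromstring(repomd_xml)
    locations = {}
    for data in root.iter(f"{REPO_NS}data"):
        loc = data.find(f"{REPO_NS}location")
        if loc is not None:
            locations[data.get("type")] = loc.get("href")
    return locations


def pick_primary(repomd_xml: str, prefer_zck: bool = True) -> str:
    locations = metadata_locations(repomd_xml)
    # A client that knows the new (hypothetical) type uses it when present;
    # every other client keeps reading the unchanged "primary" entry.
    if prefer_zck and "primary_zck" in locations:
        return locations["primary_zck"]
    return locations["primary"]
```

Because the classic entry is never removed, a client that passes `prefer_zck=False` (or predates the new type entirely) behaves exactly as before.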

Re: A proof-of-concept for delta'ing repodata

2018-02-13 Thread Jonathan Dieter
On Tue, 2018-02-13 at 10:52 +0100, Igor Gnatenko wrote:
> On Mon, 2018-02-12 at 23:53 +0200, Jonathan Dieter wrote:
> > * Many changes to the metadata can mean a large number of ranges
> >   requested. I ran a check on our mirrors, and three (out of around
> >   150 that had the file I was
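Each missing chunk becomes one byte range, and HTTP lets a client request several at once with a `Range: bytes=a-b,c-d,...` header, which a server is free to honour, coalesce, or ignore. The snippet is cut off before saying how those three mirrors behaved, but one hedged way to keep the range count down is to merge adjacent ranges and split the remainder into batches. The `gap` tolerance and the `max_per_request` default of 20 are arbitrary assumptions, not values from the thread or from zchunk.

```python
def coalesce(ranges: list[tuple[int, int]], gap: int = 0) -> list[tuple[int, int]]:
    """Merge inclusive (start, end) byte ranges that overlap or sit within `gap` bytes."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + gap + 1:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def range_headers(ranges: list[tuple[int, int]], max_per_request: int = 20,
                  gap: int = 0) -> list[str]:
    """Build one Range header value per request, each with at most max_per_request ranges."""
    merged = coalesce(ranges, gap)
    headers = []
    for i in range(0, len(merged), max_per_request):
        batch = merged[i:i + max_per_request]
        spec = ",".join(f"{start}-{end}" for start, end in batch)
        headers.append(f"bytes={spec}")
    return headers
```

A larger `gap` means fewer ranges per request at the cost of downloading some bytes the client already has; a smaller `max_per_request` means more round trips but less chance of tripping a server's limit.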

Re: A proof-of-concept for delta'ing repodata

2018-02-13 Thread Igor Gnatenko
CCing rpm-ecosystem@ ML since it's the main location where this message should have gone.

On Mon, 2018-02-12 at 23:53 +0200, Jonathan Dieter wrote:
>
> I've come up with a method of splitting repodata into chunks that can
> be downloaded and

Re: A proof-of-concept for delta'ing repodata

2018-02-12 Thread Jonathan Dieter
On Mon, 2018-02-12 at 23:53 +0200, Jonathan Dieter wrote:
>
> I've come up with a method of splitting repodata into chunks that can
> be downloaded and combined with chunks that are already on the local
> system to create a byte-for-byte copy of the compressed repodata.
> Tools and
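The idea quoted above boils down to three steps: index the new file as a sequence of checksummed chunks, reuse every chunk the client already has, and fetch only the rest. What follows is a hypothetical sketch of that idea, not zchunk's actual file format or API. It assumes SHA-256 chunk checksums and an in-memory index of (digest, offset, length) tuples, and `fetch_from_bytes` stands in for real HTTP range requests by slicing a bytes object.

```python
import hashlib

Index = list[tuple[str, int, int]]  # (sha256 hex digest, offset, length) per chunk


def chunk_index(chunks: list[bytes]) -> Index:
    """Build the index a server would publish for a file made of these chunks."""
    index, offset = [], 0
    for chunk in chunks:
        index.append((hashlib.sha256(chunk).hexdigest(), offset, len(chunk)))
        offset += len(chunk)
    return index


def plan_download(remote_index: Index, local_chunks: dict[str, bytes]) -> list[tuple[int, int]]:
    """Inclusive byte ranges of the remote file whose chunks are missing locally."""
    return [(off, off + length - 1)
            for digest, off, length in remote_index
            if digest not in local_chunks]


def fetch_from_bytes(remote_file: bytes, ranges: list[tuple[int, int]]) -> dict[str, bytes]:
    """Stand-in for HTTP range requests: slice each range out and key it by digest."""
    fetched = {}
    for start, end in ranges:
        piece = remote_file[start:end + 1]
        fetched[hashlib.sha256(piece).hexdigest()] = piece
    return fetched


def reassemble(remote_index: Index, local_chunks: dict[str, bytes],
               fetched: dict[str, bytes]) -> bytes:
    """Concatenate local and downloaded chunks in index order, verifying each one."""
    out = bytearray()
    for digest, _offset, _length in remote_index:
        chunk = local_chunks.get(digest)
        if chunk is None:
            chunk = fetched[digest]
        if hashlib.sha256(chunk).hexdigest() != digest:
            raise ValueError(f"checksum mismatch for chunk {digest}")
        out += chunk
    return bytes(out)
```

If only the middle chunk of a three-chunk file changed, `plan_download` returns a single range, and `reassemble` produces a byte-for-byte copy of the new file from two local chunks plus one downloaded chunk.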