Re: [Rpm-ecosystem] Proposal: Zchunked rpms to reduce compose time and eliminate need for deltarpms

2018-11-18 Thread Jonathan Dieter
Michael, thanks so much for your feedback! On Sun, 2018-11-18 at 14:09 +, Michael Schroeder wrote: > On Sat, Nov 17, 2018 at 06:10:56PM +0000, Jonathan Dieter wrote: > > In Fedora, there was a call for ideas on, among other things, reducing > > the compose time. Currently,

[Rpm-ecosystem] Proposal: Zchunked rpms to reduce compose time and eliminate need for deltarpms

2018-11-17 Thread Jonathan Dieter
In Fedora, there was a call for ideas on, among other things, reducing the compose time. Currently, a good chunk of Fedora's compose time is spent generating deltarpms, and I've been thinking about a way to use zchunk as rpm's compression payload, which would make deltarpms redundant. Neal sugges

Re: [Rpm-ecosystem] Some points about zchunk

2018-09-27 Thread Jonathan Dieter
On Thu, 2018-09-27 at 14:55 -0400, Neal Gompa wrote: > On Thu, Sep 27, 2018 at 2:45 PM Jonathan Dieter wrote: > > On Thu, 2018-09-27 at 14:17 -0400, Neal Gompa wrote: > > > DNF is now using libdnf, so you shouldn't need to repeat it twice. > > > > > > B

Re: [Rpm-ecosystem] Some points about zchunk

2018-09-27 Thread Jonathan Dieter
On Thu, 2018-09-27 at 14:17 -0400, Neal Gompa wrote: > On Thu, Sep 27, 2018 at 1:49 PM Jonathan Dieter wrote: > > Apologies that it's taken so long for me to follow up on this. So, > > I've been working on getting librepo and libdnf up-to-date with this > > cha

Re: [Rpm-ecosystem] Some points about zchunk

2018-09-27 Thread Jonathan Dieter
On Thu, 2018-08-09 at 10:49 +0200, Jonathan Dieter wrote: > On Sun, 2018-07-15 at 11:25 -0400, Neal Gompa wrote: > > On Thu, Jul 12, 2018 at 3:27 PM Jonathan Dieter > > wrote: > > > I'd go with _zck since is, by default, xml, but, other > > > than > >

[Rpm-ecosystem] Zchunk status update

2018-08-16 Thread Jonathan Dieter
So, after Flock, I figured it might be time for a zchunk status update. First off, there's no way we're going to have the zchunked metadata feature ready for Fedora 29, so we're pushing the feature back to Fedora 30. The one thing I *would* like to have ready for Fedora 29 is the actual metadata c

Re: [Rpm-ecosystem] Some points about zchunk

2018-08-09 Thread Jonathan Dieter
On Sun, 2018-07-15 at 11:25 -0400, Neal Gompa wrote: > On Thu, Jul 12, 2018 at 3:27 PM Jonathan Dieter wrote: > > I'd go with _zck since is, by default, xml, but, other than > > that, I think what you (and Michael) are suggesting makes sense. > > > > Michael, I

Re: [Rpm-ecosystem] lazy loading of filelists.xml to speed up dnf

2018-08-07 Thread Jonathan Dieter
On Mon, 2018-08-06 at 16:36 +, Zbigniew Jędrzejewski-Szmek wrote: > Hi dnf and libsolv developers, > > this mail is a continuation of an FPC [1] and a FESCo [2] tickets. > > A proposal was made to disallow packages in Fedora from using file > deps, and to optimize dnf to not load filelists

Re: [Rpm-ecosystem] Some points about zchunk

2018-07-16 Thread Jonathan Dieter
On Sun, 2018-07-15 at 11:25 -0400, Neal Gompa wrote: > On Thu, Jul 12, 2018 at 3:27 PM Jonathan Dieter wrote: > > I'd go with _zck since is, by default, xml, but, other than > > that, I think what you (and Michael) are suggesting makes sense. > > > > Michael, I

Re: [Rpm-ecosystem] Some points about zchunk

2018-07-12 Thread Jonathan Dieter
On Wed, 2018-07-11 at 07:55 -0400, Neal Gompa wrote: > On Wed, Jul 11, 2018 at 7:30 AM Michael Schroeder wrote: > > > > On Wed, Jul 11, 2018 at 12:23:47PM +0100, Jonathan Dieter wrote: > > > That's something I didn't think of, and you're absolutely righ

Re: [Rpm-ecosystem] Some points about zchunk

2018-07-11 Thread Jonathan Dieter
On Wed, 2018-07-11 at 11:08 +, Michael Schroeder wrote: > On Wed, Jul 11, 2018 at 11:20:00AM +0100, Jonathan Dieter wrote: > > I must be missing something because I don't understand how that > > follows. As I understand it, dnf requests the primary metadata. > > Lib

Re: [Rpm-ecosystem] Some points about zchunk

2018-07-11 Thread Jonathan Dieter
On Wed, 2018-07-11 at 08:28 +, Michael Schroeder wrote: > On Tue, Jul 10, 2018 at 02:05:26PM +0100, Jonathan Dieter wrote: > > The top-level tool only needs to deal with the uncompressed metadata. > > dnf/libdnf requests the primary metadata from librepo, which downloads

Re: [Rpm-ecosystem] Some points about zchunk

2018-07-10 Thread Jonathan Dieter
On Tue, 2018-07-10 at 11:17 +, Michael Schroeder wrote: > On Mon, Jul 09, 2018 at 09:32:13PM +0100, Jonathan Dieter wrote: > > I had originally planned to do something along these lines (I think I > > used primary-zck rather than primary@zchunk), but realized that this > >

Re: [Rpm-ecosystem] Some points about zchunk

2018-07-09 Thread Jonathan Dieter
On Mon, 2018-07-09 at 08:59 +, Michael Schroeder wrote: > I thought about this a bit more over the weekend, and maybe we > should do this in a bit more general way. Basically zchunk is > just another compression format, like "xz" or "zstd". If we > want to support yet another compression format,

Re: [Rpm-ecosystem] Some points about zchunk

2018-07-08 Thread Jonathan Dieter
On Sun, 2018-07-08 at 19:45 +0100, Jonathan Dieter wrote: > On Fri, 2018-07-06 at 11:48 +, Michael Schroeder wrote: > > Ah, no, I think you misunderstood. Do *not* add md5 support. In fact, > > I'd ask you to remove sha1 support as well to make your code smaller. > >

Re: [Rpm-ecosystem] Some points about zchunk

2018-07-08 Thread Jonathan Dieter
On Sun, 2018-07-08 at 19:45 +0100, Jonathan Dieter wrote: > On Fri, 2018-07-06 at 11:48 +, Michael Schroeder wrote: > > On Thu, Jul 05, 2018 at 08:07:58PM +0300, Jonathan Dieter wrote: > > > librepo first downloads header-size of the file and then verifies that > >

Re: [Rpm-ecosystem] Some points about zchunk

2018-07-08 Thread Jonathan Dieter
On Fri, 2018-07-06 at 11:48 +, Michael Schroeder wrote: > On Thu, Jul 05, 2018 at 08:07:58PM +0300, Jonathan Dieter wrote: > > My plan was to just keep the same dictionaries (a different one for > > each metadata file) for at least a whole release, if not more. My > >

Re: [Rpm-ecosystem] Some points about zchunk

2018-07-05 Thread Jonathan Dieter
Michael, thank you so much for your detailed review! I really appreciate the time you took to look at this in such detail! I'm currently waiting to board a flight, so I'll make this brief and I'll probably be unavailable until Monday. Comments inline. On Thu, 2018-07-05 at 14:18 +, Michael S

Re: [Rpm-ecosystem] Is there anything I can do to help zchunk reviews along?

2018-06-29 Thread Jonathan Dieter
On Fri, 2018-06-29 at 10:12 +0200, Vít Ondruch wrote: > Dne 28.6.2018 v 12:43 Jonathan Dieter napsal(a): > > I'd love to get these merged and properly tested in time for Fedora > > 29's change code complete deadline (August 28), but I'm not sure > > how > >

[Rpm-ecosystem] Is there anything I can do to help zchunk reviews along?

2018-06-28 Thread Jonathan Dieter
Pull requests to enable zchunk support in librepo, libsolv, dnf, libdnf and createrepo_c are at: https://github.com/rpm-software-management/librepo/pull/127 https://github.com/openSUSE/libsolv/pull/270 https://github.com/rpm-software-management/dnf/pull/1107 https://github.com/rpm-software-manage

Re: [Rpm-ecosystem] Patch review request: zchunk patches for dnf, libsolv and librepo

2018-06-13 Thread Jonathan Dieter
On Tue, 2018-06-12 at 12:21 +0300, Jonathan Dieter wrote: > I would love to get these changes into Fedora 29, and the code is > testable now, but with only three weeks until System-Wide change > proposals are due, I'm not sure if I'm being ambitious. FWIW, I have a COPR av

Re: [Rpm-ecosystem] Patch review request: zchunk patches for dnf, libsolv and librepo

2018-06-13 Thread Jonathan Dieter
On Tue, 2018-06-12 at 05:24 -0400, Neal Gompa wrote: > On Tue, Jun 12, 2018 at 5:21 AM Jonathan Dieter wrote: > > > > I've finally finished writing patches to integrate zchunk support into > > dnf/libsolv/librepo[1], and I'd greatly appreciate some code review.

Re: [Rpm-ecosystem] Patch review request: zchunk patches for dnf, libsolv and librepo

2018-06-12 Thread Jonathan Dieter
On Tue, 2018-06-12 at 05:24 -0400, Neal Gompa wrote: > On Tue, Jun 12, 2018 at 5:21 AM Jonathan Dieter wrote: > > > > I've finally finished writing patches to integrate zchunk support into > > dnf/libsolv/librepo[1], and I'd greatly appreciate some code review.

Re: [Rpm-ecosystem] Zchunk update

2018-06-12 Thread Jonathan Dieter
On Mon, 2018-05-07 at 13:35 -0500, Pat Riehecky wrote: > > On 04/16/2018 04:48 PM, Neal Gompa wrote: > > repomd.xml is being changed, > > Is there a link to the proposed new definition (dtd/xsd/rng)? I realized that you asked for this over a month ago, but here is the proposed DTD: https://www.

[Rpm-ecosystem] Patch review request: zchunk patches for dnf, libsolv and librepo

2018-06-12 Thread Jonathan Dieter
I've finally finished writing patches to integrate zchunk support into dnf/libsolv/librepo[1], and I'd greatly appreciate some code review. A vast majority of the code is in librepo, but libsolv has been expanded to support zchunk files and dnf has a tiny patch that passes the base cache directory

[Rpm-ecosystem] Librepo/dnf zchunk integration question

2018-05-31 Thread Jonathan Dieter
Zchunk works by comparing an old version of the file with the one you want to download, but when dnf refreshes a repository, it downloads the new file into a temporary directory with no information passed to the handle about where the old files are. I've been trying to keep my code changes in libs
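The mechanism described here (compare the chunks already present in the old local file with the index of the new file, then download only what is missing) can be sketched roughly as follows. This is a minimal illustration, not librepo's or zchunk's code: the `Chunk` record, the `plan_download` helper, and the assumption that the new index is listed in offset order are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Chunk(NamedTuple):
    """One entry in a hypothetical chunk index: checksum plus location."""
    checksum: str
    offset: int  # byte offset of the chunk in its file
    length: int  # compressed length of the chunk in bytes


def plan_download(
    old_chunks: List[Chunk], new_chunks: List[Chunk]
) -> Tuple[Dict[str, Chunk], List[Tuple[int, int]]]:
    """Split the new file's chunks into locally reusable ones and byte ranges to fetch.

    Assumes new_chunks is sorted by offset. Returns (reusable, ranges), where
    reusable maps checksum -> chunk in the OLD file, and ranges holds inclusive
    (start, end) byte ranges in the NEW file, with contiguous ranges merged.
    """
    have = {c.checksum: c for c in old_chunks}
    reusable: Dict[str, Chunk] = {}
    ranges: List[Tuple[int, int]] = []
    for c in new_chunks:
        if c.checksum in have:
            # Same checksum: copy the chunk from the old local file instead.
            reusable[c.checksum] = have[c.checksum]
            continue
        start, end = c.offset, c.offset + c.length - 1
        if ranges and ranges[-1][1] + 1 == start:
            # Directly follows the previous missing chunk: extend that range.
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return reusable, ranges
```

Merging adjacent missing chunks is a small design choice that keeps the number of HTTP ranges down, since every requested range adds request and response overhead.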

Re: [Rpm-ecosystem] Zchunk update

2018-05-07 Thread Jonathan Dieter
On Mon, 2018-05-07 at 13:35 -0500, Pat Riehecky wrote: > On 04/16/2018 04:48 PM, Neal Gompa wrote: > > repomd.xml is being changed, > > Is there a link to the proposed new definition (dtd/xsd/rng)? Assuming you're asking about the zchunk-enabled repomd.xml, I'm still working on it. My original p

Re: [Rpm-ecosystem] Zchunk update

2018-04-29 Thread Jonathan Dieter
is here: https://copr.fedorainfracloud.org/coprs/jdieter/zchunk My next step is to add zchunk support to librepo. A quick summary of the features I wanted to add: On Mon, 2018-04-16 at 15:47 +0300, Jonathan Dieter wrote: > * A python API Still needs to be done. > * GPG signatures

Re: [Rpm-ecosystem] Zchunk update

2018-04-23 Thread Jonathan Dieter
On Mon, 2018-04-23 at 00:27 -0400, Neal Gompa wrote: > On Tue, Apr 17, 2018 at 3:05 PM, Jonathan Dieter wrote: > > I'm assuming that you're referring here to getting zchunk packaged into > > Fedora. I'd really like to finalize the file format (we're close,

Re: [Rpm-ecosystem] Zchunk update

2018-04-17 Thread Jonathan Dieter
On Tue, 2018-04-17 at 17:39 +0200, Michal Novotny wrote: > On Tue, Apr 17, 2018 at 4:20 PM, Jonathan Dieter wrote: > > On Tue, 2018-04-17 at 09:08 +0200, Michal Novotny wrote: > > > Hello Jonathan, > > > > > > Once it is in createrepo_c, we could try empl

Re: [Rpm-ecosystem] Zchunk update

2018-04-17 Thread Jonathan Dieter
On Tue, 2018-04-17 at 09:08 +0200, Michal Novotny wrote: > Hello Jonathan, > > On Mon, Apr 16, 2018 at 2:47 PM, Jonathan Dieter > wrote: > > It's been a number of weeks since my last update, so I thought I'd > > let > > everyone know where things are at

Re: [Rpm-ecosystem] Zchunk update

2018-04-16 Thread Jonathan Dieter
On Mon, 2018-04-16 at 09:00 -0400, Neal Gompa wrote: > On Mon, Apr 16, 2018 at 8:47 AM, Jonathan Dieter wrote: > > I've also added zchunk support to createrepo_c (see > > https://github.com/jdieter/createrepo_c), but I haven't yet created a > > pull request bec

[Rpm-ecosystem] Proposed zchunk file format - V4

2018-04-16 Thread Jonathan Dieter
Here's version four with a swap from fixed-length integers to variable-length compressed integers which allow us to skip compression of the index (since the non-integer data is all incompressible checksums). I've also added the uncompressed size of each chunk to the index to make it easier to fig
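The snippet doesn't show the exact bit layout chosen for these variable-length integers, so the sketch below uses the common LEB128 convention (7 bits per byte, least-significant group first, high bit set when more bytes follow) purely as an illustration. zchunk's actual encoding may use a different flag convention; `encode_varint` and `decode_varint` are hypothetical names.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative int, 7 bits per byte, low-order group first.

    LEB128-style assumption: the high bit of a byte is set when another
    byte follows, and clear on the final byte.
    """
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more groups follow
        else:
            out.append(byte)  # final group
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at data[pos]; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7
```

Values below 128 take a single byte, which is why an index made of small sizes and offsets plus checksums gains little from a further compression pass.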

[Rpm-ecosystem] Zchunk update

2018-04-16 Thread Jonathan Dieter
It's been a number of weeks since my last update, so I thought I'd let everyone know where things are at. I've spent most of these last few weeks reworking zchunk's API to make it easier to use and more in line with what other compression tools use, and I'm mostly happy with it now. Writing a sim

Re: [Rpm-ecosystem] Initial pre-alpha version of zchunk available for testing and comments

2018-03-22 Thread Jonathan Dieter
On Thu, 2018-03-22 at 11:55 +0200, Jonathan Dieter wrote: > I've got a working zchunk library, complete with some utilities at > https://github.com/jdieter/zchunk, but I wanted to get some feedback > before I went much further. Its only dependencies are libcurl and >

[Rpm-ecosystem] Initial pre-alpha version of zchunk available for testing and comments

2018-03-22 Thread Jonathan Dieter
I've got a working zchunk library, complete with some utilities at https://github.com/jdieter/zchunk, but I wanted to get some feedback before I went much further. Its only dependencies are libcurl and (optionally, but very heavily recommended) libzstd. There are test files in https://www.jdiete

Re: [Rpm-ecosystem] Proposed zchunk file format - V3

2018-03-12 Thread Jonathan Dieter
On Mon, 2018-03-12 at 15:42 +0100, Michal Domonkos wrote: > Hi Jonathan, > > To me, the zchunk idea looks good. > > Incidentally, for the last couple of months, I have been trying to > rethink the way we cache metadata on the clients, as part of the > libdnf (re)design efforts. My goal was to de-

Re: [Rpm-ecosystem] Proposed zchunk file format

2018-03-03 Thread Jonathan Dieter
On Fri, 2018-03-02 at 12:44 +, Michael Schroeder wrote: > On Fri, Mar 02, 2018 at 02:33:09PM +0200, Jonathan Dieter wrote: > > No, I didn't expect it to have much effect. Since openSUSE's xml > > file > > are (presumably) ordered so new packages com

Re: [Rpm-ecosystem] Proposed zchunk file format

2018-03-02 Thread Jonathan Dieter
On Thu, 2018-03-01 at 10:12 +, Michael Schroeder wrote: > On Wed, Feb 28, 2018 at 09:31:39AM +0200, Jonathan Dieter wrote: > > Ok, here are some numbers comparing zsync and zchunk. For testing, I > > have eight f27-updates primary.xml files dating from Dec 7 until Feb >

Re: [Rpm-ecosystem] A proof-of-concept for delta'ing repodata

2018-03-02 Thread Jonathan Dieter
On Fri, 2018-03-02 at 13:18 +0100, Vít Ondruch wrote: > Jonathan, > > Have you experimented with casync [1]? Yes, I did. Unfortunately, casync creates thousands of tiny files, which adds load to the mirrors and downloading even a few hundred takes significantly longer than using http ranges. Jo

Re: [Rpm-ecosystem] Proposed zchunk file format - V3

2018-02-28 Thread Jonathan Dieter
I've been working on a C implementation of this spec, and came up with a few other changes. I think it's important to have a checksum of the index as well as the data as we want to be able to verify that the index is as expected before trying to parse it. I've also added in the ability to use a d
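The point about checking the index before parsing it can be illustrated with a short sketch. The snippet doesn't say which hash the format uses, so SHA-256 is an assumption here, and `verified_index` is a hypothetical helper rather than part of zchunk's API.

```python
import hashlib


def verified_index(index_bytes: bytes, expected_hex: str) -> bytes:
    """Return index_bytes only if its SHA-256 digest matches expected_hex.

    Verifying first means a corrupted, truncated or tampered index is
    rejected before any parsing code reads lengths or offsets from it.
    """
    actual = hashlib.sha256(index_bytes).hexdigest()
    if actual != expected_hex.lower():
        raise ValueError("index checksum mismatch; refusing to parse")
    return index_bytes
```

A parser would call this on the raw index bytes and only then decode the entries, so a bad index never reaches code that trusts the sizes it contains.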

Re: [Rpm-ecosystem] Proposed zchunk file format

2018-02-27 Thread Jonathan Dieter
On Fri, 2018-02-23 at 14:14 +, Michael Schroeder wrote: > This may be an unfair question, but how does it compare to the > 'gzip --rsyncable' + zsync approach that we (openSUSE) have > been using for almost eight years? I guess it's better, but how much? Ok, here are some numbers comparing zsync a
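For context on the comparison: zsync finds reusable blocks in a local file using rsync's rolling ("weak") checksum, confirming candidates with a stronger hash. The sketch below shows only that weak checksum and its O(1) rolling update, in simplified form; it is not zsync's code, and `find_block` (which confirms matches by direct byte comparison instead of a strong hash) is a hypothetical helper.

```python
from typing import Tuple

MOD = 1 << 16  # rsync keeps both checksum halves modulo 2**16


def weak_checksum(block: bytes) -> Tuple[int, int]:
    """rsync-style weak checksum of a block, returned as (a, b)."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> Tuple[int, int]:
    """Slide an n-byte window one byte forward in constant time."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b


def find_block(data: bytes, block: bytes) -> int:
    """Return the first offset in data where block occurs, or -1."""
    n = len(block)
    if n == 0 or n > len(data):
        return -1
    target = weak_checksum(block)
    a, b = weak_checksum(data[:n])
    for k in range(len(data) - n + 1):
        if k > 0:
            a, b = roll(a, b, data[k - 1], data[k - 1 + n], n)
        # A weak match is only a candidate; confirm it directly
        # (rsync and zsync use a strong hash for this step).
        if (a, b) == target and data[k:k + n] == block:
            return k
    return -1
```

The rolling update is what lets zsync check every byte offset cheaply, whereas zchunk relies on fixed chunk boundaries recorded in its index.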

Re: [Rpm-ecosystem] Proposed zchunk file format

2018-02-23 Thread Jonathan Dieter
On Fri, 2018-02-23 at 14:14 +, Michael Schroeder wrote: > Hi Jonathan! > > On Fri, Feb 16, 2018 at 08:52:23PM +0200, Jonathan Dieter wrote: > > So here's my proposed file format for the zchunk file. Should I > > add > > some flags to facilitate possib

Re: [Rpm-ecosystem] Proposed zchunk file format

2018-02-23 Thread Jonathan Dieter
On Fri, 2018-02-23 at 15:23 -0500, Colin Walters wrote: > And I don't see any zsync files in e.g.: > http://download.opensuse.org/distribution/leap/42.3/repo/oss/suse/ I found a copy of zsync at https://download.opensuse.org/repositories/network/openSUSE_Tumbleweed/src/zsync-0.6.2-35.23.src.rpm

[Rpm-ecosystem] Proposed zchunk file format - V2

2018-02-19 Thread Jonathan Dieter
Neal, thanks for the feedback. After taking your comments into consideration, here's version 2.
+-+-+-+-+-+--+-+-+-+-+-+-+-+-+
|ID | Compression type | Index size |
+-+-+-+-+-+--+-+-+-+-+-+-+-+-+
+==+=+
| Compressed Index

[Rpm-ecosystem] Proposed zchunk file format

2018-02-16 Thread Jonathan Dieter
So here's my proposed file format for the zchunk file. Should I add some flags to facilitate possible different compression formats?
+-+-+-+-+-+-+-+-+-+-+-+-+==+=+
| ID | Index size | Compressed Index | Compressed Dict |
+-+-+-+-+-+-+-+-+-+-+-+-+=

Re: [Rpm-ecosystem] A proof-of-concept for delta'ing repodata

2018-02-16 Thread Jonathan Dieter
On Tue, 2018-02-13 at 10:52 +0100, Igor Gnatenko wrote: > What about zstd? Also in latest version of lz4 there is support for > dictionaries too. So I've investigated zstd, and here are my results:
Latest F27 primary.gz - 3.1MB
zlib zchunk (including custom dict) primary.zck - 4.2MB ~35% increa

Re: [Rpm-ecosystem] A proof-of-concept for delta'ing repodata

2018-02-13 Thread Jonathan Dieter
On Tue, 2018-02-13 at 10:52 +0100, Igor Gnatenko wrote: > On Mon, 2018-02-12 at 23:53 +0200, Jonathan Dieter wrote: > > * Many changes to the metadata can mean a large number of ranges > >requested. I ran a check on our mirrors, and three (out of around > >150 th
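One way to keep "a large number of ranges" manageable is to merge ranges that touch and then split them across several requests, each carrying a bounded number of ranges in its HTTP `Range` header (multiple ranges are written as `bytes=0-99,200-299`). The sketch below is a hypothetical illustration: `merge_ranges`, `range_headers`, and the `max_ranges` limit are assumptions, not values taken from any mirror or from librepo.

```python
from typing import List, Tuple


def merge_ranges(ranges: List[Tuple[int, int]], gap: int = 0) -> List[Tuple[int, int]]:
    """Merge inclusive byte ranges that overlap, touch, or sit within `gap` bytes."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1 + gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def range_headers(ranges: List[Tuple[int, int]], max_ranges: int) -> List[str]:
    """Build Range header values, each holding at most max_ranges ranges."""
    if max_ranges < 1:
        raise ValueError("max_ranges must be >= 1")
    merged = merge_ranges(ranges)
    headers = []
    for i in range(0, len(merged), max_ranges):
        batch = merged[i:i + max_ranges]
        headers.append("bytes=" + ",".join(f"{s}-{e}" for s, e in batch))
    return headers
```

A nonzero `gap` trades a few wasted bytes for fewer ranges, which may be worthwhile when a server handles many small ranges poorly.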