Re: [Rpm-ecosystem] Zchunk update

2018-04-23 Thread Jonathan Dieter
On Mon, 2018-04-23 at 00:27 -0400, Neal Gompa wrote:
> On Tue, Apr 17, 2018 at 3:05 PM, Jonathan Dieter  wrote:
> > I'm assuming that you're referring here to getting zchunk packaged into
> > Fedora.  I'd really like to finalize the file format (we're close, but
> > I still need a good way of storing signatures in it) and the download
> > API before releasing it into Fedora proper.
> > 
> 
> I'm looking forward to this!

I've updated the file format to allow for multiple signatures, updated
the zchunk code to recognize the existence of a signature (while still
not checking it), and have released as zchunk-0.3.0 in COPR.  I've also
added 32 bits of flags that we can use to extend the format in a
backwards-compatible way.
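
To illustrate how a flags field can extend a format without breaking older readers, here is a minimal sketch. It is not taken from the zchunk code or the format description: the flag name, the bit position, and the fixed-width big-endian encoding are all assumptions made for the example. The idea is only that a reader tests the bits it knows and ignores the rest.

```python
import struct

# Hypothetical flag bit, invented for this sketch; the real bit
# assignments live in the zchunk format description.
FLAG_HAS_SIGNATURES = 1 << 0


def pack_flags(flags: int) -> bytes:
    """Encode flags as a 32-bit unsigned integer (big-endian is an assumption)."""
    return struct.pack(">I", flags)


def has_signatures(raw: bytes) -> bool:
    """Check the one bit this reader understands and ignore all other bits."""
    (flags,) = struct.unpack(">I", raw)
    return bool(flags & FLAG_HAS_SIGNATURES)
```

Under these assumptions, a reader written today keeps working when a later writer sets additional bits, because it never inspects them.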

The current zchunk format description is at:
https://github.com/jdieter/zchunk/blob/master/zchunk_format.txt

> > I would recommend using the dicts mentioned above as they give me over
> > 40% space savings for both other.xml.zck and primary.xml.zck.  Do
> > please let me know if you run into any problems.
> > 
> 
> Are those dictionaries Fedora specific? If so, how can other
> distributions generate similar ones? If not, still, how were they
> made? :)

They were generated from Fedora metadata, but they should help with any
distribution's repodata.  I generated them by splitting a few days'
worth of metadata along package boundaries, stripping out any
checksums, and then running zstd --train * on the directory containing
the split metadata.  The script I used is available at
https://www.jdieter.net/downloads/zchunk-dicts/split.py, and I hope to
write up proper instructions at some point.
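
For anyone wanting to try this before those instructions exist, the steps above can be sketched roughly as follows. This is not the split.py linked above: the regular expressions, function names, and output file naming are assumptions made for illustration, and the sketch only strips `<checksum>` elements, not checksums stored in attributes.

```python
import re
from pathlib import Path

# Matches one <package ...>...</package> element; DOTALL lets "." span newlines.
PACKAGE_RE = re.compile(r"<package\b.*?</package>", re.DOTALL)
# Matches a <checksum ...>...</checksum> element inside a package.
CHECKSUM_RE = re.compile(r"<checksum\b[^>]*>.*?</checksum>", re.DOTALL)


def split_packages(xml_text: str) -> list:
    """Return one string per package, with checksum elements removed."""
    return [CHECKSUM_RE.sub("", m.group(0)) for m in PACKAGE_RE.finditer(xml_text)]


def write_samples(xml_text: str, out_dir: Path) -> int:
    """Write each package to its own file so zstd can use it as a training sample."""
    out_dir.mkdir(parents=True, exist_ok=True)
    samples = split_packages(xml_text)
    for index, sample in enumerate(samples):
        (out_dir / f"pkg-{index:06d}.xml").write_text(sample, encoding="utf-8")
    return len(samples)
```

After writing the samples for a few days of primary.xml into one directory, running `zstd --train *` there (optionally with `-o` to name the dictionary file) trains on them. Checksums are stripped because they are effectively random and would only add noise to the dictionary.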

Jonathan
___
infrastructure mailing list -- infrastructure@lists.fedoraproject.org
To unsubscribe send an email to infrastructure-le...@lists.fedoraproject.org


Re: [Rpm-ecosystem] Zchunk update

2018-04-16 Thread Neal Gompa
On Mon, Apr 16, 2018 at 12:32 PM, Jonathan Dieter  wrote:
> On Mon, 2018-04-16 at 09:00 -0400, Neal Gompa wrote:
>> On Mon, Apr 16, 2018 at 8:47 AM, Jonathan Dieter  wrote:
>> > I've also added zchunk support to createrepo_c (see
>> > https://github.com/jdieter/createrepo_c), but I haven't yet created a
>> > pull request because I'm not sure if my current implementation is the
>> > best method.  My current effort only zchunks primary.xml, filelists.xml
>> > and other.xml and doesn't change the sort order.
>> >
>>
>> Fedora COPR, Open Build Service, Mageia, and openSUSE also append
>> AppStream data to repodata to ship AppStream information. Is there a
>> way we can incorporate this into zck rpm-md? There's been an issue for
>> a while to support generating the AppStream metadata as part of the
>> createrepo_c run using the libappstream-builder library[1], which may
>> lend itself to doing this properly.
>
> Is it repomd.xml that actually gets changed or primary.xml /
> filelists.xml / other.xml?
>
> If it's repomd.xml, then it really shouldn't make any difference
> because I'm not currently zchunking it.  As far as I can see, the only
> reason to zchunk it would be to have an embedded GPG signature once
> they're supported in zchunk.
>

repomd.xml is being changed, so it should be fine, then. It'd be nice
to be able to chunk up AppStream data eventually, though.

>> > The one area of zchunk that still needs some API work is the download
>> > and chunk merge API, and I'm planning to clean that up as I add zchunk
>> > support to librepo.
>> >
>> > Some things I'd still like to add to zchunk:
>> >  * A python API
>> >  * GPG signatures in addition to (possibly replacing) overall data
>> >    checksum
>>
>> I'd rather not lose checksums, but GPG signatures would definitely be
>> necessary, as openSUSE needs them, and we'd definitely like to have
>> them in Fedora[2], COPR[3], and Mageia[4].
>
> Fair enough.  Would we want zchunk to support multiple GPG signatures
> or is one enough?
>

Historically, we've used only one GPG key because that's what we do
with RPMs, but technically you can specify multiple keys in a .repo
file for Yum, DNF, and Zypper to use for validating packages and
metadata, so it's absolutely possible to have more. If it's not too
difficult, I'd suggest supporting multiple signatures.
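
As a concrete picture of the multiple-key case, here is a small sketch of a .repo stanza listing two keys and a helper that reads them back. The repository ID, URLs, and helper name are made up for the example, and the parsing uses Python's configparser rather than the actual Yum, DNF, or Zypper code, so treat it as an approximation of how those tools read the file.

```python
import configparser

# Hypothetical repository definition; the ID and URLs are placeholders.
REPO_TEXT = """\
[example]
name=Example repository
baseurl=https://repo.example.invalid/os/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://repo.example.invalid/keys/KEY-A
       https://repo.example.invalid/keys/KEY-B
"""


def gpg_keys(repo_text: str, repo_id: str) -> list:
    """Return every key URL listed in the gpgkey option of one repo section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_text)
    # Treat commas as whitespace so both separators split the same way.
    return parser[repo_id]["gpgkey"].replace(",", " ").split()
```

A signature format that allows several signatures would let each of these keys sign the same metadata file.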


-- 
真実はいつも一つ!/ Always, there's only one truth!


Re: [Rpm-ecosystem] Zchunk update

2018-04-16 Thread Jonathan Dieter
On Mon, 2018-04-16 at 09:00 -0400, Neal Gompa wrote:
> On Mon, Apr 16, 2018 at 8:47 AM, Jonathan Dieter  wrote:
> > I've also added zchunk support to createrepo_c (see
> > https://github.com/jdieter/createrepo_c), but I haven't yet created a
> > pull request because I'm not sure if my current implementation is the
> > best method.  My current effort only zchunks primary.xml, filelists.xml
> > and other.xml and doesn't change the sort order.
> > 
> 
> Fedora COPR, Open Build Service, Mageia, and openSUSE also append
> AppStream data to repodata to ship AppStream information. Is there a
> way we can incorporate this into zck rpm-md? There's been an issue for
> a while to support generating the AppStream metadata as part of the
> createrepo_c run using the libappstream-builder library[1], which may
> lend itself to doing this properly.

Is it repomd.xml that actually gets changed or primary.xml /
filelists.xml / other.xml?

If it's repomd.xml, then it really shouldn't make any difference
because I'm not currently zchunking it.  As far as I can see, the only
reason to zchunk it would be to have an embedded GPG signature once
they're supported in zchunk.

> > The one area of zchunk that still needs some API work is the download
> > and chunk merge API, and I'm planning to clean that up as I add zchunk
> > support to librepo.
> > 
> > Some things I'd still like to add to zchunk:
> >  * A python API
> >  * GPG signatures in addition to (possibly replacing) overall data
> >    checksum
> 
> I'd rather not lose checksums, but GPG signatures would definitely be
> necessary, as openSUSE needs them, and we'd definitely like to have
> them in Fedora[2], COPR[3], and Mageia[4].

Fair enough.  Would we want zchunk to support multiple GPG signatures
or is one enough?

> >  * An expiry field? (I'm obviously thinking about signed repodata here)
> 
> Do we need an expiry field if we properly processed the key
> revocation/expiration in librepo? My understanding is that the current
> hiccup with it is that we don't, and that the GPG keyring used in
> librepo is independent of the RPM keyring (which it shouldn't be).

Ah, that makes sense.  Forget that idea then.

Jonathan


Re: [Rpm-ecosystem] Zchunk update

2018-04-16 Thread Neal Gompa
On Mon, Apr 16, 2018 at 8:47 AM, Jonathan Dieter  wrote:
> It's been a number of weeks since my last update, so I thought I'd let
> everyone know where things are at.
>
> I've spent most of these last few weeks reworking zchunk's API to make
> it easier to use and more in line with what other compression tools
> use, and I'm mostly happy with it now.  Writing a simple zchunk file
> can be done in a few lines of code, while reading one is also simple.
>
> I've also added zchunk support to createrepo_c (see
> https://github.com/jdieter/createrepo_c), but I haven't yet created a
> pull request because I'm not sure if my current implementation is the
> best method.  My current effort only zchunks primary.xml, filelists.xml
> and other.xml and doesn't change the sort order.
>

Fedora COPR, Open Build Service, Mageia, and openSUSE also append
AppStream data to repodata to ship AppStream information. Is there a
way we can incorporate this into zck rpm-md? There's been an issue for
a while to support generating the AppStream metadata as part of the
createrepo_c run using the libappstream-builder library[1], which may
lend itself to doing this properly.

[1]: https://github.com/rpm-software-management/createrepo_c/issues/75

> The one area of zchunk that still needs some API work is the download
> and chunk merge API, and I'm planning to clean that up as I add zchunk
> support to librepo.
>
> Some things I'd still like to add to zchunk:
>  * A python API
>  * GPG signatures in addition to (possibly replacing) overall data
>    checksum

I'd rather not lose checksums, but GPG signatures would definitely be
necessary, as openSUSE needs them, and we'd definitely like to have
them in Fedora[2], COPR[3], and Mageia[4].

[2]: https://pagure.io/releng/issue/133
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1373331
[4]: https://bugs.mageia.org/show_bug.cgi?id=19432

>  * An expiry field? (I'm obviously thinking about signed repodata here)

Do we need an expiry field if we properly processed the key
revocation/expiration in librepo? My understanding is that the current
hiccup with it is that we don't, and that the GPG keyring used in
librepo is independent of the RPM keyring (which it shouldn't be).


-- 
真実はいつも一つ!/ Always, there's only one truth!