Re: [Rpm-ecosystem] Zchunk update

2018-04-16 Thread Neal Gompa
On Mon, Apr 16, 2018 at 12:32 PM, Jonathan Dieter  wrote:
> On Mon, 2018-04-16 at 09:00 -0400, Neal Gompa wrote:
>> On Mon, Apr 16, 2018 at 8:47 AM, Jonathan Dieter  wrote:
>> > I've also added zchunk support to createrepo_c (see
>> > https://github.com/jdieter/createrepo_c), but I haven't yet created a
>> > pull request because I'm not sure if my current implementation is the
>> > best method.  My current effort only zchunks primary.xml, filelists.xml
>> > and other.xml and doesn't change the sort order.
>> >
>>
>> Fedora COPR, Open Build Service, Mageia, and openSUSE also append
>> AppStream data to repodata to ship AppStream information. Is there a
>> way we can incorporate this into zck rpm-md? There's been an issue for
>> a while to support generating the AppStream metadata as part of the
>> createrepo_c run using the libappstream-builder library[1], which may
>> lend itself to doing this properly.
>
> Is it repomd.xml that actually gets changed or primary.xml /
> filelists.xml / other.xml?
>
> If it's repomd.xml, then it really shouldn't make any difference
> because I'm not currently zchunking it.  As far as I can see, the only
> reason to zchunk it would be to have an embedded GPG signature once
> they're supported in zchunk.
>

repomd.xml is being changed, so it should be fine, then. It'd be nice
to be able to chunk up AppStream data eventually, though.

>> > The one area of zchunk that still needs some API work is the download
>> > and chunk merge API, and I'm planning to clean that up as I add zchunk
>> > support to librepo.
>> >
>> > Some things I'd still like to add to zchunk:
>> >  * A python API
>> >  * GPG signatures in addition to (possibly replacing) overall data
>> >checksum
>>
>> I'd rather not lose checksums, but GPG signatures would definitely be
>> necessary, as openSUSE needs them, and we'd definitely like to have
>> them in Fedora[2], COPR[3], and Mageia[4].
>
> Fair enough.  Would we want zchunk to support multiple GPG signatures
> or is one enough?
>

Historically, we've used only one GPG key because that's what we do
with RPMs, but technically you can specify multiple keys in a .repo
file for Yum, DNF, and Zypper to use for validating packages and
metadata, so it's absolutely possible to have more. If it's not too
difficult, I'd suggest supporting multiple signatures.


-- 
真実はいつも一つ!/ Always, there's only one truth!


Re: mirror failures

2018-04-16 Thread Stephen John Smoogen
On 16 April 2018 at 09:30, Miroslav Suchý  wrote:
> Can someone comment on this?
> I am not really sure whether there are issues on the mirrors (not likely)
> or whether this is an issue with Fedora Cloud when the network is
> overloaded (more likely).
>

We need a lot more information to understand what this means.

1) Does it mean that the clients aren't able to build because they
can't find a valid mirror?
2) Does it mean that the user can't mirror stuff from copr?
3) Are there logs to help show exactly what is happening and when?


> Miroslav
>
>
>  Forwarded message 
> Subject: mirror failures
> Date: Mon, 9 Apr 2018 09:38:04 +0200
> From: Michal Novotny 
> Reply-To: Community Projects 
> To: Cool Other Package Repositories 
>
> Hello,
>
> over the last month or two I have quite often encountered mirror sync
> failures on COPR builders during build setup. Could you please confirm
> that this is really the case and link the build logs here if possible?
> I will then set up some extended monitoring if it proves to be true.
>
> Thank you
> clime



-- 
Stephen J Smoogen.


Re: How To Contact Folks Interested In Tagger

2018-04-16 Thread Justin W. Flory
On 04/14/2018 10:15 PM, Elorm Buertey wrote:
> Hi. I have created the WhenIsGood survey but I'm not sure of the
> best way to send it to the others. I couldn't find their
> emails on their profiles. Would posting the link in the Github
> issue be okay?
> 

Hi Elorm,

Sharing the WhenIsGood link on the GitHub issue for the Fedora Tagger
development meeting works; I would also consider sharing it on the
Fedora Infrastructure mailing list since there are other people there
who may be interested.

You can subscribe here:


https://lists.fedoraproject.org/admin/lists/infrastructure.lists.fedoraproject.org/

I've already sent this response to the mailing list, so once you
subscribe, you can share the link on this thread for the Fedora Tagger
development meeting time.

Hope this helps!

-- 
Cheers,
Justin W. Flory
jflo...@gmail.com





Re: Meeting agenda: RATS vs. librat

2018-04-16 Thread Randy Barlow
On 04/16/2018 10:49 AM, Randy Barlow wrote:
> Alternatively, we could discuss it on
> this mailing list if desired.

Perhaps it would be useful if I started the discussion here so we could
be more informed during Thursday's meeting.

pingou and I have been debating whether the project that allows
packagers to rerun failed tests should be a service or a library. I am
of the position that it should be a library.


RATS


RATS (Run Another Test Service) is a project that pingou has been
working on. It is a web service that allows API callers to ask for a
test to be run again. pingou, please do reply if I have misrepresented
or omitted anything; I will attempt to outline the benefits that pingou
argues for here.

* Being a service allows it to keep track of which tests have recently
  been requested for re-run, which allows it to make sure a single test
  doesn't get re-requested too many times.
* Being a service makes it easy to call from any language (Python, Ruby,
  etc.).
* Being a service makes it possible to update in one place and have all
  callers get the new behavior without themselves needing to update (as
  long as the API is kept stable).


librat
==

I would like to make the case for using a library (perhaps called librat
- lib run another test?) here. Benefits:

* A library creates less work for the infrastructure development and ops
  subteams. Adding a service means adding workload to a thinly stretched
  devops team, as a service needs monitoring and requires intervention
  when it goes down.

* A library is inherently more reliable than a service. A service must
  obviously contain the library code, but it also adds a network
  dependency to every project that uses it. It's one more piece in the
  system that can fail and bring down Bodhi.

* A library is significantly less code than a service. For example,
  libraries don't need to authenticate their callers, don't have to
  serialize/deserialize inputs and return values, and don't need to
  process human input (like config files). As stated, any service would
  need the library's code anyway, but will also need much more code to
  do all of the above. This also means less code to write tests and
  documentation for.

* A library is able to meet all of our known requirements, and is
  simple. I believe in the "keep it simple" principle - we should pick
  the simplest solution that meets the requirements.


I would like to address the benefits of RATS outlined above, one by one
here:

* RATS can keep track of which tests have been run to prevent too many
  re-runs

I believe that a library could do this locally for Bodhi and for Pagure
too once Pagure gets more test integration, by using a local cache.
Bodhi and Pagure will likely not gate on the same kinds of tests, and so
they don't need a central authority to make sure they each aren't
requesting the same test to be re-run. Furthermore, I don't think we
need to do a perfect job of making sure tests aren't re-run too often,
which is why I think a library that caches recent re-run requests will
be "good enough" under the "perfect is the enemy of good" mantra.


* Calling from multiple languages

This is a theoretical requirement at this point - we don't have a real
use case for it at this time. Further, a library would make it easy to
write a CLI that could be used if a requirement for other languages ever
arises. A REST API is not inherently easier to use than a
CLI - in fact, I would make the case that it is harder to use.


* Updating

While it is true that a service can be updated in one place such that
all callers get the update immediately, we do have a big Ansible project
that can just as easily deploy a library update out to all the machines
that use it, so I would argue that we practically have the same benefit
with a library.


Thoughts?


Re: [Rpm-ecosystem] Zchunk update

2018-04-16 Thread Jonathan Dieter
On Mon, 2018-04-16 at 09:00 -0400, Neal Gompa wrote:
> On Mon, Apr 16, 2018 at 8:47 AM, Jonathan Dieter  wrote:
> > I've also added zchunk support to createrepo_c (see
> > https://github.com/jdieter/createrepo_c), but I haven't yet created a
> > pull request because I'm not sure if my current implementation is the
> > best method.  My current effort only zchunks primary.xml, filelists.xml
> > and other.xml and doesn't change the sort order.
> > 
> 
> Fedora COPR, Open Build Service, Mageia, and openSUSE also append
> AppStream data to repodata to ship AppStream information. Is there a
> way we can incorporate this into zck rpm-md? There's been an issue for
> a while to support generating the AppStream metadata as part of the
> createrepo_c run using the libappstream-builder library[1], which may
> lend itself to doing this properly.

Is it repomd.xml that actually gets changed or primary.xml /
filelists.xml / other.xml?

If it's repomd.xml, then it really shouldn't make any difference
because I'm not currently zchunking it.  As far as I can see, the only
reason to zchunk it would be to have an embedded GPG signature once
they're supported in zchunk.

> > The one area of zchunk that still needs some API work is the download
> > and chunk merge API, and I'm planning to clean that up as I add zchunk
> > support to librepo.
> > 
> > Some things I'd still like to add to zchunk:
> >  * A python API
> >  * GPG signatures in addition to (possibly replacing) overall data
> >checksum
> 
> I'd rather not lose checksums, but GPG signatures would definitely be
> necessary, as openSUSE needs them, and we'd definitely like to have
> them in Fedora[2], COPR[3], and Mageia[4].

Fair enough.  Would we want zchunk to support multiple GPG signatures
or is one enough?

> >  * An expiry field? (I'm obviously thinking about signed repodata here)
> 
> Do we need an expiry field if we properly processed the key
> revocation/expiration in librepo? My understanding is that the current
> hiccup with it is that we don't, and that the GPG keyring used in
> librepo is independent of the RPM keyring (which it shouldn't be).

Ah, that makes sense.  Forget that idea then.

Jonathan


Meeting agenda: RATS vs. librat

2018-04-16 Thread Randy Barlow
Greetings!

I'd like to have a discussion during this week's infrastructure meeting
about whether we need the project that will re-run failed gating tests
to be a service, or whether a library will do. There has been some
lively debate about this in IRC, but it didn't seem like we reached a
resolution, and it would be helpful if we could make a decision together
so we can proceed with implementation. Would this week's infra meeting
be a good time to discuss this? Alternatively, we could discuss it on
this mailing list if desired.


Re: Planned MirrorManager changes

2018-04-16 Thread Adrian Reber
On Sat, Apr 14, 2018 at 04:28:37PM -0700, Kevin Fenzi wrote:
> > I would like to change the setup of our mirror crawler and just wanted
> > to mention my planned changes here before working on them.
> > 
> > Currently we have two VMs which are crawling our mirrors. Each of the
> > machine is responsible for one half of the active mirrors. The crawl
> > starts every 12 hours on the first crawler and 6 hours later on the
> > second crawler. So every 6 hours one crawler is accessing the database.
> > 
> > Currently most of the crawling time is not spent crawling but updating
> > the database about which host has which directory up to date. With a
> > timeout of 4 hours per host we are hitting that timeout on some hosts
> > regularly and most of the time the database access is the problem.
> > 
> > What I would like to change is to crawl each category (Fedora Linux,
> > Fedora Other, Fedora EPEL, Fedora Secondary Arches, Fedora Archive)
> > separately and at different times and intervals.
> > 
> > We would not hit the timeout as often as now as only the information for
> > a single category has to be updated. We could scan 'Fedora Archive' only
> > once per day or every second day. We can scan 'Fedora EPEL' much more
> > often as it is usually really fast and get better data about the
> > available mirrors.
> > 
> > My goal would be to distribute the scanning in such a way to decrease
> > the load on the database and to decrease the cases of mirror
> > auto-deactivation due to slow database accesses. 
> > 
> > Let me know if you think that these planned changes are the wrong
> > direction of if you have other ideas how to improve the mirror crawling.
> 
> Sounds like all great ideas to me. ;)

Thanks.

> I wonder if we could also find some way to note which mirrors have
> iso/image files, and could communicate this to the
> download.fedoraproject.org redirect to only redirect people to mirrors
> that have that specific file if they are pointing to an iso/qcow2, etc.

This is one of the cases where MirrorManager should, in theory, handle
this correctly. The important part of that sentence is 'in theory'.
MirrorManager should know about the three most recent files in a
directory, and if we are crawling via rsync we even download the
complete file listing for a mirror. Theory aside, it would help to see a
wrong redirect live to understand why it is happening.

Adrian




Re: Planned MirrorManager changes

2018-04-16 Thread Adrian Reber
On Sat, Apr 14, 2018 at 12:37:24AM +, Stephen John Smoogen wrote:
> On Fri, Apr 13, 2018 at 11:14 AM Adrian Reber  wrote:
> 
> > I would like to change the setup of our mirror crawler and just wanted
> > to mention my planned changes here before working on them.
> >
> > Currently we have two VMs which are crawling our mirrors. Each of the
> > machine is responsible for one half of the active mirrors. The crawl
> > starts every 12 hours on the first crawler and 6 hours later on the
> > second crawler. So every 6 hours one crawler is accessing the database.
> >
> > Currently most of the crawling time is not spent crawling but updating
> > the database about which host has which directory up to date. With a
> > timeout of 4 hours per host we are hitting that timeout on some hosts
> > regularly and most of the time the database access is the problem.
> >
> > What I would like to change is to crawl each category (Fedora Linux,
> > Fedora Other, Fedora EPEL, Fedora Secondary Arches, Fedora Archive)
> > separately and at different times and intervals.
> >
> > We would not hit the timeout as often as now as only the information for
> > a single category has to be updated. We could scan 'Fedora Archive' only
> > once per day or every second day. We can scan 'Fedora EPEL' much more
> > often as it is usually really fast and get better data about the
> > available mirrors.
> >
> > My goal would be to distribute the scanning in such a way to decrease
> > the load on the database and to decrease the cases of mirror
> > auto-deactivation due to slow database accesses.
> >
> > Let me know if you think that these planned changes are the wrong
> > direction of if you have other ideas how to improve the mirror crawling.
> 
> These look like a good way to deal with the fact that we have a lot of data,
> files, and mirrors, and users get confused about how up to date they are.
> Would more VMs help spread this out as well?

From my point of view the main problem is the load MirrorManager creates
on the database. Currently I do not think that more VMs would help the
crawling. Someone once mentioned a dedicated database VM for
MirrorManager. That is something which could make a difference, but
first I would like to see if crawling per category can improve the
situation.
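
Just to make the idea concrete, the per-category scheduling I have in
mind would look roughly like the sketch below; the "every second day"
cadence for the archive comes from my earlier mail, while the other
intervals are placeholders:

from datetime import datetime, timedelta

# Per-category crawl intervals (placeholder values except where noted).
CRAWL_INTERVALS = {
    "Fedora Linux": timedelta(hours=12),
    "Fedora Other": timedelta(hours=12),
    "Fedora EPEL": timedelta(hours=6),
    "Fedora Secondary Arches": timedelta(hours=24),
    "Fedora Archive": timedelta(hours=48),  # once every second day
}

def due_categories(last_crawled, now=None):
    """Return the categories whose crawl interval has elapsed.

    last_crawled maps a category name to the datetime of its last
    finished crawl; categories that were never crawled are always due.
    """
    now = now or datetime.utcnow()
    due = []
    for category, interval in CRAWL_INTERVALS.items():
        last = last_crawled.get(category)
        if last is None or now - last >= interval:
            due.append(category)
    return due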

Adrian




Fwd: mirror failures

2018-04-16 Thread Miroslav Suchý
Can someone comment on this?
I am not really sure whether there are issues on the mirrors (not likely)
or whether this is an issue with Fedora Cloud when the network is
overloaded (more likely).

Miroslav


 Forwarded message 
Subject: mirror failures
Date: Mon, 9 Apr 2018 09:38:04 +0200
From: Michal Novotny 
Reply-To: Community Projects 
To: Cool Other Package Repositories 

Hello,

over the last month or two I have quite often encountered mirror sync
failures on COPR builders during build setup. Could you please confirm
that this is really the case and link the build logs here if possible?
I will then set up some extended monitoring if it proves to be true.

Thank you
clime


Re: [Rpm-ecosystem] Zchunk update

2018-04-16 Thread Neal Gompa
On Mon, Apr 16, 2018 at 8:47 AM, Jonathan Dieter  wrote:
> It's been a number of weeks since my last update, so I thought I'd let
> everyone know where things are at.
>
> I've spent most of these last few weeks reworking zchunk's API to make
> it easier to use and more in line with what other compression tools
> use, and I'm mostly happy with it now.  Writing a simple zchunk file
> can be done in a few lines of code, while reading one is also simple.
>
> I've also added zchunk support to createrepo_c (see
> https://github.com/jdieter/createrepo_c), but I haven't yet created a
> pull request because I'm not sure if my current implementation is the
> best method.  My current effort only zchunks primary.xml, filelists.xml
> and other.xml and doesn't change the sort order.
>

Fedora COPR, Open Build Service, Mageia, and openSUSE also append
AppStream data to repodata to ship AppStream information. Is there a
way we can incorporate this into zck rpm-md? There's been an issue for
a while to support generating the AppStream metadata as part of the
createrepo_c run using the libappstream-builder library[1], which may
lend itself to doing this properly.

[1]: https://github.com/rpm-software-management/createrepo_c/issues/75

> The one area of zchunk that still needs some API work is the download
> and chunk merge API, and I'm planning to clean that up as I add zchunk
> support to librepo.
>
> Some things I'd still like to add to zchunk:
>  * A python API
>  * GPG signatures in addition to (possibly replacing) overall data
>checksum

I'd rather not lose checksums, but GPG signatures would definitely be
necessary, as openSUSE needs them, and we'd definitely like to have
them in Fedora[2], COPR[3], and Mageia[4].

[2]: https://pagure.io/releng/issue/133
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1373331
[4]: https://bugs.mageia.org/show_bug.cgi?id=19432

>  * An expiry field? (I'm obviously thinking about signed repodata here)

Do we need an expiry field if we properly processed the key
revocation/expiration in librepo? My understanding is that the current
hiccup with it is that we don't, and that the GPG keyring used in
librepo is independent of the RPM keyring (which it shouldn't be).


-- 
真実はいつも一つ!/ Always, there's only one truth!


Proposed zchunk file format - V4

2018-04-16 Thread Jonathan Dieter
Here's version four, which swaps the fixed-length integers for variable-
length compressed integers, allowing us to skip compression of the index
(since the non-integer data is all incompressible checksums).
I've also added the uncompressed size of each chunk to the index to
make it easier to figure out how much space to allocate for the
uncompressed chunk.

+----+--------------------+-----------------+-----------------------+
| ID | Checksum type (ci) | Header checksum | Compression type (ci) |
+----+--------------------+-----------------+-----------------------+

+=================+=======+=================+
| Index size (ci) | Index | Compressed Dict |
+=================+=======+=================+

+=======+=======+
| Chunk | Chunk | ==> More chunks
+=======+=======+

(ci)
 Compressed (unsigned) integer - A variable-length little-endian
 integer where the first seven bits of the number are stored in the
 first byte, followed by the next seven bits in the next byte, and so
 on.  The top bit of all bytes except the final byte must be zero, and
 the top bit of the final byte must be one, indicating the end of the
 number.
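
A rough Python sketch of this encoding, just to make the byte layout
concrete (illustrative only, not part of any existing zchunk API):

def encode_ci(value):
    """Encode a non-negative integer as a compressed integer: seven bits
    per byte, least-significant group first, with the top bit set only
    on the final byte to mark the end of the number."""
    if value < 0:
        raise ValueError("compressed integers are unsigned")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value == 0:
            out.append(byte | 0x80)   # final byte: top bit set
            return bytes(out)
        out.append(byte)              # more bytes follow: top bit clear

def decode_ci(data, offset=0):
    """Decode a compressed integer from data starting at offset and
    return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80:               # top bit set marks the final byte
            return value, offset

assert encode_ci(300) == b'\x2c\x82'
assert decode_ci(b'\x2c\x82') == (300, 2)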

ID
 '\0ZCK1', identifies file as zchunk version 1 file

Checksum type
 This is an 8-bit unsigned integer containing the type of checksum
 used to generate the header checksum and the total data checksum, but
 *not* the chunk checksums.

 Current values:
   0 = SHA-1
   1 = SHA-256

Header checksum
 This is the checksum of everything from the beginning of the file
 until the end of the index, calculated with the header checksum field
 itself set to all \0's.

Compression type
 This is an integer containing the type of compression used to
 compress dict and chunks.

 Current values:
   0 - Uncompressed
   2 - zstd

Index size
 This is an integer containing the size of the index.

Index
 This is the index, which is described in the next section.

Compressed Dict (optional)
 This is a custom dictionary used when compressing each chunk.
 Because each chunk is compressed completely separately from the
 others, the custom dictionary gives us much better overall
 compression.  The custom dictionary is compressed without a custom
 dictionary (for obvious reasons).

Chunk
 This is a chunk of data, compressed with the custom dictionary
 provided above.
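
As an aside, here is a tiny sketch of the underlying zstd feature using
the python-zstandard bindings; the sample data and dictionary size are
arbitrary, and this is not zchunk code, just an illustration of why a
shared dict helps when chunks are compressed independently:

import zstandard as zstd

# Train a dictionary from sample chunks, then compress a chunk
# independently with and without it.  With many small, similar chunks
# the dictionary typically wins by a wide margin.
samples = [("name=pkg%04d arch=x86_64 release=%d.fc28 "
            "summary=Example package used only for dictionary training\n"
            % (i, i % 7)).encode() for i in range(2000)]
dict_data = zstd.train_dictionary(4096, samples)

plain = zstd.ZstdCompressor()
with_dict = zstd.ZstdCompressor(dict_data=dict_data)

chunk = samples[0]
print(len(plain.compress(chunk)), len(with_dict.compress(chunk)))

# Each chunk can still be decompressed on its own, as long as the same
# dictionary is supplied.
assert zstd.ZstdDecompressor(dict_data=dict_data).decompress(
    with_dict.compress(chunk)) == chunk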


The index:

+==========================+==================+===============+
| Chunk checksum type (ci) | Chunk count (ci) | Data checksum |
+==========================+==================+===============+

+===============+==================+===============================+
| Dict checksum | Dict length (ci) | Uncompressed dict length (ci) |
+===============+==================+===============================+

+================+===================+==========================+
| Chunk checksum | Chunk length (ci) | Uncompressed length (ci) | ...
+================+===================+==========================+

Chunk checksum type
 This is an integer containing the type of checksum used to generate
 the chunk checksums.

 Current values:
   0 = SHA-1
   1 = SHA-256

Chunk count
 This is a count of the number of chunks in the zchunk file.

Data checksum
 This is the checksum of everything after the index, including the
 compressed dict and all the compressed chunks.  This checksum is
 generated using the overall checksum type, *not* the chunk checksum
 type.

Dict checksum
 This is the checksum of the compressed dict, used to detect whether
 two dicts are identical.  If there is no dict, the checksum must be
 all zeros.

Dict length
 This is an integer containing the length of the dict.  If there is no
 dict, this must be a zero.

Uncompressed dict length
 This is an integer containing the length of the dict after it has
 been decompressed.  If there is no dict, this must be a zero.

Chunk checksum
 This is the checksum of the compressed chunk, used to detect whether
 any two chunks are identical.

Chunk length
 This is an integer containing the length of the chunk.

Uncompressed chunk length
 This is an integer containing the length of the chunk after it has
 been decompressed.

The index is designed so that it can be extracted from the file on the
server and downloaded separately, to facilitate downloading only the
parts of the file that are needed; it must then be re-embedded when
assembling the file so the user only needs to keep one file.
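
To make the layout above concrete, here is a rough Python sketch of
walking the index, reusing the decode_ci() helper sketched under (ci)
above.  It assumes the dict checksum uses the chunk checksum type, which
isn't stated explicitly above, and it is illustrative only, not an
existing zchunk API:

# Digest lengths for the checksum types defined above.
CHECKSUM_LEN = {0: 20, 1: 32}   # 0 = SHA-1, 1 = SHA-256

def parse_index(index, overall_checksum_type):
    """Walk the raw index bytes and return the data checksum, the dict
    entry and the list of chunk entries."""
    pos = 0
    chunk_cksum_type, pos = decode_ci(index, pos)
    chunk_count, pos = decode_ci(index, pos)

    overall_len = CHECKSUM_LEN[overall_checksum_type]
    chunk_len = CHECKSUM_LEN[chunk_cksum_type]

    data_checksum = index[pos:pos + overall_len]
    pos += overall_len

    # Assumption: the dict entry's checksum uses the chunk checksum type.
    dict_checksum = index[pos:pos + chunk_len]
    pos += chunk_len
    dict_size, pos = decode_ci(index, pos)
    dict_usize, pos = decode_ci(index, pos)

    chunks = []
    for _ in range(chunk_count):
        cksum = index[pos:pos + chunk_len]
        pos += chunk_len
        size, pos = decode_ci(index, pos)
        usize, pos = decode_ci(index, pos)
        chunks.append((cksum, size, usize))

    return data_checksum, (dict_checksum, dict_size, dict_usize), chunks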


Zchunk update

2018-04-16 Thread Jonathan Dieter
It's been a number of weeks since my last update, so I thought I'd let
everyone know where things are at.

I've spent most of these last few weeks reworking zchunk's API to make
it easier to use and more in line with what other compression tools
use, and I'm mostly happy with it now.  Writing a simple zchunk file
can be done in a few lines of code, while reading one is also simple.

I've also added zchunk support to createrepo_c (see 
https://github.com/jdieter/createrepo_c), but I haven't yet created a
pull request because I'm not sure if my current implementation is the
best method.  My current effort only zchunks primary.xml, filelists.xml
and other.xml and doesn't change the sort order.

The one area of zchunk that still needs some API work is the download
and chunk merge API, and I'm planning to clean that up as I add zchunk
support to librepo.

Some things I'd still like to add to zchunk:
 * A python API
 * GPG signatures in addition to (possibly replacing) overall data
   checksum
 * An expiry field? (I'm obviously thinking about signed repodata here)
 * Tests
 * More tests
 * Other arch testing (it's currently only tested on x86_64)

I'd welcome any feedback or flames.

Jonathan