Re: [Rpm-ecosystem] Zchunk update
On Mon, Apr 16, 2018 at 12:32 PM, Jonathan Dieter wrote:
> On Mon, 2018-04-16 at 09:00 -0400, Neal Gompa wrote:
>> On Mon, Apr 16, 2018 at 8:47 AM, Jonathan Dieter wrote:
>> > I've also added zchunk support to createrepo_c (see
>> > https://github.com/jdieter/createrepo_c), but I haven't yet created a
>> > pull request because I'm not sure if my current implementation is the
>> > best method. My current effort only zchunks primary.xml, filelists.xml
>> > and other.xml and doesn't change the sort order.
>>
>> Fedora COPR, Open Build Service, Mageia, and openSUSE also append
>> AppStream data to repodata to ship AppStream information. Is there a
>> way we can incorporate this into zck rpm-md? There's been an issue for
>> a while to support generating the AppStream metadata as part of the
>> createrepo_c run using the libappstream-builder library[1], which may
>> lend itself to doing this properly.
>
> Is it repomd.xml that actually gets changed, or primary.xml /
> filelists.xml / other.xml?
>
> If it's repomd.xml, then it really shouldn't make any difference
> because I'm not currently zchunking it. As far as I can see, the only
> reason to zchunk it would be to have an embedded GPG signature once
> they're supported in zchunk.

repomd.xml is being changed, so it should be fine, then. It'd be nice to be able to chunk up AppStream data eventually, though.

>> > The one area of zchunk that still needs some API work is the download
>> > and chunk merge API, and I'm planning to clean that up as I add zchunk
>> > support to librepo.
>> >
>> > Some things I'd still like to add to zchunk:
>> > * A python API
>> > * GPG signatures in addition to (possibly replacing) overall data
>> >   checksum
>>
>> I'd rather not lose checksums, but GPG signatures would definitely be
>> necessary, as openSUSE needs them, and we'd definitely like to have
>> them in Fedora[2], COPR[3], and Mageia[4].
>
> Fair enough. Would we want zchunk to support multiple GPG signatures
> or is one enough?

Historically, we've used only one GPG key because that's what we do with RPMs, but technically you can specify multiple keys in a .repo file for Yum, DNF, and Zypper to use for validating packages and metadata, so it's absolutely possible to have more. If it's not too difficult, I'd suggest supporting multiple signatures.

-- 
真実はいつも一つ!/ Always, there's only one truth!

___
infrastructure mailing list -- infrastructure@lists.fedoraproject.org
To unsubscribe send an email to infrastructure-le...@lists.fedoraproject.org
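As an aside on the multiple-key point above: a .repo file can already list more than one key under gpgkey, one URL per line. A minimal sketch (the repository name and key URLs are made up for illustration):

```ini
[example-repo]
name=Example Repository
baseurl=https://repo.example.org/fedora/$releasever/$basearch/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://repo.example.org/keys/RPM-GPG-KEY-primary
       https://repo.example.org/keys/RPM-GPG-KEY-secondary
```

With a layout like this, the package manager can validate packages and metadata against either key, which is what makes multiple signatures in zchunk itself plausible to consume.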
Re: mirror failures
On 16 April 2018 at 09:30, Miroslav Suchý wrote:
> Can someone comment on this?
> I am not really sure if there are some issues on mirrors (not likely) or if
> this is some issue with Fedora Cloud when network is overloaded (more likely).

We need a lot more information to understand what this means:

1) Does it mean that the clients aren't able to build because they can't find a valid mirror?
2) Does it mean that the user can't mirror stuff from copr?
3) Are there logs to help show exactly what is happening and when?

> Miroslav
>
>  Forwarded message 
> Subject: mirror failures
> Date: Mon, 9 Apr 2018 09:38:04 +0200
> From: Michal Novotny
> Reply-to: Community Projects
> To: Cool Other Package Repositories
>
> Hello,
>
> over the last month or two I have quite often encountered mirror sync failures
> on COPR builders during the build setup. Could you, please, confirm
> that this is really the case and link the build logs here if possible?
> I will then set up some extended monitoring if this proves to be true.
>
> Thank you
> clime
>
> ___
> copr-devel mailing list -- copr-de...@lists.fedorahosted.org
> To unsubscribe send an email to copr-devel-le...@lists.fedorahosted.org

-- 
Stephen J Smoogen.
Re: How To Contact Folks Interested In Tagger
On 04/14/2018 10:15 PM, Elorm Buertey wrote:
> Hi. I have created the WhenIsGood survey but I'm not sure of the
> best way to send it to the others. I couldn't find their
> emails on their profiles. Would posting the link in the Github
> issue be okay?

Hi Elorm,

Sharing the WhenIsGood link on the GitHub issue for the Fedora Tagger development meeting works; I would also consider sharing it on the Fedora Infrastructure mailing list, since there are other people there who may be interested. You can subscribe here:

https://lists.fedoraproject.org/admin/lists/infrastructure.lists.fedoraproject.org/

I've already copied this response to the mailing list, so you can subscribe and then share the link on this thread for the Fedora Tagger development meeting time.

Hope this helps!

-- 
Cheers,
Justin W. Flory
jflo...@gmail.com
Re: Meeting agenda: RATS vs. librat
On 04/16/2018 10:49 AM, Randy Barlow wrote:
> Alternatively, we could discuss it on
> this mailing list if desired.

Perhaps it would be useful if I started the discussion here so we could be more informed during Thursday's meeting. pingou and I have been debating whether the project that allows packagers to rerun failed tests should be a service or a library. I am of the position that it should be a library.

RATS
====

RATS (Run Another Test Service) is a project that pingou has been working on. It is a web service that allows API callers to ask for a test to be run again. pingou, please do reply if I have misrepresented or omitted anything; I will attempt to outline the benefits that pingou argues for here.

* Being a service allows it to keep track of which tests have recently been requested for re-run, which allows it to make sure a single test doesn't get re-requested too many times.
* Being a service makes it easy to call from any language (Python, Ruby, etc.).
* Being a service makes it possible to update in one place and have all callers get the new behavior without themselves needing to update (as long as the API is kept stable).

librat
======

I would like to make the case for using a library (perhaps called librat - lib run another test?) here. Benefits:

* A library creates less work for the infrastructure development and ops subteams. Adding a service means adding workload to a thinly stretched devops team, as a service needs monitoring and requires intervention when it goes down.
* A library is inherently more reliable than a service. A service must obviously contain the library's code anyway, but it also adds network dependencies to the projects that use it. It's one more piece in the system that can fail and bring down Bodhi.
* A library is significantly less code than a service. For example, libraries don't need to authenticate their callers, don't have to serialize/deserialize inputs and return values, and don't need to process human input (like config files). As stated, any service would need the library's code anyway, but it would also need much more code to do all of the above. This also means less code to write tests and documentation for.
* A library is able to meet all of our known requirements, and is simple. I believe in the "keep it simple" principle - we should pick the simplest solution that meets the requirements.

I would like to address the benefits of RATS outlined above, one by one:

* RATS can keep track of which tests have been run to prevent too many re-runs. I believe that a library could do this locally for Bodhi, and for Pagure too once Pagure gets more test integration, by using a local cache. Bodhi and Pagure will likely not gate on the same kinds of tests, so they don't need a central authority to make sure they each aren't requesting the same test to be re-run. Furthermore, I think we don't need to do a perfect job of making sure tests aren't re-run at all, which is why I think a library that caches recent re-run requests will be "good enough", under the "perfect is the enemy of the good" mantra.
* Calling from multiple languages. This is a theoretical requirement at this point - we don't have a real use case for it at this time. Further, a library would make it easy to write a CLI, which could be used if a requirement for other languages ever does appear. A REST API is not inherently easier to use than a CLI - in fact, I would make the case that it is harder to use.
* Updating. While it is true that a service can be updated in one place such that all callers get the update immediately, we do have a big Ansible project that can just as easily deploy a library update out to all the machines that use it, so I would argue that we practically have the same benefit with a library.

Thoughts?
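The local-cache idea argued for above can be sketched in a few lines. This is illustrative only - RerunCache and its method names are hypothetical, not an actual librat API - but it shows how little code is needed to throttle re-run requests per process:

```python
import time


class RerunCache:
    """Remember when each test was last requested for a re-run, and
    refuse to request it again within a cooldown window.

    Illustrative sketch; not a real librat API. The injectable clock
    exists so the throttling logic is testable without sleeping."""

    def __init__(self, cooldown_seconds=3600, clock=time.monotonic):
        self.cooldown = cooldown_seconds
        self.clock = clock
        self._last_request = {}  # test id -> timestamp of last allowed request

    def request_rerun(self, test_id):
        """Return True if a re-run should be requested now, False if throttled."""
        now = self.clock()
        last = self._last_request.get(test_id)
        if last is not None and now - last < self.cooldown:
            return False  # requested too recently; throttle it
        self._last_request[test_id] = now
        return True
```

Because Bodhi and Pagure would each hold their own cache, neither needs a central service just to avoid hammering the same test - which is the "good enough" point made above.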
Re: [Rpm-ecosystem] Zchunk update
On Mon, 2018-04-16 at 09:00 -0400, Neal Gompa wrote:
> On Mon, Apr 16, 2018 at 8:47 AM, Jonathan Dieter wrote:
> > I've also added zchunk support to createrepo_c (see
> > https://github.com/jdieter/createrepo_c), but I haven't yet created a
> > pull request because I'm not sure if my current implementation is the
> > best method. My current effort only zchunks primary.xml, filelists.xml
> > and other.xml and doesn't change the sort order.
>
> Fedora COPR, Open Build Service, Mageia, and openSUSE also append
> AppStream data to repodata to ship AppStream information. Is there a
> way we can incorporate this into zck rpm-md? There's been an issue for
> a while to support generating the AppStream metadata as part of the
> createrepo_c run using the libappstream-builder library[1], which may
> lend itself to doing this properly.

Is it repomd.xml that actually gets changed, or primary.xml / filelists.xml / other.xml?

If it's repomd.xml, then it really shouldn't make any difference, because I'm not currently zchunking it. As far as I can see, the only reason to zchunk it would be to have an embedded GPG signature once they're supported in zchunk.

> > The one area of zchunk that still needs some API work is the download
> > and chunk merge API, and I'm planning to clean that up as I add zchunk
> > support to librepo.
> >
> > Some things I'd still like to add to zchunk:
> > * A python API
> > * GPG signatures in addition to (possibly replacing) overall data
> >   checksum
>
> I'd rather not lose checksums, but GPG signatures would definitely be
> necessary, as openSUSE needs them, and we'd definitely like to have
> them in Fedora[2], COPR[3], and Mageia[4].

Fair enough. Would we want zchunk to support multiple GPG signatures, or is one enough?

> > * An expiry field? (I'm obviously thinking about signed repodata here)
>
> Do we need an expiry field if we properly processed the key
> revocation/expiration in librepo? My understanding is that the current
> hiccup with it is that we don't, and that the GPG keyring used in
> librepo is independent of the RPM keyring (which it shouldn't be).

Ah, that makes sense. Forget that idea then.

Jonathan
Meeting agenda: RATS vs. librat
Greetings!

I'd like to have a discussion during this week's infrastructure meeting about whether the project that will re-run failed gating tests needs to be a service, or whether a library will do. There has been some lively debate about this on IRC, but it didn't seem to reach a resolution, and it would be helpful if we could make a decision together so we can proceed with implementation.

Would this week's infra meeting be a good time to discuss this? Alternatively, we could discuss it on this mailing list if desired.
Re: Planned MirrorManager changes
On Sat, Apr 14, 2018 at 04:28:37PM -0700, Kevin Fenzi wrote:
> > I would like to change the setup of our mirror crawler and just wanted
> > to mention my planned changes here before working on them.
> >
> > Currently we have two VMs which are crawling our mirrors. Each of the
> > machines is responsible for one half of the active mirrors. The crawl
> > starts every 12 hours on the first crawler and 6 hours later on the
> > second crawler. So every 6 hours one crawler is accessing the database.
> >
> > Currently most of the crawling time is not spent crawling but updating
> > the database about which host has which directory up to date. With a
> > timeout of 4 hours per host we are hitting that timeout on some hosts
> > regularly, and most of the time the database access is the problem.
> >
> > What I would like to change is to crawl each category (Fedora Linux,
> > Fedora Other, Fedora EPEL, Fedora Secondary Arches, Fedora Archive)
> > separately and at different times and intervals.
> >
> > We would not hit the timeout as often as now, as only the information for
> > a single category has to be updated. We could scan 'Fedora Archive' only
> > once per day or every second day. We can scan 'Fedora EPEL' much more
> > often, as it is usually really fast, and get better data about the
> > available mirrors.
> >
> > My goal would be to distribute the scanning in such a way as to decrease
> > the load on the database and to decrease the cases of mirror
> > auto-deactivation due to slow database accesses.
> >
> > Let me know if you think that these planned changes are the wrong
> > direction, or if you have other ideas how to improve the mirror crawling.
>
> Sounds like all great ideas to me. ;)

Thanks.

> I wonder if we could also find some way to note which mirrors have
> iso/image files, and could communicate this to the
> download.fedoraproject.org redirect to only redirect people to mirrors
> that have that specific file if they are pointing to an iso/qcow2, etc.

This is one of the cases where MirrorManager, in theory, should almost handle it correctly. The important part of that sentence is 'in theory'. MirrorManager should know about the 3 most recent files in a directory, and if we are crawling via rsync we even download the complete file listing for a mirror. So, theory aside, it would help to see a wrong redirect live to understand why it is happening.

Adrian
Re: Planned MirrorManager changes
On Sat, Apr 14, 2018 at 12:37:24AM +0000, Stephen John Smoogen wrote:
> On Fri, Apr 13, 2018 at 11:14 AM Adrian Reber wrote:
> > I would like to change the setup of our mirror crawler and just wanted
> > to mention my planned changes here before working on them.
> >
> > Currently we have two VMs which are crawling our mirrors. Each of the
> > machines is responsible for one half of the active mirrors. The crawl
> > starts every 12 hours on the first crawler and 6 hours later on the
> > second crawler. So every 6 hours one crawler is accessing the database.
> >
> > Currently most of the crawling time is not spent crawling but updating
> > the database about which host has which directory up to date. With a
> > timeout of 4 hours per host we are hitting that timeout on some hosts
> > regularly, and most of the time the database access is the problem.
> >
> > What I would like to change is to crawl each category (Fedora Linux,
> > Fedora Other, Fedora EPEL, Fedora Secondary Arches, Fedora Archive)
> > separately and at different times and intervals.
> >
> > We would not hit the timeout as often as now, as only the information for
> > a single category has to be updated. We could scan 'Fedora Archive' only
> > once per day or every second day. We can scan 'Fedora EPEL' much more
> > often, as it is usually really fast, and get better data about the
> > available mirrors.
> >
> > My goal would be to distribute the scanning in such a way as to decrease
> > the load on the database and to decrease the cases of mirror
> > auto-deactivation due to slow database accesses.
> >
> > Let me know if you think that these planned changes are the wrong
> > direction, or if you have other ideas how to improve the mirror crawling.
>
> These look like a good way to deal with the fact that we have a lot of data
> and files and mirrors, and users get confused about how up to date they are.
> Would more VMs help spread this out also?

From my point of view the main problem is the load MirrorManager creates on the database. Currently I do not think that more VMs would help the crawling. Someone once mentioned a dedicated database VM for MirrorManager. That is something which could make a difference, but first I would like to see if crawling per category can improve the situation.

Adrian
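The per-category idea from the quoted proposal can be sketched as a simple schedule check. The interval values are examples taken from the discussion ('Fedora Archive' every second day, 'Fedora EPEL' more often), not a real MirrorManager configuration, and due_categories is a hypothetical helper:

```python
# Hours between crawls, per repository category (illustrative values).
CRAWL_INTERVAL_HOURS = {
    "Fedora Linux": 12,
    "Fedora Other": 12,
    "Fedora EPEL": 6,      # fast to crawl, so crawl it more often
    "Fedora Secondary Arches": 12,
    "Fedora Archive": 48,  # rarely changes; every second day
}


def due_categories(last_crawled, now_hours):
    """Return the categories whose interval has elapsed since their last
    crawl. last_crawled maps category -> hour of last crawl; categories
    never crawled before are always due."""
    return [
        category
        for category, interval in CRAWL_INTERVAL_HOURS.items()
        if now_hours - last_crawled.get(category, float("-inf")) >= interval
    ]
```

Because each run only touches one category's rows, the database update after a crawl stays small, which is the point of the proposal: fewer 4-hour timeouts caused by database access rather than by crawling itself.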
Fwd: mirror failures
Can someone comment on this? I am not really sure if there are some issues on mirrors (not likely) or if this is some issue with Fedora Cloud when the network is overloaded (more likely).

Miroslav

 Forwarded message 
Subject: mirror failures
Date: Mon, 9 Apr 2018 09:38:04 +0200
From: Michal Novotny
Reply-to: Community Projects
To: Cool Other Package Repositories

Hello,

over the last month or two I have quite often encountered mirror sync failures on COPR builders during the build setup. Could you, please, confirm that this is really the case and link the build logs here if possible? I will then set up some extended monitoring if this proves to be true.

Thank you
clime
Re: [Rpm-ecosystem] Zchunk update
On Mon, Apr 16, 2018 at 8:47 AM, Jonathan Dieter wrote:
> It's been a number of weeks since my last update, so I thought I'd let
> everyone know where things are at.
>
> I've spent most of these last few weeks reworking zchunk's API to make
> it easier to use and more in line with what other compression tools
> use, and I'm mostly happy with it now. Writing a simple zchunk file
> can be done in a few lines of code, while reading one is also simple.
>
> I've also added zchunk support to createrepo_c (see
> https://github.com/jdieter/createrepo_c), but I haven't yet created a
> pull request because I'm not sure if my current implementation is the
> best method. My current effort only zchunks primary.xml, filelists.xml
> and other.xml and doesn't change the sort order.

Fedora COPR, Open Build Service, Mageia, and openSUSE also append AppStream data to repodata to ship AppStream information. Is there a way we can incorporate this into zck rpm-md? There's been an issue for a while to support generating the AppStream metadata as part of the createrepo_c run using the libappstream-builder library[1], which may lend itself to doing this properly.

[1]: https://github.com/rpm-software-management/createrepo_c/issues/75

> The one area of zchunk that still needs some API work is the download
> and chunk merge API, and I'm planning to clean that up as I add zchunk
> support to librepo.
>
> Some things I'd still like to add to zchunk:
> * A python API
> * GPG signatures in addition to (possibly replacing) overall data
>   checksum

I'd rather not lose checksums, but GPG signatures would definitely be necessary, as openSUSE needs them, and we'd definitely like to have them in Fedora[2], COPR[3], and Mageia[4].

[2]: https://pagure.io/releng/issue/133
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1373331
[4]: https://bugs.mageia.org/show_bug.cgi?id=19432

> * An expiry field? (I'm obviously thinking about signed repodata here)

Do we need an expiry field if we properly processed the key revocation/expiration in librepo? My understanding is that the current hiccup with it is that we don't, and that the GPG keyring used in librepo is independent of the RPM keyring (which it shouldn't be).

-- 
真実はいつも一つ!/ Always, there's only one truth!
Proposed zchunk file format - V4
Here's version four, with a swap from fixed-length integers to variable-length compressed integers, which lets us skip compression of the index (since the non-integer data is all uncompressable checksums). I've also added the uncompressed size of each chunk to the index, to make it easier to figure out how much space to allocate for the uncompressed chunk.

The overall file layout:

+====+====================+=================+=======================+
| ID | Checksum type (ci) | Header checksum | Compression type (ci) |
+====+====================+=================+=======================+
+=================+=======+=================+
| Index size (ci) | Index | Compressed Dict |
+=================+=======+=================+
+=======+=======+
| Chunk | Chunk | ==> More chunks
+=======+=======+

(ci) Compressed (unsigned) integer
    A variable-length little-endian integer where the first seven bits of
    the number are stored in the first byte, followed by the next seven
    bits in the next byte, and so on. The top bit of all bytes except the
    final byte must be zero, and the top bit of the final byte must be
    one, indicating the end of the number.

ID
    '\0ZCK1', identifies the file as a zchunk version 1 file.

Checksum type
    An 8-bit unsigned integer containing the type of checksum used to
    generate the header checksum and the total data checksum, but *not*
    the chunk checksums. Current values:
        0 = SHA-1
        1 = SHA-256

Header checksum
    The checksum of everything from the beginning of the file until the
    end of the index, computed with the header checksum field itself set
    to all \0's.

Compression type
    An integer containing the type of compression used to compress the
    dict and chunks. Current values:
        0 = Uncompressed
        2 = zstd

Index size
    An integer containing the size of the index.

Index
    The index, which is described in the next section.

Compressed Dict (optional)
    A custom dictionary used when compressing each chunk. Because each
    chunk is compressed completely separately from the others, the custom
    dictionary gives us much better overall compression. The custom
    dictionary itself is compressed without a custom dictionary (for
    obvious reasons).

Chunk
    A chunk of data, compressed with the custom dictionary provided above.

The index:

+==========================+==================+===============+
| Chunk checksum type (ci) | Chunk count (ci) | Data checksum |
+==========================+==================+===============+
+===============+==================+===============================+
| Dict checksum | Dict length (ci) | Uncompressed dict length (ci) |
+===============+==================+===============================+
+================+===================+==========================+
| Chunk checksum | Chunk length (ci) | Uncompressed length (ci) | ...
+================+===================+==========================+

Chunk checksum type
    An integer containing the type of checksum used to generate the chunk
    checksums. Current values:
        0 = SHA-1
        1 = SHA-256

Chunk count
    A count of the number of chunks in the zchunk file.

Data checksum
    The checksum of everything after the index, including the compressed
    dict and all the compressed chunks. This checksum is generated using
    the overall checksum type, *not* the chunk checksum type.

Dict checksum
    The checksum of the compressed dict, used to detect whether two dicts
    are identical. If there is no dict, the checksum must be all zeros.

Dict length
    An integer containing the length of the compressed dict. If there is
    no dict, this must be zero.

Uncompressed dict length
    An integer containing the length of the dict after it has been
    decompressed. If there is no dict, this must be zero.

Chunk checksum
    The checksum of the compressed chunk, used to detect whether any two
    chunks are identical.

Chunk length
    An integer containing the length of the compressed chunk.

Uncompressed chunk length
    An integer containing the length of the chunk after it has been
    decompressed.

The index is designed so that it can be extracted from the file on the server and downloaded separately, to facilitate downloading only the parts of the file that are needed, but it must then be re-embedded when assembling the file so the user only needs to keep one file.
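The compressed-integer encoding described above is easy to prototype. Here is an illustrative Python sketch (not part of zchunk itself, and the function names are mine) of the encode/decode rules: 7-bit little-endian groups, with the top bit set only on the final byte:

```python
def encode_ci(n):
    """Encode an unsigned integer as a zchunk compressed integer:
    little-endian 7-bit groups, high bit set ONLY on the final byte
    (the opposite of the usual LEB128 continuation-bit convention)."""
    if n < 0:
        raise ValueError("only unsigned integers are supported")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n == 0:
            out.append(byte | 0x80)  # final byte: top bit set terminates the number
            return bytes(out)
        out.append(byte)  # non-final byte: top bit clear


def decode_ci(data, offset=0):
    """Decode a compressed integer from data starting at offset.
    Returns (value, next_offset). Raises IndexError if the data runs
    out before a terminating byte (top bit set) is seen."""
    value = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80:  # top bit set marks the final byte
            return value, offset
```

For example, 128 encodes to the two bytes 0x00 0x81: seven zero bits, then the value 1 in the terminating byte. Since the index is left uncompressed, a reader can decode field after field this way without ever touching the compression library.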
Zchunk update
It's been a number of weeks since my last update, so I thought I'd let everyone know where things are at.

I've spent most of these last few weeks reworking zchunk's API to make it easier to use and more in line with what other compression tools use, and I'm mostly happy with it now. Writing a simple zchunk file can be done in a few lines of code, while reading one is also simple.

I've also added zchunk support to createrepo_c (see https://github.com/jdieter/createrepo_c), but I haven't yet created a pull request because I'm not sure if my current implementation is the best method. My current effort only zchunks primary.xml, filelists.xml and other.xml and doesn't change the sort order.

The one area of zchunk that still needs some API work is the download and chunk merge API, and I'm planning to clean that up as I add zchunk support to librepo.

Some things I'd still like to add to zchunk:
* A python API
* GPG signatures in addition to (possibly replacing) overall data checksum
* An expiry field? (I'm obviously thinking about signed repodata here)
* Tests
* More tests
* Other arch testing (it's currently only tested on x86_64)

I'd welcome any feedback or flames.

Jonathan