Re: [Pulp-dev] Duplicate nevra but not pkgId (suse repos)
Pavel,

I meant to say that Pulp 3 does not have the limitation that Pulp 2 had (it could not save RPMs with the same NEVRA on the filesystem).

The error is raised in Pulp 3 [0] when a repo version is created: because of the repo key [1], we cannot have two RPMs with the same NEVRA. We can enable that, if we decide to, by adding location_href to the repo key, *but* this needs to be evaluated first. It can have side effects, and we should involve our stakeholders to weigh in.

[0] https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/repository.py#L570
[1] https://github.com/pulp/pulp_rpm/blob/master/pulp_rpm/app/models/package.py#L188

Regards,

Ina Panova
Senior Software Engineer | Pulp | Red Hat Inc.

"Do not go where the path may lead,
go instead where there is no path and leave a trail."

On Wed, Mar 18, 2020 at 2:24 PM Pavel Picka wrote:

> True, in the openSUSE repository there are two possibilities, 'src' and 'nosrc' (the latter should be a legacy variant without source code); createrepo_c recognizes both as arch 'src'.
>
> The Pulp 2 code I mentioned is here [0] (the base RPM package model, as far as I understand).
>
> The error in Pulp 3 is raised here [1], in pulpcore, when packages are added to a repository version. So, as Ina mentioned, it does not have to be an issue with the packages themselves but rather with the logic in sync.
>
> [0] https://github.com/pulp/pulp_rpm/blob/2-master/plugins/pulp_rpm/plugins/db/models.py#L779
> [1] https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/repository.py#L570
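The repo-key behaviour discussed in this message can be illustrated with a small Python sketch. It is not the pulpcore or pulp_rpm code: the `Package` dataclass, `find_repo_key_conflicts`, the version numbers, and the file paths are hypothetical, and the checksums are shortened for readability. It only shows how adding location_href to the key fields changes which packages count as duplicates.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical stand-in for an RPM content unit; not the pulp_rpm model.
@dataclass(frozen=True)
class Package:
    name: str
    epoch: str
    version: str
    release: str
    arch: str
    pkgId: str
    location_href: str


NEVRA = ("name", "epoch", "version", "release", "arch")


def find_repo_key_conflicts(packages, key_fields=NEVRA):
    """Group packages by key_fields and return the groups with more than one member."""
    groups = defaultdict(list)
    for pkg in packages:
        key = tuple(getattr(pkg, field) for field in key_fields)
        groups[key].append(pkg)
    return {key: pkgs for key, pkgs in groups.items() if len(pkgs) > 1}


# Illustrative values only: same NEVRA, different checksum and location.
src = Package("glibc", "0", "2.22", "1", "src", "00d36c0f", "src/glibc-2.22-1.src.rpm")
nosrc = Package("glibc", "0", "2.22", "1", "src", "353e1dc8", "nosrc/glibc-2.22-1.nosrc.rpm")

# With NEVRA as the key, the two packages collide.
conflicts_nevra = find_repo_key_conflicts([src, nosrc])

# With location_href added to the key, they no longer collide.
conflicts_with_href = find_repo_key_conflicts([src, nosrc], NEVRA + ("location_href",))

print(len(conflicts_nevra), len(conflicts_with_href))
```

Whether a wider key is acceptable for clients that read the published metadata is exactly the open question raised above; the sketch does not answer it.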
Re: [Pulp-dev] Duplicate nevra but not pkgId (suse repos)
True, in the openSUSE repository there are two possibilities, 'src' and 'nosrc' (the latter should be a legacy variant without source code); createrepo_c recognizes both as arch 'src'.

The Pulp 2 code I mentioned is here [0] (the base RPM package model, as far as I understand).

The error in Pulp 3 is raised here [1], in pulpcore, when packages are added to a repository version. So, as Ina mentioned, it does not have to be an issue with the packages themselves but rather with the logic in sync.

[0] https://github.com/pulp/pulp_rpm/blob/2-master/plugins/pulp_rpm/plugins/db/models.py#L779
[1] https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/repository.py#L570

On Wed, Mar 18, 2020 at 1:55 PM Ina Panova wrote:

> Tanya and Pavel,
>
> this issue explains why we cannot keep two packages with the same NEVRA but different checksums within a repo: https://pulp.plan.io/issues/494
>
> Pulp 2 had a limitation: it was not able to save two RPMs with the same filename on the filesystem, which led to a primary.xml that could point to an RPM that did not actually get saved.
>
> I believe in Pulp 3 we could allow RPMs with the same NEVRA within a repo if they have different location_href values.
>
> Regards,
>
> Ina Panova
> Senior Software Engineer | Pulp | Red Hat Inc.
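As a rough illustration of the 'src' versus 'nosrc' point, the following Python sketch reads the arch token from an RPM file name in a location_href. The function name and the example paths are hypothetical (the real location_href values from the repository did not survive in this thread), and this is not how createrepo_c determines the arch; it only shows that the file names can differ while the metadata reports the same arch.

```python
import posixpath


def rpm_filename_arch(location_href):
    """Return the arch token of an RPM file name, e.g. 'src' or 'nosrc'."""
    filename = posixpath.basename(location_href)
    if not filename.endswith(".rpm"):
        raise ValueError(f"not an RPM file name: {filename!r}")
    stem = filename[: -len(".rpm")]
    # The arch is the last dot-separated token before the .rpm suffix.
    _, sep, arch = stem.rpartition(".")
    if not sep:
        raise ValueError(f"no arch token in: {filename!r}")
    return arch


# Hypothetical paths for two packages that share a NEVRA in the metadata.
print(rpm_filename_arch("src/glibc-2.22-1.src.rpm"))
print(rpm_filename_arch("nosrc/glibc-2.22-1.nosrc.rpm"))
```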
Re: [Pulp-dev] Package in a different repo does not get added to package list on Module
This has always been a grey area: what if the user who created repoA has no access to the content of repoB, and yet we are 'stealing' the content from repoB?

Regards,

Ina Panova
Senior Software Engineer | Pulp | Red Hat Inc.

On Tue, Mar 17, 2020 at 7:41 PM Pavel Picka wrote:

> Hi,
>
> I started to work on #6295 [0]. Currently, at sync time, we look only at the packages of the repository being synced to check whether they are modular and connect them to modulemd.
>
> To fix this issue we will need to check content from other (already synced) repositories, which can have a really huge impact on sync time in the case of big repositories.
>
> Do we want to go through all Pulp content (RPM packages) when syncing a new repository with modulemd? Another idea is to extend the sync API call with a new argument to scan (all or specific) repositories.
>
> I think we would like to keep the performance of sync, so it is better to discuss this first.
>
> Thank you
>
> [0] https://pulp.plan.io/issues/6295
>
> --
> Pavel Picka
> Red Hat

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev
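The idea of an extra sync argument that names the repositories to scan could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: `match_module_artifacts`, the `scan_repos` parameter, and the repository contents are hypothetical and do not describe the Pulp sync API. Repositories are modelled as plain sets of NEVRA strings.

```python
def match_module_artifacts(artifacts, current_repo, repos, scan_repos=()):
    """Map each module artifact NEVRA to the first repository that contains it.

    Only current_repo is searched unless the caller lists more repositories
    in scan_repos, so the cost and the cross-repo lookup are opt-in.
    """
    search_order = [current_repo, *[r for r in scan_repos if r != current_repo]]
    found, missing = {}, []
    for nevra in artifacts:
        for repo_name in search_order:
            if nevra in repos.get(repo_name, set()):
                found[nevra] = repo_name
                break
        else:
            # Not present in any repository we were allowed to look at.
            missing.append(nevra)
    return found, missing


# Hypothetical repository contents.
repos = {
    "repoA": {"foo-0:1.0-1.x86_64"},
    "repoB": {"bar-0:2.0-1.x86_64"},
}
artifacts = ["foo-0:1.0-1.x86_64", "bar-0:2.0-1.x86_64"]

default_result = match_module_artifacts(artifacts, "repoA", repos)
scanned_result = match_module_artifacts(artifacts, "repoA", repos, scan_repos=("repoB",))
print(default_result)
print(scanned_result)
```

An explicit list keeps the sync time bounded and leaves the decision to pull content from repoB with the caller, which is one possible way to address the access concern raised above.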
Re: [Pulp-dev] Cherry pick labeling
It would be great if we could automate this part as well.

As for responsibility, I think the reviewer and the author should share it. Obviously, if this is a new contributor, as Tanya mentioned, then the reviewer should make sure the label is applied where needed.

Regards,

Ina Panova
Senior Software Engineer | Pulp | Red Hat Inc.

On Wed, Mar 18, 2020 at 1:36 PM David Davis wrote:

> I think it would be easy to automatically apply the cherry pick label with GitHub Actions. GitHub Actions has all sorts of events that can trigger workflows, including opening a new PR [0]. This workflow could also automatically move the Redmine issue to POST and comment with the PR too.
>
> [0] https://help.github.com/en/actions/reference/events-that-trigger-workflows#pull-request-event-pull_request
>
> David
Re: [Pulp-dev] Duplicate nevra but not pkgId (suse repos)
Tanya and Pavel,

this issue explains why we cannot keep two packages with the same NEVRA but different checksums within a repo: https://pulp.plan.io/issues/494

Pulp 2 had a limitation: it was not able to save two RPMs with the same filename on the filesystem, which led to a primary.xml that could point to an RPM that did not actually get saved.

I believe in Pulp 3 we could allow RPMs with the same NEVRA within a repo if they have different location_href values.

Regards,

Ina Panova
Senior Software Engineer | Pulp | Red Hat Inc.

On Wed, Mar 18, 2020 at 10:47 AM Tatiana Tereshchenko wrote:

> Hi Pavel,
>
> On Tue, Mar 17, 2020 at 7:31 PM Pavel Picka wrote:
>
>> Hello, I would like to ask how to proceed with an issue with duplicate (but not really) packages.
>>
>> I am syncing SUSE repositories (opensuse42 and SLE12) and get a duplicate error. But when checking the packages [0] (from primary.xml), glibc and glibc, they have the same NEVRA but different checksums (and a few other differences, such as size), so they don't look like real duplicates.
>
> Those are weird: they have the same NEVRA, but see the location_href, one is src and the other one is nosrc! :/
>
> It looks like something openSUSE-specific. I'm not sure if it's a valid way to create a repo with such metadata; we need to figure it out at some point.
>
>> I've checked Pulp 2 and it uses NEVRA + checksum for repository uniqueness. In Pulp 3 we use only NEVRA.
>
> Why do you think that in Pulp 2 we use NEVRA + checksum? Have you tested it? Please point to the code.
> I believe Pulp 2 as well as Pulp 3 allows packages with different checksums in Pulp storage.
> I don't think we allow the same package with different checksums in the same repo.
> FWIW, in Pulp 2 the most recently added package is chosen to stay in a repo; no packages with duplicate NEVRA are left after sync, see
> https://github.com/pulp/pulp_rpm/blob/2-master/plugins/pulp_rpm/plugins/importers/yum/purge.py#L285-L333
>
>> My suggestion is to extend repo_key_fields for the RPM package with pkgId (checksum), as it is in Pulp 2. I don't think they are really duplicates, and other software can rely on a specific version of a package.
>
> Unfortunately, I don't remember the main reason for removing duplicates based on NEVRA. Was it because some tooling would complain, or was it just to avoid duplicates at resync time? Does anyone know?
> We should not change it unless we know for sure that it's needed, and we would need agreement from all our stakeholders for that change.
>
> For now, I think we can move on and ensure that no duplicates are in a repo version. To my understanding, the behaviour will be the same as in Pulp 2.
> Feel free to share where you get the duplicate error to see if it's a bug or not. I wonder why duplicates are not removed automatically. Maybe because the first version contains duplicates due to this bug https://pulp.plan.io/issues/6217 ?
>
> Tanya
>
>> What do you think?
>>
>> [0]
>> glibc
>> src
>> pkgid="YES">00d36c0f741b0c01a77ce318a2bbcfa59cb4dd0b24ce61f57c6205e4fa1bb310
>> Standard Shared Libraries (from the GNU C Library)
>> The GNU C Library provides the most important standard libraries used by nearly all programs: the standard C library, the standard math library, and the POSIX thread library. A system is not functional without these libraries.
>> https://www.suse.com/
>> http://www.gnu.org/software/libc/libc.html
>> LGPL-2.1+ and SUSE-LGPL-2.1+-with-GCC-exception and GPL-2.0+
>> SUSE LLC https://www.suse.com/;
>> System/Libraries
>> sheep16
>>
>> glibc
>> src
>> pkgid="YES">353e1dc85eab8d434be83160eca4fcee11a72eec345385df125ca0835abd6068
>> Standard Shared Libraries (from the GNU C Library)
>> The GNU C Library provides the most important standard libraries used by nearly all programs: the standard C library, the standard math library, and the POSIX thread library. A system is not functional without these libraries.
>> https://www.suse.com/
>> http://www.gnu.org/software/libc/libc.html
>> LGPL-2.1+ and SUSE-LGPL-2.1+-with-GCC-exception and GPL-2.0+
>> SUSE LLC https://www.suse.com/;
>> System/Libraries
>> sheep02
>>
>> --
>> Pavel Picka
>> Red Hat
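The "most recently added package stays" behaviour that Tanya describes for Pulp 2 can be sketched in a few lines of Python. This is not the purge.py implementation linked above; the function name, the tuple layout, and the integer timestamps are hypothetical and only illustrate the rule.

```python
def keep_most_recent(units):
    """Keep one unit per NEVRA: the one with the latest added_at value.

    units is an iterable of (nevra, pkg_id, added_at) tuples. When two units
    have the same added_at, the one seen later in the iteration wins.
    """
    latest = {}
    for nevra, pkg_id, added_at in units:
        current = latest.get(nevra)
        if current is None or added_at >= current[1]:
            latest[nevra] = (pkg_id, added_at)
    return latest


# Hypothetical data: two glibc units share a NEVRA, one unrelated unit does not.
units = [
    ("glibc-0:2.22-1.src", "00d36c0f", 100),
    ("glibc-0:2.22-1.src", "353e1dc8", 200),
    ("bash-0:4.3-1.src", "aaaa1111", 150),
]
kept = keep_most_recent(units)
print(kept)
```

Under this rule one of the two glibc packages is silently dropped from the repository, which is the behaviour Pavel's proposal would change.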
Re: [Pulp-dev] Cherry pick labeling
I think it would be easy to automatically apply the cherry pick label with GitHub Actions. GitHub Actions has all sorts of events that can trigger workflows, including opening a new PR [0]. This workflow could also automatically move the Redmine issue to POST and comment with the PR too.

[0] https://help.github.com/en/actions/reference/events-that-trigger-workflows#pull-request-event-pull_request

David

On Wed, Mar 18, 2020 at 6:02 AM Tatiana Tereshchenko wrote:

> I believe it's the responsibility of both the author and the PR reviewer.
>
> If it's a one-time contribution from someone, then the PR reviewer is likely the one who knows whether the cherry-pick should be done or not. However, in the majority of cases we have regular contributors, and they are aware of the process. Depending on the fix, they might be in a better position to say whether it's worth cherry-picking and whether the cherry-pick will be clean.
>
> Alternatively, can we automate it? When a PR is opened, look at the referenced Redmine issue and check its tracker; if it's an "issue", mark the PR as one to be cherry-picked, so the reviewer can unset the label if it's undesirable for whatever reason.
>
> Tanya
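Whatever triggers the automation, applying the label comes down to one call to the GitHub REST API ("add labels to an issue"; pull requests share the issue endpoints for labels). The Python sketch below only builds that request and does not send it. The function name, the label text "Needs Cherry Pick", and the token value are assumptions for illustration; the repository and PR number are taken from this thread.

```python
import json
import urllib.request


def build_add_label_request(owner, repo, pr_number, label, token):
    """Build (but do not send) the request that adds one label to a PR."""
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{pr_number}/labels"
    body = json.dumps({"labels": [label]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical label name and placeholder token.
request = build_add_label_request(
    "pulp", "pulpcore", 592, "Needs Cherry Pick", "PLACEHOLDER_TOKEN"
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```

Inside a GitHub Actions workflow the token would normally come from the workflow's own credentials rather than being written into the script.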
Re: [Pulp-dev] Cherry pick labeling
I believe it's the responsibility of both the author and the PR reviewer.

If it's a one-time contribution from someone, then the PR reviewer is likely the one who knows whether the cherry-pick should be done or not. However, in the majority of cases we have regular contributors, and they are aware of the process. Depending on the fix, they might be in a better position to say whether it's worth cherry-picking and whether the cherry-pick will be clean.

Alternatively, can we automate it? When a PR is opened, look at the referenced Redmine issue and check its tracker; if it's an "issue", mark the PR as one to be cherry-picked, so the reviewer can unset the label if it's undesirable for whatever reason.

Tanya

On Tue, Mar 17, 2020 at 9:43 PM David Davis wrote:

> Today we missed a change that could maybe have gone out with the 3.2 release. It stemmed from a lack of clarity around whose responsibility it is to label PRs with the cherry pick label. I think the general agreement is that it's ultimately the responsibility of the PR reviewer to add this label to the PR. I'm interested to see if there are any other thoughts or objections. Here is a PR I've opened as a proposal:
>
> https://github.com/pulp/pulpcore/pull/592
>
> David
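The tracker check suggested here could be sketched in Python as follows. The helper names are hypothetical, the "#NNNN" reference convention in PR text and the tracker name "Issue" are assumptions, and the payload shape follows the Redmine REST API response for a single issue (an "issue" object with a nested "tracker" name). No network call is made; the payloads are hand-written examples.

```python
import re

# Assumption: PR text refers to Redmine issues as "#1234".
ISSUE_REF = re.compile(r"#(\d+)")


def referenced_issue_ids(text):
    """Return all issue numbers referenced in a PR title, body, or commit message."""
    return [int(number) for number in ISSUE_REF.findall(text)]


def should_cherry_pick(issue_payload, trackers=("Issue",)):
    """Decide from a Redmine issue payload whether the PR should get the label."""
    tracker = issue_payload.get("issue", {}).get("tracker", {}).get("name", "")
    return tracker in trackers


# Hand-written example payloads, shaped like GET /issues/<id>.json responses.
bug_payload = {"issue": {"id": 6217, "tracker": {"id": 1, "name": "Issue"}}}
story_payload = {"issue": {"id": 6295, "tracker": {"id": 3, "name": "Story"}}}

print(referenced_issue_ids("Remove duplicates on sync, fixes #6217"))
print(should_cherry_pick(bug_payload), should_cherry_pick(story_payload))
```

The result is only a default; as proposed above, the reviewer can still remove the label by hand.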