Re: [Pulp-dev] Duplicate nevra but not pkgId (suse repos)

2020-03-18 Thread Ina Panova
Pavel,
I meant to say that pulp3 does not have the limitation pulp2 had (pulp2
could not save rpms with the same nevra on the filesystem).
The error is raised in pulp3 [0] when a repo version is created: because of
the repo key [1], we cannot have 2 rpms with the same NEVRA.

We can enable that, if we decide to, by adding location_href to the
repo_key, *but* this needs to be evaluated first. It can have side effects,
and we should involve our stakeholders to weigh in.

[0]
https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/repository.py#L570
[1]
https://github.com/pulp/pulp_rpm/blob/master/pulp_rpm/app/models/package.py#L188
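
A minimal sketch of the trade-off above, in plain Python rather than Pulp's
actual models. The Package class, find_conflicts helper, version numbers,
checksums and hrefs are all hypothetical and only illustrate how widening the
key with location_href changes what counts as a duplicate:

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical stand-in for an RPM package record; not Pulp's real model.
@dataclass(frozen=True)
class Package:
    name: str
    epoch: str
    version: str
    release: str
    arch: str
    pkg_id: str          # checksum (pkgId)
    location_href: str


NEVRA_KEY = ("name", "epoch", "version", "release", "arch")


def find_conflicts(packages, key_fields):
    """Group packages by key_fields and return only groups with >1 member."""
    groups = defaultdict(list)
    for pkg in packages:
        key = tuple(getattr(pkg, field) for field in key_fields)
        groups[key].append(pkg)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Illustrative values only; the real hrefs and checksums are not reproduced.
pkgs = [
    Package("glibc", "0", "2.22", "1.1", "src", "aaa",
            "src/glibc-2.22-1.1.src.rpm"),
    Package("glibc", "0", "2.22", "1.1", "src", "bbb",
            "nosrc/glibc-2.22-1.1.nosrc.rpm"),
]

# With NEVRA alone the two packages collide; with location_href added
# to the key they are treated as distinct.
nevra_conflicts = find_conflicts(pkgs, NEVRA_KEY)
extended_conflicts = find_conflicts(pkgs, NEVRA_KEY + ("location_href",))
print(len(nevra_conflicts), len(extended_conflicts))
```

Whether the wider key is acceptable still depends on the side effects
mentioned above; the sketch only shows the mechanics.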


Regards,

Ina Panova
Senior Software Engineer| Pulp| Red Hat Inc.

"Do not go where the path may lead,
 go instead where there is no path and leave a trail."


On Wed, Mar 18, 2020 at 2:24 PM Pavel Picka  wrote:

> True in opensuse repository there are two possibilities 'src' and 'nosrc'
> (this one should be legacy without source code), both are recognized by
> createrepo_c as arch 'src'.
>
> To point the pulp2 code I mentioned I found here [0] (base rpm package
> what I understood).
>
> The rise of error in pulp3 happening here [1] in pulpcore when adding
> packages to repository version.
> So as Ina mentioned it doesn't have to be an issue with packages itself
> than the logic in sync.
>
> [0]
> https://github.com/pulp/pulp_rpm/blob/2-master/plugins/pulp_rpm/plugins/db/models.py#L779
> [1]
> https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/repository.py#L570
>
> On Wed, Mar 18, 2020 at 1:55 PM Ina Panova  wrote:
>
>> Tanya and Pavel,
>> in this issue it is explained why we cannot keep 2 packages with same
>> NEVRA but different checksums within a repo
>> https://pulp.plan.io/issues/494
>>
>> Pulp2 had a limitation where it was not able to save on the filesystem 2
>> rpms with same filename, it lead to the primary.xml that could have pointed
>> to the rpm that did not actually get saved.
>> I believe in Pulp3 we could allow having rpm with same NEVRA if they have
>> different location_href within a repo.
>>
>> 
>> Regards,
>>
>> Ina Panova
>> Senior Software Engineer| Pulp| Red Hat Inc.
>>
>> "Do not go where the path may lead,
>>  go instead where there is no path and leave a trail."
>>
>>
>> On Wed, Mar 18, 2020 at 10:47 AM Tatiana Tereshchenko <
>> ttere...@redhat.com> wrote:
>>
>>> Hi Pavel,
>>>
>>> On Tue, Mar 17, 2020 at 7:31 PM Pavel Picka  wrote:
>>>
>>>> Hello, would like to ask you how to proceed with issue with duplicate
>>>> (but not really) packages.
>>>>
>>>> I am syncing suse repository (opensuse42 and SLE12) and get and
>>>> duplicate error. But when checking the packages [0](from primary.xml) glibc
>>>> and glibc they got same nevra but different checksum (and a few more as
>>>> size..) so doesn't look like real duplicates.

>>> Those are weird, the have the same nevra but see the location_href, one
>>> is src and the other one is nosrc! :/ :
>>> 
>>> 
>>>
>>> It looks like something OpenSUSE specific. I'm not sure if it's a valid
>>> way to create a repo with such metadata, we need to figure it out at some
>>> point.
>>>
>>>
>>>> I've checked Pulp2 and there is used nevra+sum for repository
>>>> uniqueness. In pulp3 we use only nevra.

>>> Why do you think that in pulp 2 we use NEVRA + checksum? have you tested
>>> it?  please point to the code.
>>> I believe in Pulp 2 as well as in Pulp 3 we allow to have packages with
>>> different checksums in Pulp storage.
>>> I don't think we allow having the same packages with different checksums
>>> in the same repo.
>>> FWIW, in pulp 2 the most recently added package is chosen to stay in a
>>> repo, no packages with duplicate NEVRA left after sync, see
>>> https://github.com/pulp/pulp_rpm/blob/2-master/plugins/pulp_rpm/plugins/importers/yum/purge.py#L285-L333
>>>
>>>

>>>> My suggestion is to extend repo_key_fields for rpm package as is in
>>>> pulp2 with pkgId (checksum). As I don't think they are really duplicates
>>>> and other software can rely on specific version of package.

>>>
>>> Unfortunately, I don't remember the main reason to remove duplicates
>>> based on nevra. Was it because some tooling will complain, or was it just
>>> to avoid duplicates at resync time? Does anyone know?
>>> We should not change it unless we know for sure that it's needed + we
>>> would need to have an agreement from all our stakeholders for that change.
>>>
>>> For now, I think we can move on and ensure that no duplicates are in a
>>> repo version. To my understanding, the behaviour will be the same as in
>>> pulp 2.
>>> Feel free to share where you get duplicate error to see if it's a bug or
>>> not. I wonder why duplicates are not removed automatically. Maybe because
>>> the first version contains duplicates due to this bug
>>> https://pulp.plan.io/issues/6217 ?
>>>
>>> Tanya
>>>
>>>

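>>>> What do you think?
>>>>
>>>> [0]
>>>>
>>>>> (primary.xml excerpt; XML markup lost and quote truncated in the archive)
>>>>>   name: glibc
>>>>>   arch: src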

Re: [Pulp-dev] Duplicate nevra but not pkgId (suse repos)

2020-03-18 Thread Pavel Picka
True, in the openSUSE repository there are two possibilities, 'src' and
'nosrc' (the latter should be a legacy variant without source code); both
are recognized by createrepo_c as arch 'src'.

The pulp2 code I mentioned is here [0] (the base rpm package model, as far
as I understand).

The error in pulp3 is raised here [1], in pulpcore, when packages are added
to a repository version.
So, as Ina mentioned, the issue does not have to be with the packages
themselves but rather with the logic in sync.

[0]
https://github.com/pulp/pulp_rpm/blob/2-master/plugins/pulp_rpm/plugins/db/models.py#L779
[1]
https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/repository.py#L570
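
To illustrate the 'src' vs 'nosrc' point, here is a small sketch that compares
the arch recorded in the metadata with the suffix of the file name in
location_href. The href_arch_suffix helper and the sample hrefs are
hypothetical; the actual hrefs from the repository are not reproduced here:

```python
import posixpath


def href_arch_suffix(location_href: str) -> str:
    """Return the arch-like suffix of an RPM file name, e.g. 'nosrc'."""
    filename = posixpath.basename(location_href)
    if not filename.endswith(".rpm"):
        raise ValueError(f"not an rpm path: {location_href!r}")
    stem = filename[: -len(".rpm")]
    # The last dot-separated component of the stem is the arch-like suffix.
    return stem.rpartition(".")[2]


# Illustrative entries: both carry arch 'src' in the metadata,
# but their file names differ.
entries = [
    {"arch": "src", "location_href": "src/glibc-2.22-1.1.src.rpm"},
    {"arch": "src", "location_href": "nosrc/glibc-2.22-1.1.nosrc.rpm"},
]

for entry in entries:
    print(entry["arch"], href_arch_suffix(entry["location_href"]))
```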

On Wed, Mar 18, 2020 at 1:55 PM Ina Panova  wrote:

> Tanya and Pavel,
> in this issue it is explained why we cannot keep 2 packages with same
> NEVRA but different checksums within a repo
> https://pulp.plan.io/issues/494
>
> Pulp2 had a limitation where it was not able to save on the filesystem 2
> rpms with same filename, it lead to the primary.xml that could have pointed
> to the rpm that did not actually get saved.
> I believe in Pulp3 we could allow having rpm with same NEVRA if they have
> different location_href within a repo.
>
> 
> Regards,
>
> Ina Panova
> Senior Software Engineer| Pulp| Red Hat Inc.
>
> "Do not go where the path may lead,
>  go instead where there is no path and leave a trail."
>
>
> On Wed, Mar 18, 2020 at 10:47 AM Tatiana Tereshchenko 
> wrote:
>
>> Hi Pavel,
>>
>> On Tue, Mar 17, 2020 at 7:31 PM Pavel Picka  wrote:
>>
>>> Hello, would like to ask you how to proceed with issue with duplicate
>>> (but not really) packages.
>>>
>>> I am syncing suse repository (opensuse42 and SLE12) and get and
>>> duplicate error. But when checking the packages [0](from primary.xml) glibc
>>> and glibc they got same nevra but different checksum (and a few more as
>>> size..) so doesn't look like real duplicates.
>>>
>> Those are weird, the have the same nevra but see the location_href, one
>> is src and the other one is nosrc! :/ :
>> 
>> 
>>
>> It looks like something OpenSUSE specific. I'm not sure if it's a valid
>> way to create a repo with such metadata, we need to figure it out at some
>> point.
>>
>>
>>> I've checked Pulp2 and there is used nevra+sum for repository
>>> uniqueness. In pulp3 we use only nevra.
>>>
>> Why do you think that in pulp 2 we use NEVRA + checksum? have you tested
>> it?  please point to the code.
>> I believe in Pulp 2 as well as in Pulp 3 we allow to have packages with
>> different checksums in Pulp storage.
>> I don't think we allow having the same packages with different checksums
>> in the same repo.
>> FWIW, in pulp 2 the most recently added package is chosen to stay in a
>> repo, no packages with duplicate NEVRA left after sync, see
>> https://github.com/pulp/pulp_rpm/blob/2-master/plugins/pulp_rpm/plugins/importers/yum/purge.py#L285-L333
>>
>>
>>>
>>> My suggestion is to extend repo_key_fields for rpm package as is in
>>> pulp2 with pkgId (checksum). As I don't think they are really duplicates
>>> and other software can rely on specific version of package.
>>>
>>
>> Unfortunately, I don't remember the main reason to remove duplicates
>> based on nevra. Was it because some tooling will complain, or was it just
>> to avoid duplicates at resync time? Does anyone know?
>> We should not change it unless we know for sure that it's needed + we
>> would need to have an agreement from all our stakeholders for that change.
>>
>> For now, I think we can move on and ensure that no duplicates are in a
>> repo version. To my understanding, the behaviour will be the same as in
>> pulp 2.
>> Feel free to share where you get duplicate error to see if it's a bug or
>> not. I wonder why duplicates are not removed automatically. Maybe because
>> the first version contains duplicates due to this bug
>> https://pulp.plan.io/issues/6217 ?
>>
>> Tanya
>>
>>
>>>
>>> What do you think?
>>>
>>>
>>> [0]
>>>
 
>>>> (primary.xml excerpt; XML markup lost and quote truncated in the archive)
>>>>
>>>> package 1:
>>>>   name:        glibc
>>>>   arch:        src
>>>>   checksum:    00d36c0f741b0c01a77ce318a2bbcfa59cb4dd0b24ce61f57c6205e4fa1bb310 (pkgid="YES")
>>>>   summary:     Standard Shared Libraries (from the GNU C Library)
>>>>   description: The GNU C Library provides the most important standard
>>>>                libraries used by nearly all programs: the standard C
>>>>                library, the standard math library, and the POSIX thread
>>>>                library. A system is not functional without these libraries.
>>>>   packager:    https://www.suse.com/
>>>>   url:         http://www.gnu.org/software/libc/libc.html
>>>>   license:     LGPL-2.1+ and SUSE-LGPL-2.1+-with-GCC-exception and GPL-2.0+
>>>>   vendor:      SUSE LLC <https://www.suse.com/>
>>>>   group:       System/Libraries
>>>>   buildhost:   sheep16
>>>>
>>>> package 2:
>>>>   name:        glibc
>>>>   arch:        src
>>>>   checksum:    353e1dc85eab8d434be83160eca4fcee11a72eec345385df125ca0835abd6068 (pkgid="YES")
>>>>   summary:     Standard Shared Libraries (from the GNU C Library)
>>>>   description: The GNU C Library

Re: [Pulp-dev] Package in a different repo does not get added to package list on Module

2020-03-18 Thread Ina Panova
This has always been a grey area:

what if the user who created repoA has no access to the content of repoB,
and yet we are 'stealing' the content from repoB?


Regards,

Ina Panova
Senior Software Engineer| Pulp| Red Hat Inc.

"Do not go where the path may lead,
 go instead where there is no path and leave a trail."


On Tue, Mar 17, 2020 at 7:41 PM Pavel Picka  wrote:

> Hi,
>
> started to work on #6295 [0] and by now at sync we look only for actual
> (repository we are syncing) packages if they are modular and connect to
> modulemd.
>
> To fix this issue we will need to check content from other repositories
> (already synced) what can have a really huge impact on sync time in case of
> big repositories.
>
> Do we want to get through all pulp content (RPM packages) when syncing new
> repository with modulemd? Or idea can be to extend sync API call with new
> argument to scan (all or specific) repositories.
>
> I think we would like to keep performance of sync so better to discuss
> first.
>
> Thank you
>
> [0] https://pulp.plan.io/issues/6295
>
> --
> Pavel Picka
> Red Hat
> ___
> Pulp-dev mailing list
> Pulp-dev@redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
>
___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Cherry pick labeling

2020-03-18 Thread Ina Panova
It would be great if we could automate this part as well.
As for responsibility, I think both the reviewer and the author should share
it. Obviously, if this is a new contributor, as Tanya mentioned, then the
reviewer should make sure the label is properly placed if needed.



Regards,

Ina Panova
Senior Software Engineer| Pulp| Red Hat Inc.

"Do not go where the path may lead,
 go instead where there is no path and leave a trail."


On Wed, Mar 18, 2020 at 1:36 PM David Davis  wrote:

> I think it would be easy to automatically apply the cherry pick label with
> Github Actions. Github Actions has all sorts of events that can trigger
> workflows including opening a new PR[0]. This workflow could also
> automatically move the redmine issue to POST and comment with the PR too.
>
> [0]
> https://help.github.com/en/actions/reference/events-that-trigger-workflows#pull-request-event-pull_request
>
> David
>
>
> On Wed, Mar 18, 2020 at 6:02 AM Tatiana Tereshchenko 
> wrote:
>
>> I believe it's the responsibility of both the author and the PR reviewer.
>>
>> If it's a one-time contribution from someone, then the PR reviewer is
>> likely the one who is aware whether the cherry-pick should be done or not.
>> However in the majority of cases, we have regular contributors and they
>> are aware of the process. Depending on the fix, they might be in a better
>> position to say whether it's worth cherry-picking, if the cherry-pick will
>> be clean or not.
>>
>> Alternatively, can we automate it? When PR is open, look at the referred
>> redmine issue and check its tracker, if it's an "issue", mark PR as the one
>> to be cherry-picked, so reviewer can unset it if it's undesirable for
>> whatever reason.
>>
>> Tanya
>>
>> On Tue, Mar 17, 2020 at 9:43 PM David Davis 
>> wrote:
>>
>>> Today we missed a change that could have maybe have gone out with the
>>> 3.2 release. It stemmed from a lack of clarity around whose responsibility
>>> it is to label PRs with the cherry pick label. I think the general
>>> agreement is that it's ultimately the responsibility of the PR reviewer to
>>> add this label to the PR. I'm interested to see if there are any other
>>> thoughts or objections. Here is a PR I've opened as a proposal:
>>>
>>> https://github.com/pulp/pulpcore/pull/592
>>>
>>> David


Re: [Pulp-dev] Duplicate nevra but not pkgId (suse repos)

2020-03-18 Thread Ina Panova
Tanya and Pavel,
this issue explains why we cannot keep 2 packages with the same NEVRA but
different checksums within a repo: https://pulp.plan.io/issues/494

Pulp2 had a limitation where it was not able to save 2 rpms with the same
filename on the filesystem. This led to a primary.xml that could point to
an rpm that did not actually get saved.
I believe in Pulp3 we could allow rpms with the same NEVRA within a repo if
they have different location_href.
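
A rough illustration of that failure mode, not Pulp2's actual storage code.
It assumes, purely for the sake of the example, that the stored file name is
derived from the NEVRA fields; the rpm_filename helper, the version numbers
and the checksums are hypothetical:

```python
def rpm_filename(name: str, version: str, release: str, arch: str) -> str:
    """Conventional RPM file name: name-version-release.arch.rpm."""
    return f"{name}-{version}-{release}.{arch}.rpm"


storage = {}    # file name -> checksum of the stored file (a flat directory)
metadata = []   # entries as the published metadata would list them

# Two packages with the same NEVRA but different checksums.
for checksum in ("aaa", "bbb"):
    filename = rpm_filename("glibc", "2.22", "1.1", "src")
    storage[filename] = checksum          # the second save overwrites the first
    metadata.append({"href": filename, "pkg_id": checksum})

# Metadata entries whose checksum no longer matches the file on disk.
dangling = [e for e in metadata if storage[e["href"]] != e["pkg_id"]]
print(len(storage), len(metadata), len(dangling))
```

If the two packages were stored under different location_href values, the
two saves would not overwrite each other in this model.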


Regards,

Ina Panova
Senior Software Engineer| Pulp| Red Hat Inc.

"Do not go where the path may lead,
 go instead where there is no path and leave a trail."


On Wed, Mar 18, 2020 at 10:47 AM Tatiana Tereshchenko 
wrote:

> Hi Pavel,
>
> On Tue, Mar 17, 2020 at 7:31 PM Pavel Picka  wrote:
>
>> Hello, would like to ask you how to proceed with issue with duplicate
>> (but not really) packages.
>>
>> I am syncing suse repository (opensuse42 and SLE12) and get and duplicate
>> error. But when checking the packages [0](from primary.xml) glibc and glibc
>> they got same nevra but different checksum (and a few more as size..) so
>> doesn't look like real duplicates.
>>
> Those are weird, the have the same nevra but see the location_href, one is
> src and the other one is nosrc! :/ :
> 
> 
>
> It looks like something OpenSUSE specific. I'm not sure if it's a valid
> way to create a repo with such metadata, we need to figure it out at some
> point.
>
>
>> I've checked Pulp2 and there is used nevra+sum for repository uniqueness.
>> In pulp3 we use only nevra.
>>
> Why do you think that in pulp 2 we use NEVRA + checksum? have you tested
> it?  please point to the code.
> I believe in Pulp 2 as well as in Pulp 3 we allow to have packages with
> different checksums in Pulp storage.
> I don't think we allow having the same packages with different checksums
> in the same repo.
> FWIW, in pulp 2 the most recently added package is chosen to stay in a
> repo, no packages with duplicate NEVRA left after sync, see
> https://github.com/pulp/pulp_rpm/blob/2-master/plugins/pulp_rpm/plugins/importers/yum/purge.py#L285-L333
>
>
>>
>> My suggestion is to extend repo_key_fields for rpm package as is in pulp2
>> with pkgId (checksum). As I don't think they are really duplicates and
>> other software can rely on specific version of package.
>>
>
> Unfortunately, I don't remember the main reason to remove duplicates based
> on nevra. Was it because some tooling will complain, or was it just to
> avoid duplicates at resync time? Does anyone know?
> We should not change it unless we know for sure that it's needed + we
> would need to have an agreement from all our stakeholders for that change.
>
> For now, I think we can move on and ensure that no duplicates are in a
> repo version. To my understanding, the behaviour will be the same as in
> pulp 2.
> Feel free to share where you get duplicate error to see if it's a bug or
> not. I wonder why duplicates are not removed automatically. Maybe because
> the first version contains duplicates due to this bug
> https://pulp.plan.io/issues/6217 ?
>
> Tanya
>
>
>>
>> What do you think?
>>
>>
>> [0]
>>
>>> (primary.xml excerpt; XML markup lost in the archive)
>>>
>>> package 1:
>>>   name:        glibc
>>>   arch:        src
>>>   checksum:    00d36c0f741b0c01a77ce318a2bbcfa59cb4dd0b24ce61f57c6205e4fa1bb310 (pkgid="YES")
>>>   summary:     Standard Shared Libraries (from the GNU C Library)
>>>   description: The GNU C Library provides the most important standard
>>>                libraries used by nearly all programs: the standard C
>>>                library, the standard math library, and the POSIX thread
>>>                library. A system is not functional without these libraries.
>>>   packager:    https://www.suse.com/
>>>   url:         http://www.gnu.org/software/libc/libc.html
>>>   license:     LGPL-2.1+ and SUSE-LGPL-2.1+-with-GCC-exception and GPL-2.0+
>>>   vendor:      SUSE LLC <https://www.suse.com/>
>>>   group:       System/Libraries
>>>   buildhost:   sheep16
>>>
>>> package 2:
>>>   name:        glibc
>>>   arch:        src
>>>   checksum:    353e1dc85eab8d434be83160eca4fcee11a72eec345385df125ca0835abd6068 (pkgid="YES")
>>>   summary:     Standard Shared Libraries (from the GNU C Library)
>>>   description: The GNU C Library provides the most important standard
>>>                libraries used by nearly all programs: the standard C
>>>                library, the standard math library, and the POSIX thread
>>>                library. A system is not functional without these libraries.
>>>   packager:    https://www.suse.com/
>>>   url:         http://www.gnu.org/software/libc/libc.html
>>>   license:     LGPL-2.1+ and SUSE-LGPL-2.1+-with-GCC-exception and GPL-2.0+
>>>   vendor:      SUSE LLC <https://www.suse.com/>
>>>   group:       System/Libraries
>>>   buildhost:   sheep02
>>
>>
>> --
>> Pavel Picka
>> Red Hat

Re: [Pulp-dev] Cherry pick labeling

2020-03-18 Thread David Davis
I think it would be easy to automatically apply the cherry pick label with
GitHub Actions. GitHub Actions has all sorts of events that can trigger
workflows, including opening a new PR [0]. This workflow could also
automatically move the Redmine issue to POST and comment with the PR too.

[0]
https://help.github.com/en/actions/reference/events-that-trigger-workflows#pull-request-event-pull_request

David


On Wed, Mar 18, 2020 at 6:02 AM Tatiana Tereshchenko 
wrote:

> I believe it's the responsibility of both the author and the PR reviewer.
>
> If it's a one-time contribution from someone, then the PR reviewer is
> likely the one who is aware whether the cherry-pick should be done or not.
> However in the majority of cases, we have regular contributors and they
> are aware of the process. Depending on the fix, they might be in a better
> position to say whether it's worth cherry-picking, if the cherry-pick will
> be clean or not.
>
> Alternatively, can we automate it? When PR is open, look at the referred
> redmine issue and check its tracker, if it's an "issue", mark PR as the one
> to be cherry-picked, so reviewer can unset it if it's undesirable for
> whatever reason.
>
> Tanya
>
> On Tue, Mar 17, 2020 at 9:43 PM David Davis  wrote:
>
>> Today we missed a change that could have maybe have gone out with the 3.2
>> release. It stemmed from a lack of clarity around whose responsibility it
>> is to label PRs with the cherry pick label. I think the general agreement
>> is that it's ultimately the responsibility of the PR reviewer to add this
>> label to the PR. I'm interested to see if there are any other thoughts or
>> objections. Here is a PR I've opened as a proposal:
>>
>> https://github.com/pulp/pulpcore/pull/592
>>
>> David


Re: [Pulp-dev] Cherry pick labeling

2020-03-18 Thread Tatiana Tereshchenko
I believe it's the responsibility of both the author and the PR reviewer.

If it's a one-time contribution from someone, then the PR reviewer is
likely the one who is aware whether the cherry-pick should be done or not.
However in the majority of cases, we have regular contributors and they are
aware of the process. Depending on the fix, they might be in a better
position to say whether it's worth cherry-picking, if the cherry-pick will
be clean or not.

Alternatively, can we automate it? When a PR is opened, look at the
referenced Redmine issue and check its tracker; if it's an "issue", mark the
PR as one to be cherry-picked, so that the reviewer can unset the label if
it's undesirable for whatever reason.
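
A rough sketch of that decision logic, with the Redmine lookup left as a
pluggable function so nothing here depends on a real API. The label name, the
keywords in the regular expression and the issue numbers are assumptions made
for the example, not an existing convention I am quoting:

```python
import re
from typing import Callable, List, Optional

# Assumed commit/PR keywords that reference a Redmine issue, e.g. "fixes #101".
ISSUE_REF = re.compile(r"\b(?:fixes|closes|ref|re)\s+#(\d+)", re.IGNORECASE)
CHERRY_PICK_LABEL = "Needs Cherry Pick"  # hypothetical label name


def referenced_issue(text: str) -> Optional[int]:
    """Return the first referenced issue number, or None if there is none."""
    match = ISSUE_REF.search(text)
    return int(match.group(1)) if match else None


def labels_for_pr(pr_text: str, get_tracker: Callable[[int], str]) -> List[str]:
    """Suggest the cherry-pick label when the referenced tracker is 'Issue'."""
    issue_id = referenced_issue(pr_text)
    if issue_id is None:
        return []
    tracker = get_tracker(issue_id)
    return [CHERRY_PICK_LABEL] if tracker.lower() == "issue" else []


# Fake tracker lookup standing in for a Redmine query; the ids are made up.
fake_trackers = {101: "Issue", 102: "Story"}

print(labels_for_pr("Fix sync of duplicates\n\nfixes #101", fake_trackers.__getitem__))
print(labels_for_pr("Add new endpoint\n\nref #102", fake_trackers.__getitem__))
print(labels_for_pr("Update docs", fake_trackers.__getitem__))
```

The label is only a default here; the reviewer can still remove it.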

Tanya

On Tue, Mar 17, 2020 at 9:43 PM David Davis  wrote:

> Today we missed a change that could have maybe have gone out with the 3.2
> release. It stemmed from a lack of clarity around whose responsibility it
> is to label PRs with the cherry pick label. I think the general agreement
> is that it's ultimately the responsibility of the PR reviewer to add this
> label to the PR. I'm interested to see if there are any other thoughts or
> objections. Here is a PR I've opened as a proposal:
>
> https://github.com/pulp/pulpcore/pull/592
>
> David