On Wed, Apr 21, 2021 at 8:03 AM Tanya Tereshchenko <ttere...@redhat.com> wrote:
> Hey Grant, > [brilliant investigation happens] And this is exactly what I needed - thanks Tanya, that clears up All The Things for me. I'm going to amend the PR to use Pulp2's naming convention, and add some commentary in the collection-merge code so the next person won't have to go through this exercise. This was a great help, thanks so much! G > > Let's start with thoughts without modularity :) > I think the main reasons for the existing behavior are semantical and also > aim to preserve collections as they are. > The situation you describe (when I change only pkglist and nothing else at > all) usually means one of the following: > 1. A user wants to combine the same advisory which came from different > repos, e.g RHEL base and RHEL debuginfo. > 2. A user tries to fix an advisory in a bad way. This should never > rarely happen. If they change advisory, they need to update the `updated` > timestamp and or version, and no merge will be done. > > For situation #1, if it happens that collection names are the same (which > is often not the case in the wild), we are not modifying the existing one, > and you can clearly see that it was combined, it also preserves grouping of > packages in it which might be relevant, e.g. debuginfo vs others. > For situation #2, I agree it looks a bit weird but they should not do it > in the first place. > > Having said that, there is no problem functionality wise, one or two > collections will work fine if their names are different. > I personally would leave it as is. It gives me more confidence that we are > not modifying a collection package list but just adding a new one and > feeding it to createrepo_c as is. Plus if there is a problem, it's easier > to see that it was a merge. > > Now the modularity part :) > This might require some experimenting with dnf or just asking them how the > situation with multiple collections for the same modules is handled. > Here is how modular collection looks: > > Collection > - name_0 > - module NSVCA_0 > - list of packages > > If we look at 2 modular collections with the same name but different > modules, those can not be merged. > Current behaviour covers that case. If we choose to put all packages into > one collection at the merge time, we need to check the modularity aspect of > the collection. > > If we look at 2 modular collections with the same name and same modules, > we technically can merge them. > And here is the unknown part for me, if we leave 2 collections with the > same modules and make names different, will dnf handle it correctly, or > does it expect only one collection per module. > I'd think that dnf doesn't care for module uniqueness if the collection > names are different but it's worth checking. > > Thanks for sorting it all out, > Tanya > > P.S. +1 to give a better name to the unnamed collections, fwiw, in pulp 2 > it was something like "collection_0", collection_1", etc. > > > On Tue, Apr 20, 2021 at 3:03 PM Grant Gainey <ggai...@redhat.com> wrote: > >> Hey gang, >> >> In the process of fixing issue 7282 <https://pulp.plan.io/issues/7282>, >> I have encountered a behavior from advisory-merge that appears to be the >> deliberately-desired behavior - and I'm not sure it's really what we want >> to do? I need some thoughts on this. Figured I'd start in email, because >> it's advisory-merge, and complicated, and trying to follow a verbal >> description is confusing. More confusing. >> >> The PR with the proposed fix is >> https://github.com/pulp/pulp_rpm/pull/1973 >> >> The scenario is as follows: >> >> 1) Upload an advisory that has a package-list with one RPM, say >> 'bear', to a repository. >> 2) Upload an *identical* advisory, except for having a different package >> list with one RPM, say 'camel', to the same repository. >> 3) The resulting advisory-conflict-resolution falls into the "date and >> version identical, pkg-list disjoint" case, which calls for an advisory >> merge here: >> >> https://github.com/pulp/pulp_rpm/blob/master/pulp_rpm/app/advisory.py#L222 >> >> 4) advisory_merge() finds two 'collections' - one for each >> pkglist.packages in each advisory, neither of which is named. >> 5) advisory_merge() results in an advisory with two collections. One has >> no name, and is a package-list containing 'bear'. The other is named >> "None_0"[0], and is a package-list containing "camel". >> >> You can see where this decision is made here: >> >> https://github.com/pulp/pulp_rpm/blob/master/pulp_rpm/app/advisory.py#L273-L284 >> >> That result was surprising to me. My assumption going in was, if we were >> merging two advisories, we would merge their same-named collections into >> one collection with elements from both. But we're very carefully choosing >> to not-do that. >> >> So, my question here is, can someone explain to me the rationale for the >> current behavior? I'm sure it has something to do with modularity, because >> half of the weird things I run into in RPM-land are because of modularity >> :) But I'd like to understand why we're doing what we're doing, so I can >> add some commentary in the test or in the merge_advisories() code itself. >> >> If this turns out to be undesirable emergent behavior, then I can address >> it as part of this PR. >> >> The test I added for all this fun is here: >> >> https://github.com/pulp/pulp_rpm/pull/1973/files#diff-781abba15eb7e61b9ae644ae112bf1292dfd7eb78e7162b74830d4ddb43e1868R274 >> >> Thoughts very much appreciated! >> G >> >> [0] "None_0" is Rude. Even if this is the right behavior, we prob could >> stand to notice 'unnamed' collections and do something a little less >> painful... >> -- >> Grant Gainey >> Principal Software Engineer, Red Hat System Management Engineering >> > -- Grant Gainey Principal Software Engineer, Red Hat System Management Engineering
_______________________________________________ Pulp-dev mailing list Pulp-dev@redhat.com https://listman.redhat.com/mailman/listinfo/pulp-dev