I want to retell Simon's proposal to have "Content defines a 'repo_key' similar to a unit_key. This key must be unique within a repo version (and not globally like the unit_key."
We could adopt his proposal to have the repo_key tuple defined on Content in pulpcore. If we left the add/remove APIs in core and adopt for both sync and add/remove a "keep newest to associate" functionality described earlier in the thread. This "keep newest to associate" code would be used by sync in the form of a core stage that is a generalized version of the RemoveDuplicates stage. This would become part of the default pipeline for all users of Stages API. I think this would be better than plugin writers implementing it over and over and also less effort for plugin wrtiers. This design would meet the current needs of pulp_cookbook, pulp_file, and pulp_rpm which are the only 3 places I know we have this problem so far, but I believe more content types are susceptible to this. What do you think we should do? Thanks! Brian On Thu, Jun 27, 2019 at 4:03 AM Tatiana Tereshchenko <ttere...@redhat.com> wrote: > Sure, the code can be de-duplicated. > My main worry is that it's a responsibility of a plugin writer not to > forget to ensure uniqueness constraints within a repo version for every > workflow (sync, copy, anything else) where a repo version is created. > Every time before RepositoryVersion.create() is called, there should be a > check that there is no colliding content in a repo version. > It would be much more reliable and friendly, in my opinion, if plugin > writer could define rules/callbacks/whatever for each content and it would > be applied to any repository creation. > At the same time this eliminates the flexibility to define different logic > for content collision for different workflows, however I'm not sure if such > a use case exists or is desired. > > Tanya > > > On Wed, Jun 26, 2019 at 6:49 PM Austin Macdonald <amacd...@redhat.com> > wrote: > >> @Tanya Tereshchenko <ttere...@redhat.com> >> >>> Do I understand correctly that it doesn't cover the sync case and it's >>> only about explicit repo version creation? >>> >> >> I don't mean that add/remove could not share code with remove duplicate >> stage. I wanted to point out that we have a problem here (how to remove >> duplicates) that has similar patterns to other problems with add remove >> (recursive, copy, deciding which content to keep with a collision, etc.) I >> don't doubt that pulpcore could help solve these problems, but I think that >> as we approach our GA, we should consider solving this problem (for now) by >> getting out of the way of plugin writers rather than by implementing code >> that is supposed to work for all plugins. I suspect that plenty of the >> plugins will be implementing their own add/remove anyway. >> >> On Tue, Jun 25, 2019 at 12:56 PM David Davis <davidda...@redhat.com> >> wrote: >> >>> I don't think this solution would work in the case of creating a new >>> repository version. Suppose for example you had two content units that >>> collide, one in a repo version and one older unit that a user explicitly >>> wants to add to the repo version. If the latter one is older, then what >>> would happen? >>> >>> David >>> >>> >>> On Tue, Jun 25, 2019 at 12:48 PM Brian Bouterse <bbout...@redhat.com> >>> wrote: >>> >>>> Having a way for units to express their uniqueness per repo sounds good >>>> because then more areas of Pulp's code could answer the question: "will I >>>> have a duplicate if I add content X to repo_version Y". >>>> >>>> Let's assume we know that situation is about to occur during sync for >>>> example, what do we do about it? In the errata case we know the "new" one >>>> should replace the existing one. Maybe we start to 'order' the units with >>>> colliding repo keys and keep the newest one always? Would this work for >>>> pulp_cookbook and pulp_rpm? Would it generalize? Is this what you imagined? >>>> >>>> On Tue, Jun 25, 2019 at 5:30 AM Tatiana Tereshchenko < >>>> ttere...@redhat.com> wrote: >>>> >>>>> Do I understand correctly that it doesn't cover the sync case and it's >>>>> only about explicit repo version creation? >>>>> So the suggestion is to implement the same logic twice: for sync case >>>>> - RemoveDuplicates stage and/or maybe some custom stage (e.g. to disallow >>>>> overlapping paths), and for direct repo version creation - your proposal. >>>>> >>>>> >>>>> >>>>> On Mon, Jun 24, 2019 at 3:13 PM Austin Macdonald <amacd...@redhat.com> >>>>> wrote: >>>>> >>>>>> I have a design in mind for solving this problem: >>>>>> >>>>>> 1. Remove POST to RepositoryVersion (no general add/remove endpoint). >>>>>> 2. Add an endpoint to kick off an add/remove task, namespaced by >>>>>> plugin. ie `POST pulp/api/v3/docker/add-remove/` >>>>>> This view can be provided to all plugins by the plugin template, >>>>>> and will be based on the current RepositoryVersionCreate: >>>>>> >>>>>> https://github.com/pulp/pulpcore/blob/master/pulpcore/app/viewsets/repository.py#L221-L258 >>>>>> Note: the main purpose of this view is to kick off the general >>>>>> add/remove task, which will be unchanged: >>>>>> >>>>>> https://github.com/pulp/pulpcore/blob/master/pulpcore/app/tasks/repository.py#L70 >>>>>> 3. Add an add/remove serializer to the plugin API. >>>>>> 3. Plugins needing further customization can provide their own task >>>>>> and subclassed serializer. >>>>>> >>>>>> This gives the plugin writer full control over the endpoint >>>>>> (customizable arguments and validation), and full control over the flow >>>>>> (extra logic, depsolving, enforced uniqueness). It only uses the existing >>>>>> patterns (and existing required knowledge), but requires no work (other >>>>>> than using the template) for the simple case. >>>>>> >>>>>> On Mon, Jun 3, 2019 at 2:56 PM Simon Baatz <gmbno...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> On Mon, Jun 03, 2019 at 09:11:07AM -0400, David Davis wrote: >>>>>>> > @Simon I like the idea behind the repo_key solution you came up >>>>>>> with. >>>>>>> > Can you be more specific around cases you think that it couldn't >>>>>>> > handle? I imagine that plugin writers could use properties or >>>>>>> > denormailzation (ie additional database columns) to solve cases >>>>>>> where >>>>>>> > they need uniqueness across data that isn't in the database. In >>>>>>> a worst >>>>>>> > case scenario, they can't use the pulpcore solution and just >>>>>>> have to >>>>>>> > roll their own. >>>>>>> >>>>>>> >>>>>>> What I wrote probably sounded too pessimistic. You are right, in >>>>>>> most cases that should be doable. >>>>>>> >>>>>>> I agree that we could have a simple default solution that just >>>>>>> requires to specify a couple of field names in the easiest case. As >>>>>>> you >>>>>>> say, it should be possible use custom logic in a plugin if required. >>>>>>> >>>>>>> Here is the case I was thinking of that it can't handle: >>>>>>> >>>>>>> In pulp_file, a uniqueness constraint on "relative_path" would allow >>>>>>> content units "a" and "a/b" to be in a repo version. >>>>>>> >>>>>>> However, we may want file repos to be representable on an actual file >>>>>>> system (e.g. when exporting them as tar files). For the repo above, >>>>>>> this does not work, as "a" can't be a file and a directory at the >>>>>>> same time on a standard Unix file system. >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Pulp-dev mailing list >>>>>>> Pulp-dev@redhat.com >>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev >>>>>>> >>>>>> _______________________________________________ >>>>>> Pulp-dev mailing list >>>>>> Pulp-dev@redhat.com >>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev >>>>>> >>>>> _______________________________________________ >>>>> Pulp-dev mailing list >>>>> Pulp-dev@redhat.com >>>>> https://www.redhat.com/mailman/listinfo/pulp-dev >>>>> >>>> _______________________________________________ >>>> Pulp-dev mailing list >>>> Pulp-dev@redhat.com >>>> https://www.redhat.com/mailman/listinfo/pulp-dev >>>> >>> _______________________________________________ >>> Pulp-dev mailing list >>> Pulp-dev@redhat.com >>> https://www.redhat.com/mailman/listinfo/pulp-dev >>> >> _______________________________________________ > Pulp-dev mailing list > Pulp-dev@redhat.com > https://www.redhat.com/mailman/listinfo/pulp-dev >
_______________________________________________ Pulp-dev mailing list Pulp-dev@redhat.com https://www.redhat.com/mailman/listinfo/pulp-dev