Re: [Pulp-dev] uniqueness constraints within a repository version

2019-07-30 Thread Tatiana Tereshchenko
After some offline discussion with several Pulp devs, we decided to dedicate this thread to one problem - duplicates (and move the other problem - filtering/validation - to a different thread). The current proposal is to have a repo_key on a content model (thanks, Simon) and ensure its uniqueness

Re: [Pulp-dev] uniqueness constraints within a repository version

2019-07-24 Thread Brian Bouterse
On Mon, Jul 22, 2019 at 4:47 AM Tatiana Tereshchenko wrote:
> On Sun, Jul 21, 2019 at 3:00 PM Brian Bouterse wrote:
>> On Sun, Jul 21, 2019 at 6:23 AM Tatiana Tereshchenko wrote:
>>> +1 to the idea of a repo_key.
>>>
>>> Should we also add the ability to apply custom

Re: [Pulp-dev] uniqueness constraints within a repository version

2019-07-21 Thread Brian Bouterse
On Sun, Jul 21, 2019 at 6:23 AM Tatiana Tereshchenko wrote:
> +1 to the idea of a repo_key.
>
> Should we also add the ability to apply custom validation of the content being added?
> Similar to a repo_key, Content model can optionally provide an additional validator.
> Use cases:
> - for

Re: [Pulp-dev] uniqueness constraints within a repository version

2019-07-21 Thread Tatiana Tereshchenko
+1 to the idea of a repo_key.

Should we also add the ability to apply custom validation of the content being added? Similar to a repo_key, Content model can optionally provide an additional validator.
Use cases:
- for pulp_file to avoid relative path overlap - e.g. 'a/b' and 'a'
- for pulp_rpm
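The pulp_file use case above (relative path overlap such as 'a/b' and 'a') can be illustrated with a minimal sketch. This is plain Python, not Pulp code: the function names `paths_overlap` and `find_overlaps` are hypothetical, and the rule assumed here is that two relative paths collide when they are equal or when one is a parent directory of the other.

```python
from itertools import combinations
from pathlib import PurePosixPath


def paths_overlap(first: str, second: str) -> bool:
    """Return True if the paths are equal or one is a parent directory of the other.

    Hypothetical helper for illustration; not part of any Pulp API.
    """
    a = PurePosixPath(first).parts
    b = PurePosixPath(second).parts
    shorter = min(len(a), len(b))
    # Compare component-wise so that 'ab' is not treated as a child of 'a'.
    return a[:shorter] == b[:shorter]


def find_overlaps(paths):
    """Return every pair of paths from the input that overlap."""
    return [(p, q) for p, q in combinations(paths, 2) if paths_overlap(p, q)]


if __name__ == "__main__":
    print(find_overlaps(["a", "a/b", "c"]))
```

A validator of this kind would sit next to a repo_key check rather than replace it, since an overlap between 'a' and 'a/b' is not an equality of keys.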

Re: [Pulp-dev] uniqueness constraints within a repository version

2019-07-08 Thread Brian Bouterse
I want to retell Simon's proposal to have "Content defines a 'repo_key' similar to a unit_key. This key must be unique within a repo version (and not globally like the unit_key)." We could adopt his proposal to have the repo_key tuple defined on Content in pulpcore. If we left the add/remove APIs
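As a rough sketch of the repo_key idea, assuming a content type that lists the fields making up each key: the class `FileContent`, the attributes `unit_key_fields` and `repo_key_fields`, and the helper `would_duplicate` are all hypothetical names chosen for illustration, and plain dataclasses stand in for the actual Content model.

```python
from dataclasses import dataclass
from typing import ClassVar, Iterable, Tuple


@dataclass(frozen=True)
class FileContent:
    """Hypothetical content type; not the real pulp_file model."""

    relative_path: str
    digest: str

    # Globally unique identity of the unit (assumed for this sketch).
    unit_key_fields: ClassVar[Tuple[str, ...]] = ("relative_path", "digest")
    # Must be unique only within a single repository version.
    repo_key_fields: ClassVar[Tuple[str, ...]] = ("relative_path",)

    def repo_key(self) -> Tuple[str, ...]:
        return tuple(getattr(self, name) for name in self.repo_key_fields)


def would_duplicate(unit: FileContent, repo_version_content: Iterable[FileContent]) -> bool:
    """Answer: will adding `unit` collide with content already in the version?"""
    return any(unit.repo_key() == existing.repo_key() for existing in repo_version_content)


if __name__ == "__main__":
    version = [FileContent("a.txt", "111")]
    print(would_duplicate(FileContent("a.txt", "222"), version))
    print(would_duplicate(FileContent("b.txt", "111"), version))
```

The two units with path `a.txt` differ in their unit_key, so both may exist in Pulp, but they share a repo_key and so could not both be in one repository version.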

Re: [Pulp-dev] uniqueness constraints within a repository version

2019-06-27 Thread Tatiana Tereshchenko
Sure, the code can be de-duplicated. My main worry is that it's a responsibility of a plugin writer not to forget to ensure uniqueness constraints within a repo version for every workflow (sync, copy, anything else) where a repo version is created. Every time before RepositoryVersion.create() is

Re: [Pulp-dev] uniqueness constraints within a repository version

2019-06-26 Thread Austin Macdonald
@Tanya Tereshchenko
> Do I understand correctly that it doesn't cover the sync case and it's only about explicit repo version creation?

I don't mean that add/remove could not share code with remove duplicate stage. I wanted to point out that we have a problem here (how to remove duplicates)

Re: [Pulp-dev] uniqueness constraints within a repository version

2019-06-25 Thread David Davis
I think I misread your email. If you are saying "newest to associate" and not "newest content unit", I think that would work. @ttereshc, couldn't we de-duplicate the logic by creating a class in the plugin API that RemoveDuplicates uses as well as the add/remove content endpoints in the plugins?
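One way to read "newest to associate" is that a unit being added to the repository version replaces any unit already present with the same key, regardless of which unit is older. The sketch below assumes that reading; `add_with_dedup` and the `repo_key` callable are hypothetical, and plain dicts stand in for content units. A shared helper like this could, in principle, be called both from a sync stage and from an add/remove endpoint.

```python
def add_with_dedup(current, to_add, repo_key):
    """Return the content of the new version after adding `to_add`.

    Units being added win over units already present that share a repo key.
    Hypothetical helper for illustration only.
    """
    incoming = {}
    for unit in to_add:
        # Among the units being added, the last one with a given key wins.
        incoming[repo_key(unit)] = unit
    # Drop existing units that collide with something being added.
    kept = [unit for unit in current if repo_key(unit) not in incoming]
    return kept + list(incoming.values())


if __name__ == "__main__":
    current = [{"name": "foo", "version": "1.0"}, {"name": "bar", "version": "1.0"}]
    to_add = [{"name": "foo", "version": "0.9"}]
    print(add_with_dedup(current, to_add, repo_key=lambda unit: unit["name"]))
```

In this example the explicitly added, older `foo` 0.9 replaces `foo` 1.0, which matches the case where a user deliberately adds an older unit.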

Re: [Pulp-dev] uniqueness constraints within a repository version

2019-06-25 Thread David Davis
I don't think this solution would work in the case of creating a new repository version. Suppose for example you had two content units that collide, one in a repo version and one older unit that a user explicitly wants to add to the repo version. If the latter one is older, then what would happen?

Re: [Pulp-dev] uniqueness constraints within a repository version

2019-06-25 Thread Brian Bouterse
Having a way for units to express their uniqueness per repo sounds good because then more areas of Pulp's code could answer the question: "will I have a duplicate if I add content X to repo_version Y". Let's assume we know that situation is about to occur during sync for example, what do we do

Re: [Pulp-dev] uniqueness constraints within a repository version

2019-06-25 Thread Tatiana Tereshchenko
Do I understand correctly that it doesn't cover the sync case and it's only about explicit repo version creation? So the suggestion is to implement the same logic twice: for sync case - RemoveDuplicates stage and/or maybe some custom stage (e.g. to disallow overlapping paths), and for direct repo

Re: [Pulp-dev] uniqueness constraints within a repository version

2019-06-24 Thread Austin Macdonald
I have a design in mind for solving this problem:
1. Remove POST to RepositoryVersion (no general add/remove endpoint).
2. Add an endpoint to kick off an add/remove task, namespaced by plugin, i.e. `POST pulp/api/v3/docker/add-remove/`. This view can be provided to all plugins by the plugin

Re: [Pulp-dev] uniqueness constraints within a repository version

2019-06-03 Thread Simon Baatz
On Mon, Jun 03, 2019 at 09:11:07AM -0400, David Davis wrote:
> @Simon I like the idea behind the repo_key solution you came up with. Can you be more specific around cases you think that it couldn't handle? I imagine that plugin writers could use properties or denormalization (ie

Re: [Pulp-dev] uniqueness constraints within a repository version

2019-06-03 Thread David Davis
Thanks for raising this issue. The pulp_file plugin also suffers from this problem: files with duplicate names can be added to repo versions, but they probably shouldn't be: https://pulp.plan.io/issues/4028

@Simon I like the idea behind the repo_key solution you came up with. Can you be more

Re: [Pulp-dev] uniqueness constraints within a repository version

2019-05-31 Thread Simon Baatz
On Fri, May 31, 2019 at 01:12:58PM +0200, Tatiana Tereshchenko wrote:
> A while ago RemoveDuplicates stage [0] was introduced to solve the problem of enforcing uniqueness constraints within a repository version at sync time.
> The same problem ought to be solved when content which

[Pulp-dev] uniqueness constraints within a repository version

2019-05-31 Thread Tatiana Tereshchenko
A while ago RemoveDuplicates stage [0] was introduced to solve the problem of enforcing uniqueness constraints within a repository version at sync time. The same problem ought to be solved when content which already exists in Pulp is added to a repository. E.g. Content was uploaded, or content was