Re: [Pulp-dev] the "relative path" problem

2020-10-27 Thread Ina Panova
>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Apr 30, 2020 at 12:33 PM Daniel Alley 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> @David Davis   so this proposal would go
>>>>>>>>> something like this, correct?:
>>>>>>>>>
>>>>>>>>> * For the signed metadata / exact mirror use-case we need to store
>>>>>>>>> the repository metadata itself as a content unit inside the
>>>>>>>>> RepositoryVersion anyway (because the hash must be equal)
>>>>>>>>> * Because we have this metadata lying around, we can reference it
>>>>>>>>> at publish time to discover the appropriate 
>>>>>>>>> PublishedArtifact.relative_path
>>>>>>>>>* Create a map of "filename" -> "location_href" and look up the
>>>>>>>>> filename of each RPM package to find the appropriate path
>>>>>>>>>* This should be pretty fast for the RPM plugin since
>>>>>>>>> createrepo_c is doing all the hard work
>>>>>>>>> * Data migration to ensure ContentArtifact.relative_path is only
>>>>>>>>> storing the filename (and I would suggest we also change the name to
>>>>>>>>> "filename")
>>>>>>>>> * If metadata isn't present in the RepositoryVersion, then just
>>>>>>>>> tweak the PublishedArtifact.relative_path so that it uses whichever 
>>>>>>>>> our
>>>>>>>>> default repo layout is
>>>>>>>>>
>>>>>>>>> On Tue, Apr 28, 2020 at 11:41 AM David Davis <
>>>>>>>>> davidda...@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> Yes, that's correct. During our meeting we discussed two options:
>>>>>>>>>> the first was to extend RepositoryContent to store relative path per
>>>>>>>>>> ContentArtifact as storing a relative_path per Content won't work for
>>>>>>>>>> multi-Artifact Content units.
>>>>>>>>>>
>>>>>>>>>> An alternative that I pitched was to have plugins (or maybe even
>>>>>>>>>> core someday) store this information outside RepositoryContent and 
>>>>>>>>>> then use
>>>>>>>>>> this information during publishing to set relative_path on
>>>>>>>>>> PublishedArtifacts. We'd have to modify the content app if we wanted 
>>>>>>>>>> to
>>>>>>>>>> support pass through publications but I think asking plugins to use
>>>>>>>>>> published artifacts in this case is warranted. That said, I don't 
>>>>>>>>>> think
>>>>>>>>>> anyone else was keen on this idea though.
>>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 28, 2020 at 10:30 AM Matthias Dellweg <
>>>>>>>>>> mdell...@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> That is only used for passthrough publication afaik. If you
>>>>>>>>>>> publish each content unit "by hand", you create a new relative path 
>>>>>>>>>>> for
>>>>>>>>>>> each published artifact. That is, why it can be empty and still the 
>>>>>>>>>>> content
>>>>>>>>>>> can be published.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Apr 28, 2020 at 4:09 PM Daniel Alley 
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> We realized in our discussion that the original proposal
>>>>>>>>>>>> described in my email will not work, because "relative_path" 
>>>>>>>>>>>> ultimately
>>>>>>>>>>>> describes the path of the published *artifacts* (not 

Re: [Pulp-dev] the "relative path" problem

2020-10-26 Thread Tatiana Tereshchenko
>>>>>>>>* This should be pretty fast for the RPM plugin since
>>>>>>>> createrepo_c is doing all the hard work
>>>>>>>> * Data migration to ensure ContentArtifact.relative_path is only
>>>>>>>> storing the filename (and I would suggest we also change the name to
>>>>>>>> "filename")
>>>>>>>> * If metadata isn't present in the RepositoryVersion, then just
>>>>>>>> tweak the PublishedArtifact.relative_path so that it uses whichever our
>>>>>>>> default repo layout is
>>>>>>>>
>>>>>>>> On Tue, Apr 28, 2020 at 11:41 AM David Davis 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Yes, that's correct. During our meeting we discussed two options:
>>>>>>>>> the first was to extend RepositoryContent to store relative path per
>>>>>>>>> ContentArtifact as storing a relative_path per Content won't work for
>>>>>>>>> multi-Artifact Content units.
>>>>>>>>>
>>>>>>>>> An alternative that I pitched was to have plugins (or maybe even
>>>>>>>>> core someday) store this information outside RepositoryContent and 
>>>>>>>>> then use
>>>>>>>>> this information during publishing to set relative_path on
>>>>>>>>> PublishedArtifacts. We'd have to modify the content app if we wanted 
>>>>>>>>> to
>>>>>>>>> support pass through publications but I think asking plugins to use
>>>>>>>>> published artifacts in this case is warranted. That said, I don't 
>>>>>>>>> think
>>>>>>>>> anyone else was keen on this idea though.
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Apr 28, 2020 at 10:30 AM Matthias Dellweg <
>>>>>>>>> mdell...@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> That is only used for passthrough publication afaik. If you
>>>>>>>>>> publish each content unit "by hand", you create a new relative path 
>>>>>>>>>> for
>>>>>>>>>> each published artifact. That is, why it can be empty and still the 
>>>>>>>>>> content
>>>>>>>>>> can be published.
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 28, 2020 at 4:09 PM Daniel Alley 
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> We realized in our discussion that the original proposal
>>>>>>>>>>> described in my email will not work, because "relative_path" 
>>>>>>>>>>> ultimately
>>>>>>>>>>> describes the path of the published *artifacts* (not content),
>>>>>>>>>>> and for content types with multiple artifacts, storing this 
>>>>>>>>>>> information in
>>>>>>>>>>> a field on RepositoryContent would not be possible.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Apr 27, 2020 at 6:08 PM Daniel Alley 
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> There is a video call scheduled to discuss this issue tomorrow
>>>>>>>>>>>> (Tuesday April 28th) at 13:30 UTC (please convert to your local 
>>>>>>>>>>>> time).
>>>>>>>>>>>> https://meet.google.com/scy-csbx-qiu
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Apr 25, 2020 at 7:02 AM David Davis <
>>>>>>>>>>>> davidda...@redhat.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I had a chance to think about this some more yesterday and
>>>>>>>>>>>>> wanted to email out my thoughts. I also think that this change 
>>>>>>>>>>>>> sounds scary
>>>>>>>>

Re: [Pulp-dev] the "relative path" problem

2020-10-21 Thread Tatiana Tereshchenko
>>>>>>>> multi-Artifact Content units.
>>>>>>>>
>>>>>>>> An alternative that I pitched was to have plugins (or maybe even
>>>>>>>> core someday) store this information outside RepositoryContent and 
>>>>>>>> then use
>>>>>>>> this information during publishing to set relative_path on
>>>>>>>> PublishedArtifacts. We'd have to modify the content app if we wanted to
>>>>>>>> support pass through publications but I think asking plugins to use
>>>>>>>> published artifacts in this case is warranted. That said, I don't think
>>>>>>>> anyone else was keen on this idea though.
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Apr 28, 2020 at 10:30 AM Matthias Dellweg <
>>>>>>>> mdell...@redhat.com> wrote:
>>>>>>>>
>>>>>>>>> That is only used for passthrough publication afaik. If you
>>>>>>>>> publish each content unit "by hand", you create a new relative path 
>>>>>>>>> for
>>>>>>>>> each published artifact. That is, why it can be empty and still the 
>>>>>>>>> content
>>>>>>>>> can be published.
>>>>>>>>>
>>>>>>>>> On Tue, Apr 28, 2020 at 4:09 PM Daniel Alley 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> We realized in our discussion that the original proposal
>>>>>>>>>> described in my email will not work, because "relative_path" 
>>>>>>>>>> ultimately
>>>>>>>>>> describes the path of the published *artifacts* (not content),
>>>>>>>>>> and for content types with multiple artifacts, storing this 
>>>>>>>>>> information in
>>>>>>>>>> a field on RepositoryContent would not be possible.
>>>>>>>>>>
>>>>>>>>>> On Mon, Apr 27, 2020 at 6:08 PM Daniel Alley 
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> There is a video call scheduled to discuss this issue tomorrow
>>>>>>>>>>> (Tuesday April 28th) at 13:30 UTC (please convert to your local 
>>>>>>>>>>> time).
>>>>>>>>>>> https://meet.google.com/scy-csbx-qiu
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Apr 25, 2020 at 7:02 AM David Davis <
>>>>>>>>>>> davidda...@redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I had a chance to think about this some more yesterday and
>>>>>>>>>>>> wanted to email out my thoughts. I also think that this change 
>>>>>>>>>>>> sounds scary
>>>>>>>>>>>> and will have a big impact on plugin writers so I thought of a 
>>>>>>>>>>>> couple
>>>>>>>>>>>> alternatives:
>>>>>>>>>>>>
>>>>>>>>>>>> First, we could add a relative_path field to RepositoryContent
>>>>>>>>>>>> instead of moving it there. This would be an optional field. It 
>>>>>>>>>>>> would be up
>>>>>>>>>>>> to plugins to manage this field and they would still need to 
>>>>>>>>>>>> populate the
>>>>>>>>>>>> relative_path field on ContentArtifact. But plugins could use this 
>>>>>>>>>>>> optional
>>>>>>>>>>>> field to store relative paths per repository and then use this 
>>>>>>>>>>>> field when
>>>>>>>>>>>> generating publications.
>>>>>>>>>>>>
>>>>>>>>>>>> The second alternative is one that is already laid out in the
>>>>>>>>>>>> original email but to call it out again: it would be to not solve 

Re: [Pulp-dev] the "relative path" problem

2020-05-07 Thread Brian Bouterse
d
>>>>> plugins.
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>> On Thu, Apr 30, 2020 at 12:33 PM Daniel Alley 
>>>>> wrote:
>>>>>
>>>>>> @David Davis   so this proposal would go
>>>>>> something like this, correct?:
>>>>>>
>>>>>> * For the signed metadata / exact mirror use-case we need to store
>>>>>> the repository metadata itself as a content unit inside the
>>>>>> RepositoryVersion anyway (because the hash must be equal)
>>>>>> * Because we have this metadata lying around, we can reference it at
>>>>>> publish time to discover the appropriate PublishedArtifact.relative_path
>>>>>>* Create a map of "filename" -> "location_href" and look up the
>>>>>> filename of each RPM package to find the appropriate path
>>>>>>* This should be pretty fast for the RPM plugin since createrepo_c
>>>>>> is doing all the hard work
>>>>>> * Data migration to ensure ContentArtifact.relative_path is only
>>>>>> storing the filename (and I would suggest we also change the name to
>>>>>> "filename")
>>>>>> * If metadata isn't present in the RepositoryVersion, then just tweak
>>>>>> the PublishedArtifact.relative_path so that it uses whichever our default
>>>>>> repo layout is
>>>>>>
>>>>>> On Tue, Apr 28, 2020 at 11:41 AM David Davis 
>>>>>> wrote:
>>>>>>
>>>>>>> Yes, that's correct. During our meeting we discussed two options:
>>>>>>> the first was to extend RepositoryContent to store relative path per
>>>>>>> ContentArtifact as storing a relative_path per Content won't work for
>>>>>>> multi-Artifact Content units.
>>>>>>>
>>>>>>> An alternative that I pitched was to have plugins (or maybe even
>>>>>>> core someday) store this information outside RepositoryContent and then 
>>>>>>> use
>>>>>>> this information during publishing to set relative_path on
>>>>>>> PublishedArtifacts. We'd have to modify the content app if we wanted to
>>>>>>> support pass through publications but I think asking plugins to use
>>>>>>> published artifacts in this case is warranted. That said, I don't think
>>>>>>> anyone else was keen on this idea though.
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Apr 28, 2020 at 10:30 AM Matthias Dellweg <
>>>>>>> mdell...@redhat.com> wrote:
>>>>>>>
>>>>>>>> That is only used for passthrough publication afaik. If you publish
>>>>>>>> each content unit "by hand", you create a new relative path for each
>>>>>>>> published artifact. That is, why it can be empty and still the content 
>>>>>>>> can
>>>>>>>> be published.
>>>>>>>>
>>>>>>>> On Tue, Apr 28, 2020 at 4:09 PM Daniel Alley 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> We realized in our discussion that the original proposal described
>>>>>>>>> in my email will not work, because "relative_path" ultimately 
>>>>>>>>> describes the
>>>>>>>>> path of the published *artifacts* (not content), and for content
>>>>>>>>> types with multiple artifacts, storing this information in a field on
>>>>>>>>> RepositoryContent would not be possible.
>>>>>>>>>
>>>>>>>>> On Mon, Apr 27, 2020 at 6:08 PM Daniel Alley 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> There is a video call scheduled to discuss this issue tomorrow
>>>>>>>>>> (Tuesday April 28th) at 13:30 UTC (please convert to your local 
>>>>>>>>>> time).
>>>>>>>>>> https://meet.google.com/scy-csbx-qiu
>>>>>>>>>>
>>>>>>>>>> On Sat, Apr 25, 2020 at 7:02 AM David Davis <
>>>>>>>>>

Re: [Pulp-dev] the "relative path" problem

2020-05-07 Thread Matthias Dellweg
>>>>> * Data migration to ensure ContentArtifact.relative_path is only
>>>>> storing the filename (and I would suggest we also change the name to
>>>>> "filename")
>>>>> * If metadata isn't present in the RepositoryVersion, then just tweak
>>>>> the PublishedArtifact.relative_path so that it uses whichever our default
>>>>> repo layout is
>>>>>
>>>>> On Tue, Apr 28, 2020 at 11:41 AM David Davis 
>>>>> wrote:
>>>>>
>>>>>> Yes, that's correct. During our meeting we discussed two options: the
>>>>>> first was to extend RepositoryContent to store relative path per
>>>>>> ContentArtifact as storing a relative_path per Content won't work for
>>>>>> multi-Artifact Content units.
>>>>>>
>>>>>> An alternative that I pitched was to have plugins (or maybe even core
>>>>>> someday) store this information outside RepositoryContent and then use 
>>>>>> this
>>>>>> information during publishing to set relative_path on PublishedArtifacts.
>>>>>> We'd have to modify the content app if we wanted to support pass through
>>>>>> publications but I think asking plugins to use published artifacts in 
>>>>>> this
>>>>>> case is warranted. That said, I don't think anyone else was keen on this
>>>>>> idea though.
>>>>>>
>>>>>> David
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 28, 2020 at 10:30 AM Matthias Dellweg <
>>>>>> mdell...@redhat.com> wrote:
>>>>>>
>>>>>>> That is only used for passthrough publication afaik. If you publish
>>>>>>> each content unit "by hand", you create a new relative path for each
>>>>>>> published artifact. That is, why it can be empty and still the content 
>>>>>>> can
>>>>>>> be published.
>>>>>>>
>>>>>>> On Tue, Apr 28, 2020 at 4:09 PM Daniel Alley 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> We realized in our discussion that the original proposal described
>>>>>>>> in my email will not work, because "relative_path" ultimately 
>>>>>>>> describes the
>>>>>>>> path of the published *artifacts* (not content), and for content
>>>>>>>> types with multiple artifacts, storing this information in a field on
>>>>>>>> RepositoryContent would not be possible.
>>>>>>>>
>>>>>>>> On Mon, Apr 27, 2020 at 6:08 PM Daniel Alley 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> There is a video call scheduled to discuss this issue tomorrow
>>>>>>>>> (Tuesday April 28th) at 13:30 UTC (please convert to your local time).
>>>>>>>>> https://meet.google.com/scy-csbx-qiu
>>>>>>>>>
>>>>>>>>> On Sat, Apr 25, 2020 at 7:02 AM David Davis 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I had a chance to think about this some more yesterday and wanted
>>>>>>>>>> to email out my thoughts. I also think that this change sounds scary 
>>>>>>>>>> and
>>>>>>>>>> will have a big impact on plugin writers so I thought of a couple
>>>>>>>>>> alternatives:
>>>>>>>>>>
>>>>>>>>>> First, we could add a relative_path field to RepositoryContent
>>>>>>>>>> instead of moving it there. This would be an optional field. It 
>>>>>>>>>> would be up
>>>>>>>>>> to plugins to manage this field and they would still need to 
>>>>>>>>>> populate the
>>>>>>>>>> relative_path field on ContentArtifact. But plugins could use this 
>>>>>>>>>> optional
>>>>>>>>>> field to store relative paths per repository and then use this field 
>>>>>>>>>> when
>>>>>>>>>> generating publications.
>>>>>>>>>>
>>>>>>>>>> The second alternative is one that is already laid out in the
>>>

Re: [Pulp-dev] the "relative path" problem

2020-05-06 Thread Dennis Kliban
t;>>
>>>>> An alternative that I pitched was to have plugins (or maybe even core
>>>>> someday) store this information outside RepositoryContent and then use 
>>>>> this
>>>>> information during publishing to set relative_path on PublishedArtifacts.
>>>>> We'd have to modify the content app if we wanted to support pass through
>>>>> publications but I think asking plugins to use published artifacts in this
>>>>> case is warranted. That said, I don't think anyone else was keen on this
>>>>> idea though.
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>> On Tue, Apr 28, 2020 at 10:30 AM Matthias Dellweg 
>>>>> wrote:
>>>>>
>>>>>> That is only used for passthrough publication afaik. If you publish
>>>>>> each content unit "by hand", you create a new relative path for each
>>>>>> published artifact. That is, why it can be empty and still the content 
>>>>>> can
>>>>>> be published.
>>>>>>
>>>>>> On Tue, Apr 28, 2020 at 4:09 PM Daniel Alley 
>>>>>> wrote:
>>>>>>
>>>>>>> We realized in our discussion that the original proposal described
>>>>>>> in my email will not work, because "relative_path" ultimately describes 
>>>>>>> the
>>>>>>> path of the published *artifacts* (not content), and for content
>>>>>>> types with multiple artifacts, storing this information in a field on
>>>>>>> RepositoryContent would not be possible.
>>>>>>>
>>>>>>> On Mon, Apr 27, 2020 at 6:08 PM Daniel Alley 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> There is a video call scheduled to discuss this issue tomorrow
>>>>>>>> (Tuesday April 28th) at 13:30 UTC (please convert to your local time).
>>>>>>>> https://meet.google.com/scy-csbx-qiu
>>>>>>>>
>>>>>>>> On Sat, Apr 25, 2020 at 7:02 AM David Davis 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I had a chance to think about this some more yesterday and wanted
>>>>>>>>> to email out my thoughts. I also think that this change sounds scary 
>>>>>>>>> and
>>>>>>>>> will have a big impact on plugin writers so I thought of a couple
>>>>>>>>> alternatives:
>>>>>>>>>
>>>>>>>>> First, we could add a relative_path field to RepositoryContent
>>>>>>>>> instead of moving it there. This would be an optional field. It would 
>>>>>>>>> be up
>>>>>>>>> to plugins to manage this field and they would still need to populate 
>>>>>>>>> the
>>>>>>>>> relative_path field on ContentArtifact. But plugins could use this 
>>>>>>>>> optional
>>>>>>>>> field to store relative paths per repository and then use this field 
>>>>>>>>> when
>>>>>>>>> generating publications.
>>>>>>>>>
>>>>>>>>> The second alternative is one that is already laid out in the
>>>>>>>>> original email but to call it out again: it would be to not solve 
>>>>>>>>> this in
>>>>>>>>> pulpcore. RPM would create its own object that would map content in a
>>>>>>>>> repository to relative_paths.
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Apr 21, 2020 at 9:22 AM Quirin Pamp  wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I am not currently very well versed in the classes involved, but
>>>>>>>>>> moving relative_path around sounds slightly scary with the potential 
>>>>>>>>>> to
>>>>>>>>>> break things.
>>>>>>>>>>
>>>>>>>>>>

Re: [Pulp-dev] the "relative path" problem

2020-05-04 Thread Dennis Kliban
>>>>>> On Mon, Apr 27, 2020 at 6:08 PM Daniel Alley 
>>>>>> wrote:
>>>>>>
>>>>>>> There is a video call scheduled to discuss this issue tomorrow
>>>>>>> (Tuesday April 28th) at 13:30 UTC (please convert to your local time).
>>>>>>> https://meet.google.com/scy-csbx-qiu
>>>>>>>
>>>>>>> On Sat, Apr 25, 2020 at 7:02 AM David Davis 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I had a chance to think about this some more yesterday and wanted
>>>>>>>> to email out my thoughts. I also think that this change sounds scary 
>>>>>>>> and
>>>>>>>> will have a big impact on plugin writers so I thought of a couple
>>>>>>>> alternatives:
>>>>>>>>
>>>>>>>> First, we could add a relative_path field to RepositoryContent
>>>>>>>> instead of moving it there. This would be an optional field. It would 
>>>>>>>> be up
>>>>>>>> to plugins to manage this field and they would still need to populate 
>>>>>>>> the
>>>>>>>> relative_path field on ContentArtifact. But plugins could use this 
>>>>>>>> optional
>>>>>>>> field to store relative paths per repository and then use this field 
>>>>>>>> when
>>>>>>>> generating publications.
>>>>>>>>
>>>>>>>> The second alternative is one that is already laid out in the
>>>>>>>> original email but to call it out again: it would be to not solve this 
>>>>>>>> in
>>>>>>>> pulpcore. RPM would create its own object that would map content in a
>>>>>>>> repository to relative_paths.
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Apr 21, 2020 at 9:22 AM Quirin Pamp  wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I am not currently very well versed in the classes involved, but
>>>>>>>>> moving relative_path around sounds slightly scary with the potential 
>>>>>>>>> to
>>>>>>>>> break things.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> As such, I would be interested to be kept in the loop as this
>>>>>>>>> moves forward. (Mailing list once there is some movement is entirely
>>>>>>>>> sufficient )
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Quirin Pamp
>>>>>>>>> --
>>>>>>>>> *From:* pulp-dev-boun...@redhat.com 
>>>>>>>>> on behalf of Ina Panova 
>>>>>>>>> *Sent:* 21 April 2020 14:07:13
>>>>>>>>> *To:* Daniel Alley 
>>>>>>>>> *Cc:* Pulp-dev 
>>>>>>>>> *Subject:* Re: [Pulp-dev] the "relative path" problem
>>>>>>>>>
>>>>>>>>> Daniel,
>>>>>>>>>
>>>>>>>>> how about setting up a meeting and brainstorm the alternatives,
>>>>>>>>> pros/cons there?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Ina Panova
>>>>>>>>> Senior Software Engineer| Pulp| Red Hat Inc.
>>>>>>>>>
>>>>>>>>> "Do not go where the path may lead,
>>>>>>>>>  go instead where there is no path and leave a trail."
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Apr 17, 2020 at 5:57 PM Daniel Alley 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Bump, this item needs to move forwards soon.  Does anyone have any
>>>>>>>>> thoug

Re: [Pulp-dev] the "relative path" problem

2020-04-30 Thread Daniel Alley
>>>>>>> email out my thoughts. I also think that this change sounds scary and 
>>>>>>> will
>>>>>>> have a big impact on plugin writers so I thought of a couple 
>>>>>>> alternatives:
>>>>>>>
>>>>>>> First, we could add a relative_path field to RepositoryContent
>>>>>>> instead of moving it there. This would be an optional field. It would 
>>>>>>> be up
>>>>>>> to plugins to manage this field and they would still need to populate 
>>>>>>> the
>>>>>>> relative_path field on ContentArtifact. But plugins could use this 
>>>>>>> optional
>>>>>>> field to store relative paths per repository and then use this field 
>>>>>>> when
>>>>>>> generating publications.
>>>>>>>
>>>>>>> The second alternative is one that is already laid out in the
>>>>>>> original email but to call it out again: it would be to not solve this 
>>>>>>> in
>>>>>>> pulpcore. RPM would create its own object that would map content in a
>>>>>>> repository to relative_paths.
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Apr 21, 2020 at 9:22 AM Quirin Pamp  wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>
>>>>>>>> I am not currently very well versed in the classes involved, but
>>>>>>>> moving relative_path around sounds slightly scary with the potential to
>>>>>>>> break things.
>>>>>>>>
>>>>>>>>
>>>>>>>> As such, I would be interested to be kept in the loop as this moves
>>>>>>>> forward. (Mailing list once there is some movement is entirely 
>>>>>>>> sufficient
>>>>>>>> )
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Quirin Pamp
>>>>>>>> --
>>>>>>>> *From:* pulp-dev-boun...@redhat.com 
>>>>>>>> on behalf of Ina Panova 
>>>>>>>> *Sent:* 21 April 2020 14:07:13
>>>>>>>> *To:* Daniel Alley 
>>>>>>>> *Cc:* Pulp-dev 
>>>>>>>> *Subject:* Re: [Pulp-dev] the "relative path" problem
>>>>>>>>
>>>>>>>> Daniel,
>>>>>>>>
>>>>>>>> how about setting up a meeting and brainstorm the alternatives,
>>>>>>>> pros/cons there?
>>>>>>>>
>>>>>>>>
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Ina Panova
>>>>>>>> Senior Software Engineer| Pulp| Red Hat Inc.
>>>>>>>>
>>>>>>>> "Do not go where the path may lead,
>>>>>>>>  go instead where there is no path and leave a trail."
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Apr 17, 2020 at 5:57 PM Daniel Alley 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Bump, this item needs to move forwards soon.  Does anyone have any
>>>>>>>> thoughts?
>>>>>>>>
>>>>>>>> On Wed, Apr 1, 2020 at 9:40 AM Pavel Picka 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I'd like to add one more question to this topic. Do you think it is
>>>>>>>> a blocker for PRs [0] & [1]? While testing [2] these features I haven't
>>>>>>>> run into a real-world example where two packages with the same name
>>>>>>>> appear.
>>>>>>>> I think this is a 'must have' feature, but until we solve/decide it
>>>>>>>> we can have the two features working, maybe with a warning in the docs
>>>>>>>> for users that this can happen in some 'special' repositories.
>>>>>>>>
>>>>>>>> To follow topic directly I like

Re: [Pulp-dev] the "relative path" problem

2020-04-30 Thread David Davis
>>>>>> The second alternative is one that is already laid out in the
>>>>>> original email but to call it out again: it would be to not solve this in
>>>>>> pulpcore. RPM would create its own object that would map content in a
>>>>>> repository to relative_paths.
>>>>>>
>>>>>> David
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 21, 2020 at 9:22 AM Quirin Pamp  wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>>
>>>>>>> I am not currently very well versed in the classes involved, but
>>>>>>> moving relative_path around sounds slightly scary with the potential to
>>>>>>> break things.
>>>>>>>
>>>>>>>
>>>>>>> As such, I would be interested to be kept in the loop as this moves
>>>>>>> forward. (Mailing list once there is some movement is entirely 
>>>>>>> sufficient
>>>>>>> )
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Quirin Pamp
>>>>>>> --
>>>>>>> *From:* pulp-dev-boun...@redhat.com 
>>>>>>> on behalf of Ina Panova 
>>>>>>> *Sent:* 21 April 2020 14:07:13
>>>>>>> *To:* Daniel Alley 
>>>>>>> *Cc:* Pulp-dev 
>>>>>>> *Subject:* Re: [Pulp-dev] the "relative path" problem
>>>>>>>
>>>>>>> Daniel,
>>>>>>>
>>>>>>> how about setting up a meeting and brainstorm the alternatives,
>>>>>>> pros/cons there?
>>>>>>>
>>>>>>>
>>>>>>> 
>>>>>>> Regards,
>>>>>>>
>>>>>>> Ina Panova
>>>>>>> Senior Software Engineer| Pulp| Red Hat Inc.
>>>>>>>
>>>>>>> "Do not go where the path may lead,
>>>>>>>  go instead where there is no path and leave a trail."
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Apr 17, 2020 at 5:57 PM Daniel Alley 
>>>>>>> wrote:
>>>>>>>
>>>>>>> Bump, this item needs to move forwards soon.  Does anyone have any
>>>>>>> thoughts?
>>>>>>>
>>>>>>> On Wed, Apr 1, 2020 at 9:40 AM Pavel Picka 
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>> I'd like to add one more question to this topic. Do you think it is
>>>>>>> a blocker for PRs [0] & [1]? While testing [2] these features I haven't
>>>>>>> run into a real-world example where two packages with the same name
>>>>>>> appear.
>>>>>>> I think this is a 'must have' feature, but until we solve/decide it
>>>>>>> we can have the two features working, maybe with a warning in the docs
>>>>>>> for users that this can happen in some 'special' repositories.
>>>>>>>
>>>>>>> To follow the topic directly, I like the proposed move to
>>>>>>> 'RepositoryContent' and adding it to its uniqueness constraint (if I
>>>>>>> understand correctly).
>>>>>>>
>>>>>>> [0] https://github.com/pulp/pulp_rpm/pull/1657
>>>>>>> [1] https://github.com/pulp/pulp_rpm/pull/1642
>>>>>>> [2] tested with centos 7, 8, opensuse and SLE repositories
>>>>>>>
>>>>>>> On Wed, Apr 1, 2020 at 3:22 PM Daniel Alley 
>>>>>>> wrote:
>>>>>>>
>>>>>>> We'd like to start a discussion on the "relative path problem"
>>>>>>> identified recently.
>>>>>>> Problem:
>>>>>>>
>>>>>>> Currently, a relative_path is tied to content in Pulp. This means
>>>>>>> that if a content unit exists in two places within a repository or 
>>>>>>> across
>>>>>>> repositories, it has to be stored as two separate content units. This
>>>>>>> creates redundant data and potential confusion for users.
>>>>>>>
>>>>>>> As a specific example, we need to support mirroring content in
>>>>>>> pulp_rpm &

Re: [Pulp-dev] the "relative path" problem

2020-04-30 Thread Daniel Alley
@David Davis, so this proposal would go something
like this, correct?

* For the signed metadata / exact mirror use-case we need to store the
repository metadata itself as a content unit inside the RepositoryVersion
anyway (because the hash must be equal)
* Because we have this metadata lying around, we can reference it at
publish time to discover the appropriate PublishedArtifact.relative_path
   * Create a map of "filename" -> "location_href" and look up the filename
of each RPM package to find the appropriate path
   * This should be pretty fast for the RPM plugin since createrepo_c is
doing all the hard work
* Data migration to ensure ContentArtifact.relative_path is only storing
the filename (and I would suggest we also change the name to "filename")
* If metadata isn't present in the RepositoryVersion, then just tweak the
PublishedArtifact.relative_path so that it uses whatever our default repo
layout is
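
To make the lookup step above concrete, here is a minimal sketch in plain Python. It is not pulp_rpm code: the function names and the "Packages/<first letter>/" default layout are assumptions made up for illustration, and the location_href values are taken as already parsed out of the stored repository metadata.

```python
import posixpath
from typing import Dict, Iterable, Optional


def default_layout(filename: str) -> str:
    """Hypothetical default layout: Packages/<first letter>/<filename>."""
    return posixpath.join("Packages", filename[0].lower(), filename)


def build_location_map(location_hrefs: Iterable[str]) -> Dict[str, str]:
    """Map each package filename to the location_href found in the metadata."""
    return {posixpath.basename(href): href for href in location_hrefs}


def published_relative_path(
    filename: str, location_map: Optional[Dict[str, str]]
) -> str:
    """Pick the path a published artifact would get for this filename.

    Uses the metadata-derived path when one is known; otherwise falls back
    to the (assumed) default repo layout.
    """
    if location_map and filename in location_map:
        return location_map[filename]
    return default_layout(filename)
```

One caveat with this shape: a plain dict keyed by filename keeps only one entry per filename, so if the same filename is listed under two different paths in the metadata, only the last one seen survives.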

On Tue, Apr 28, 2020 at 11:41 AM David Davis  wrote:

> Yes, that's correct. During our meeting we discussed two options: the
> first was to extend RepositoryContent to store relative path per
> ContentArtifact as storing a relative_path per Content won't work for
> multi-Artifact Content units.
>
> An alternative that I pitched was to have plugins (or maybe even core
> someday) store this information outside RepositoryContent and then use this
> information during publishing to set relative_path on PublishedArtifacts.
> We'd have to modify the content app if we wanted to support pass through
> publications but I think asking plugins to use published artifacts in this
> case is warranted. That said, I don't think anyone else was keen on this
> idea though.
>
> David
>
>
> On Tue, Apr 28, 2020 at 10:30 AM Matthias Dellweg 
> wrote:
>
>> That is only used for passthrough publication afaik. If you publish each
>> content unit "by hand", you create a new relative path for each published
>> artifact. That is why it can be empty and the content can still be
>> published.
>>
>> On Tue, Apr 28, 2020 at 4:09 PM Daniel Alley  wrote:
>>
>>> We realized in our discussion that the original proposal described in my
>>> email will not work, because "relative_path" ultimately describes the path
>>> of the published *artifacts* (not content), and for content types with
>>> multiple artifacts, storing this information in a field on
>>> RepositoryContent would not be possible.
>>>
>>> On Mon, Apr 27, 2020 at 6:08 PM Daniel Alley  wrote:
>>>
>>>> There is a video call scheduled to discuss this issue tomorrow (Tuesday
>>>> April 28th) at 13:30 UTC (please convert to your local time).
>>>> https://meet.google.com/scy-csbx-qiu
>>>>
>>>> On Sat, Apr 25, 2020 at 7:02 AM David Davis 
>>>> wrote:
>>>>
>>>>> I had a chance to think about this some more yesterday and wanted to
>>>>> email out my thoughts. I also think that this change sounds scary and will
>>>>> have a big impact on plugin writers so I thought of a couple alternatives:
>>>>>
>>>>> First, we could add a relative_path field to RepositoryContent instead
>>>>> of moving it there. This would be an optional field. It would be up to
>>>>> plugins to manage this field and they would still need to populate the
>>>>> relative_path field on ContentArtifact. But plugins could use this 
>>>>> optional
>>>>> field to store relative paths per repository and then use this field when
>>>>> generating publications.
>>>>>
>>>>> The second alternative is one that is already laid out in the original
>>>>> email but to call it out again: it would be to not solve this in pulpcore.
>>>>> RPM would create its own object that would map content in a repository to
>>>>> relative_paths.
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>> On Tue, Apr 21, 2020 at 9:22 AM Quirin Pamp  wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> I am not currently very well versed in the classes involved, but
>>>>>> moving relative_path around sounds slightly scary with the potential to
>>>>>> break things.
>>>>>>
>>>>>>
>>>>>> As such, I would be interested to be kept in the loop as this moves
>>>>>> forward. (Mailing list once there is some movement is entirely sufficient)

Re: [Pulp-dev] the "relative path" problem

2020-04-28 Thread David Davis
Yes, that's correct. During our meeting we discussed two options: the first
was to extend RepositoryContent to store relative path per ContentArtifact
as storing a relative_path per Content won't work for multi-Artifact
Content units.

An alternative that I pitched was to have plugins (or maybe even core
someday) store this information outside RepositoryContent and then use this
information during publishing to set relative_path on PublishedArtifacts.
We'd have to modify the content app if we wanted to support pass through
publications but I think asking plugins to use published artifacts in this
case is warranted. That said, I don't think anyone else was keen on this
idea.
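
A minimal sketch of the alternative above, assuming a hypothetical
plugin-side lookup (`path_overrides`) keyed by ContentArtifact id. The
dataclasses are simplified stand-ins for illustration, not the pulpcore
models, and `publish` is not an existing pulpcore function:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentArtifact:
    """Stand-in for the content/artifact link; relative_path is the default path."""
    id: int
    relative_path: str


@dataclass(frozen=True)
class PublishedArtifact:
    """Stand-in for the per-publication record that fixes the served path."""
    content_artifact_id: int
    relative_path: str


def publish(content_artifacts, path_overrides):
    """Build published artifacts, preferring paths stored by the plugin.

    path_overrides is the hypothetical plugin-side store: it maps a
    ContentArtifact id to the path it should have in this repository version.
    Anything without an override keeps its default relative_path.
    """
    return [
        PublishedArtifact(ca.id, path_overrides.get(ca.id, ca.relative_path))
        for ca in content_artifacts
    ]


artifacts = [ContentArtifact(1, "foo-1.0.rpm"), ContentArtifact(2, "bar-2.0.rpm")]
published = publish(artifacts, {1: "Packages/f/foo-1.0.rpm"})
for pa in published:
    print(pa.content_artifact_id, pa.relative_path)
```

The point of the sketch is that the per-repository path only has to exist at
publish time, so the stored content does not need to be duplicated per
location.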

David


On Tue, Apr 28, 2020 at 10:30 AM Matthias Dellweg 
wrote:

> That is only used for passthrough publication afaik. If you publish each
> content unit "by hand", you create a new relative path for each published
> artifact. That is why it can be empty and the content can still be
> published.
>
> On Tue, Apr 28, 2020 at 4:09 PM Daniel Alley  wrote:
>
>> We realized in our discussion that the original proposal described in my
>> email will not work, because "relative_path" ultimately describes the path
>> of the published *artifacts* (not content), and for content types with
>> multiple artifacts, storing this information in a field on
>> RepositoryContent would not be possible.
>>
>> On Mon, Apr 27, 2020 at 6:08 PM Daniel Alley  wrote:
>>
>>> There is a video call scheduled to discuss this issue tomorrow (Tuesday
>>> April 28th) at 13:30 UTC (please convert to your local time).
>>> https://meet.google.com/scy-csbx-qiu
>>>
>>> On Sat, Apr 25, 2020 at 7:02 AM David Davis 
>>> wrote:
>>>
>>>> I had a chance to think about this some more yesterday and wanted to
>>>> email out my thoughts. I also think that this change sounds scary and will
>>>> have a big impact on plugin writers so I thought of a couple alternatives:
>>>>
>>>> First, we could add a relative_path field to RepositoryContent instead
>>>> of moving it there. This would be an optional field. It would be up to
>>>> plugins to manage this field and they would still need to populate the
>>>> relative_path field on ContentArtifact. But plugins could use this optional
>>>> field to store relative paths per repository and then use this field when
>>>> generating publications.
>>>>
>>>> The second alternative is one that is already laid out in the original
>>>> email but to call it out again: it would be to not solve this in pulpcore.
>>>> RPM would create its own object that would map content in a repository to
>>>> relative_paths.
>>>>
>>>> David
>>>>
>>>>
>>>> On Tue, Apr 21, 2020 at 9:22 AM Quirin Pamp  wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>> I am not currently very well versed in the classes involved, but
>>>>> moving relative_path around sounds slightly scary with the potential to
>>>>> break things.
>>>>>
>>>>>
>>>>> As such, I would be interested to be kept in the loop as this moves
>>>>> forward. (Mailing list once there is some movement is entirely sufficient)
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Quirin Pamp
>>>>> --
>>>>> *From:* pulp-dev-boun...@redhat.com  on
>>>>> behalf of Ina Panova 
>>>>> *Sent:* 21 April 2020 14:07:13
>>>>> *To:* Daniel Alley 
>>>>> *Cc:* Pulp-dev 
>>>>> *Subject:* Re: [Pulp-dev] the "relative path" problem
>>>>>
>>>>> Daniel,
>>>>>
>>>>> how about setting up a meeting and brainstorm the alternatives,
>>>>> pros/cons there?
>>>>>
>>>>>
>>>>> 
>>>>> Regards,
>>>>>
>>>>> Ina Panova
>>>>> Senior Software Engineer| Pulp| Red Hat Inc.
>>>>>
>>>>> "Do not go where the path may lead,
>>>>>  go instead where there is no path and leave a trail."
>>>>>
>>>>>
>>>>> On Fri, Apr 17, 2020 at 5:57 PM Daniel Alley 
>>>>> wrote:
>>>>>
>>>>> Bump, this item needs to move forwards soon.  Does anyone have any
>>>>> though

Re: [Pulp-dev] the "relative path" problem

2020-04-28 Thread Matthias Dellweg
That is only used for passthrough publication afaik. If you publish each
content unit "by hand", you create a new relative path for each published
artifact. That is why it can be empty and the content can still be
published.
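
A rough sketch of that distinction, assuming "it" is the relative_path stored
on ContentArtifact. The function and the plain dicts are hypothetical
stand-ins for illustration, not the content app's real code:

```python
def resolve(path, published, stored, pass_through):
    """Return the artifact served at `path`, or None if nothing matches.

    published: path -> artifact, created "by hand" at publish time.
    stored:    path -> artifact, taken from the stored relative paths.
    pass_through: whether stored paths may be served directly.
    """
    if path in published:
        return published[path]
    if pass_through:
        return stored.get(path)
    return None


# Published "by hand": works even though no stored path exists at all.
published = {"Packages/f/foo-1.0.rpm": "artifact-foo"}
by_hand = resolve("Packages/f/foo-1.0.rpm", published, {}, pass_through=False)

# Pass-through: nothing was published explicitly, the stored path is used.
passthrough_only = resolve(
    "foo-1.0.rpm", {}, {"foo-1.0.rpm": "artifact-foo"}, pass_through=True
)

# Neither published nor pass-through: the path is not served.
missing = resolve("foo-1.0.rpm", published, {}, pass_through=False)
```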

On Tue, Apr 28, 2020 at 4:09 PM Daniel Alley  wrote:

> We realized in our discussion that the original proposal described in my
> email will not work, because "relative_path" ultimately describes the path
> of the published *artifacts* (not content), and for content types with
> multiple artifacts, storing this information in a field on
> RepositoryContent would not be possible.
>
> On Mon, Apr 27, 2020 at 6:08 PM Daniel Alley  wrote:
>
>> There is a video call scheduled to discuss this issue tomorrow (Tuesday
>> April 28th) at 13:30 UTC (please convert to your local time).
>> https://meet.google.com/scy-csbx-qiu
>>
>> On Sat, Apr 25, 2020 at 7:02 AM David Davis 
>> wrote:
>>
>>> I had a chance to think about this some more yesterday and wanted to
>>> email out my thoughts. I also think that this change sounds scary and will
>>> have a big impact on plugin writers so I thought of a couple alternatives:
>>>
>>> First, we could add a relative_path field to RepositoryContent instead
>>> of moving it there. This would be an optional field. It would be up to
>>> plugins to manage this field and they would still need to populate the
>>> relative_path field on ContentArtifact. But plugins could use this optional
>>> field to store relative paths per repository and then use this field when
>>> generating publications.
>>>
>>> The second alternative is one that is already laid out in the original
>>> email but to call it out again: it would be to not solve this in pulpcore.
>>> RPM would create its own object that would map content in a repository to
>>> relative_paths.
>>>
>>> David
>>>
>>>
>>> On Tue, Apr 21, 2020 at 9:22 AM Quirin Pamp  wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>> I am not currently very well versed in the classes involved, but moving
>>>> relative_path around sounds slightly scary with the potential to break
>>>> things.
>>>>
>>>>
>>>> As such, I would be interested to be kept in the loop as this moves
>>>> forward. (Mailing list once there is some movement is entirely sufficient)
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Quirin Pamp
>>>> --
>>>> *From:* pulp-dev-boun...@redhat.com  on
>>>> behalf of Ina Panova 
>>>> *Sent:* 21 April 2020 14:07:13
>>>> *To:* Daniel Alley 
>>>> *Cc:* Pulp-dev 
>>>> *Subject:* Re: [Pulp-dev] the "relative path" problem
>>>>
>>>> Daniel,
>>>>
>>>> how about setting up a meeting and brainstorm the alternatives,
>>>> pros/cons there?
>>>>
>>>>
>>>> 
>>>> Regards,
>>>>
>>>> Ina Panova
>>>> Senior Software Engineer| Pulp| Red Hat Inc.
>>>>
>>>> "Do not go where the path may lead,
>>>>  go instead where there is no path and leave a trail."
>>>>
>>>>
>>>> On Fri, Apr 17, 2020 at 5:57 PM Daniel Alley  wrote:
>>>>
>>>> Bump, this item needs to move forwards soon.  Does anyone have any
>>>> thoughts?
>>>>
>>>> On Wed, Apr 1, 2020 at 9:40 AM Pavel Picka  wrote:
>>>>
>>>> Hi,
>>>> I'd like to add one more question to this topic. Do you think it is a
>>>> blocker for PRs [0] & [1] as by testing [2] this features I haven't run
>>>> into real world example where two really same name packages appears.
>>>> I think this is a 'must have' feature but until we solve/decide it we
>>>> can have two features working may with warning in docs for users that can
>>>> happen in some 'special' repositories.
>>>>
>>>> To follow topic directly I like proposed move to 'RepositoryContent'
>>>> and add it to its uniqueness constraint (if I understand well).
>>>>
>>>> [0] https://github.com/pulp/pulp_rpm/pull/1657
>>>> [1] https://github.com/pulp/pulp_rpm/pull/1642
>>>> [2] tested with centos 7, 8, opensuse and SLE repositories
>>>>
>>>> On Wed, Apr 1, 2020 at 3:22 PM Daniel Alley  wrote:
>>>>
>>

Re: [Pulp-dev] the "relative path" problem

2020-04-28 Thread Daniel Alley
We realized in our discussion that the original proposal described in my
email will not work, because "relative_path" ultimately describes the path
of the published *artifacts* (not content), and for content types with
multiple artifacts, storing this information in a field on
RepositoryContent would not be possible.
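
To illustrate the problem, here is a small sketch with simplified stand-in
classes (not the pulpcore models). The "kernel"/"initrd" unit and the
`published_paths` helper are made up for the example:

```python
from dataclasses import dataclass


@dataclass
class Content:
    name: str
    artifact_files: list  # one entry per artifact belonging to the unit


@dataclass
class RepositoryContent:
    content: Content
    relative_path: str  # the originally proposed field: one value per content unit


def published_paths(rc):
    """Naively give every artifact of the unit the single per-content path."""
    return [rc.relative_path for _ in rc.content.artifact_files]


def has_collision(paths):
    """True if two artifacts would be published at the same path."""
    return len(paths) != len(set(paths))


single = RepositoryContent(Content("foo", ["foo-1.0.rpm"]), "Packages/f/foo-1.0.rpm")
multi = RepositoryContent(Content("image", ["kernel", "initrd"]), "images/pxeboot")

print(has_collision(published_paths(single)))
print(has_collision(published_paths(multi)))
```

With one artifact per unit the single field is enough; with two artifacts both
would land on the same path, which is why the path has to be tracked per
artifact rather than per content unit.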

On Mon, Apr 27, 2020 at 6:08 PM Daniel Alley  wrote:

> There is a video call scheduled to discuss this issue tomorrow (Tuesday
> April 28th) at 13:30 UTC (please convert to your local time).
> https://meet.google.com/scy-csbx-qiu
>
> On Sat, Apr 25, 2020 at 7:02 AM David Davis  wrote:
>
>> I had a chance to think about this some more yesterday and wanted to
>> email out my thoughts. I also think that this change sounds scary and will
>> have a big impact on plugin writers so I thought of a couple alternatives:
>>
>> First, we could add a relative_path field to RepositoryContent instead of
>> moving it there. This would be an optional field. It would be up to plugins
>> to manage this field and they would still need to populate the
>> relative_path field on ContentArtifact. But plugins could use this optional
>> field to store relative paths per repository and then use this field when
>> generating publications.
>>
>> The second alternative is one that is already laid out in the original
>> email but to call it out again: it would be to not solve this in pulpcore.
>> RPM would create its own object that would map content in a repository to
>> relative_paths.
>>
>> David
>>
>>
>> On Tue, Apr 21, 2020 at 9:22 AM Quirin Pamp  wrote:
>>
>>> Hi,
>>>
>>>
>>> I am not currently very well versed in the classes involved, but moving
>>> relative_path around sounds slightly scary with the potential to break
>>> things.
>>>
>>>
>>> As such, I would be interested to be kept in the loop as this moves
>>> forward. (Mailing list once there is some movement is entirely sufficient)
>>>
>>>
>>> Thanks,
>>>
>>> Quirin Pamp
>>> --
>>> *From:* pulp-dev-boun...@redhat.com  on
>>> behalf of Ina Panova 
>>> *Sent:* 21 April 2020 14:07:13
>>> *To:* Daniel Alley 
>>> *Cc:* Pulp-dev 
>>> *Subject:* Re: [Pulp-dev] the "relative path" problem
>>>
>>> Daniel,
>>>
>>> how about setting up a meeting and brainstorm the alternatives,
>>> pros/cons there?
>>>
>>>
>>> 
>>> Regards,
>>>
>>> Ina Panova
>>> Senior Software Engineer| Pulp| Red Hat Inc.
>>>
>>> "Do not go where the path may lead,
>>>  go instead where there is no path and leave a trail."
>>>
>>>
>>> On Fri, Apr 17, 2020 at 5:57 PM Daniel Alley  wrote:
>>>
>>> Bump, this item needs to move forwards soon.  Does anyone have any
>>> thoughts?
>>>
>>> On Wed, Apr 1, 2020 at 9:40 AM Pavel Picka  wrote:
>>>
>>> Hi,
>>> I'd like to add one more question to this topic. Do you think it is a
>>> blocker for PRs [0] & [1] as by testing [2] this features I haven't run
>>> into real world example where two really same name packages appears.
>>> I think this is a 'must have' feature but until we solve/decide it we
>>> can have two features working may with warning in docs for users that can
>>> happen in some 'special' repositories.
>>>
>>> To follow topic directly I like proposed move to 'RepositoryContent' and
>>> add it to its uniqueness constraint (if I understand well).
>>>
>>> [0] https://github.com/pulp/pulp_rpm/pull/1657
>>> [1] https://github.com/pulp/pulp_rpm/pull/1642
>>> [2] tested with centos 7, 8, opensuse and SLE repositories
>>>
>>> On Wed, Apr 1, 2020 at 3:22 PM Daniel Alley  wrote:
>>>
>>> We'd like to start a discussion on the "relative path problem"
>>> identified recently.
>>> Problem:
>>>
>>> Currently, a relative_path is tied to content in Pulp. This means that
>>> if a content unit exists in two places within a repository or across
>>> repositories, it has to be stored as two separate content units. This
>>> creates redundant data and potential confusion for users.
>>>
>>> As a specific example, we need to support mirroring content in pulp_rpm
>>> <https://pulp.plan.io/issues/6353>. Currently, for each location at
>>> which a single packa

Re: [Pulp-dev] the "relative path" problem

2020-04-27 Thread Daniel Alley
There is a video call scheduled to discuss this issue tomorrow (Tuesday
April 28th) at 13:30 UTC (please convert to your local time).
https://meet.google.com/scy-csbx-qiu

On Sat, Apr 25, 2020 at 7:02 AM David Davis  wrote:

> I had a chance to think about this some more yesterday and wanted to email
> out my thoughts. I also think that this change sounds scary and will have a
> big impact on plugin writers so I thought of a couple alternatives:
>
> First, we could add a relative_path field to RepositoryContent instead of
> moving it there. This would be an optional field. It would be up to plugins
> to manage this field and they would still need to populate the
> relative_path field on ContentArtifact. But plugins could use this optional
> field to store relative paths per repository and then use this field when
> generating publications.
>
> The second alternative is one that is already laid out in the original
> email but to call it out again: it would be to not solve this in pulpcore.
> RPM would create its own object that would map content in a repository to
> relative_paths.
>
> David
>
>
> On Tue, Apr 21, 2020 at 9:22 AM Quirin Pamp  wrote:
>
>> Hi,
>>
>>
>> I am not currently very well versed in the classes involved, but moving
>> relative_path around sounds slightly scary with the potential to break
>> things.
>>
>>
>> As such, I would be interested to be kept in the loop as this moves
>> forward. (Mailing list once there is some movement is entirely sufficient)
>>
>>
>> Thanks,
>>
>> Quirin Pamp
>> ------
>> *From:* pulp-dev-boun...@redhat.com  on
>> behalf of Ina Panova 
>> *Sent:* 21 April 2020 14:07:13
>> *To:* Daniel Alley 
>> *Cc:* Pulp-dev 
>> *Subject:* Re: [Pulp-dev] the "relative path" problem
>>
>> Daniel,
>>
>> how about setting up a meeting and brainstorm the alternatives, pros/cons
>> there?
>>
>>
>> 
>> Regards,
>>
>> Ina Panova
>> Senior Software Engineer| Pulp| Red Hat Inc.
>>
>> "Do not go where the path may lead,
>>  go instead where there is no path and leave a trail."
>>
>>
>> On Fri, Apr 17, 2020 at 5:57 PM Daniel Alley  wrote:
>>
>> Bump, this item needs to move forwards soon.  Does anyone have any
>> thoughts?
>>
>> On Wed, Apr 1, 2020 at 9:40 AM Pavel Picka  wrote:
>>
>> Hi,
>> I'd like to add one more question to this topic. Do you think it is a
>> blocker for PRs [0] & [1] as by testing [2] this features I haven't run
>> into real world example where two really same name packages appears.
>> I think this is a 'must have' feature but until we solve/decide it we can
>> have two features working may with warning in docs for users that can
>> happen in some 'special' repositories.
>>
>> To follow topic directly I like proposed move to 'RepositoryContent' and
>> add it to its uniqueness constraint (if I understand well).
>>
>> [0] https://github.com/pulp/pulp_rpm/pull/1657
>> [1] https://github.com/pulp/pulp_rpm/pull/1642
>> [2] tested with centos 7, 8, opensuse and SLE repositories
>>
>> On Wed, Apr 1, 2020 at 3:22 PM Daniel Alley  wrote:
>>
>> We'd like to start a discussion on the "relative path problem" identified
>> recently.
>> Problem:
>>
>> Currently, a relative_path is tied to content in Pulp. This means that if
>> a content unit exists in two places within a repository or across
>> repositories, it has to be stored as two separate content units. This
>> creates redundant data and potential confusion for users.
>>
>> As a specific example, we need to support mirroring content in pulp_rpm
>> <https://pulp.plan.io/issues/6353>. Currently, for each location at
>> which a single package is stored, we’ll need to create a content unit. We
>> could end up with several records representing a single package. Users may
>> be confused about why they see multiple records for a package and they may
>> have trouble for example deciding which content unit to copy.
>> Proposed Solution:
>>
>> Move “relative_path” from its current location on ContentArtifact, to
>> RepositoryContent. This will require a sizable data migration. It is
>> possibly the case that in rare cases, repository versions may change
>> slightly due to deduplication.
>>
>> A repository-version-wide uniqueness constraint will be present on
>> “relative_path”, independently of any other repository uniqueness
>> constraints (repo_key_fields) defined by 

Re: [Pulp-dev] the "relative path" problem

2020-04-21 Thread Quirin Pamp
Hi,


I am not currently very well versed in the classes involved, but moving 
relative_path around sounds slightly scary with the potential to break things.


As such, I would like to be kept in the loop as this moves forward.
(A mailing list post once there is some movement is entirely sufficient.)


Thanks,

Quirin Pamp


From: pulp-dev-boun...@redhat.com  on behalf of 
Ina Panova 
Sent: 21 April 2020 14:07:13
To: Daniel Alley 
Cc: Pulp-dev 
Subject: Re: [Pulp-dev] the "relative path" problem

Daniel,

how about setting up a meeting and brainstorm the alternatives, pros/cons there?



Regards,

Ina Panova
Senior Software Engineer| Pulp| Red Hat Inc.

"Do not go where the path may lead,
 go instead where there is no path and leave a trail."


On Fri, Apr 17, 2020 at 5:57 PM Daniel Alley wrote:
Bump, this item needs to move forwards soon.  Does anyone have any thoughts?

On Wed, Apr 1, 2020 at 9:40 AM Pavel Picka wrote:
Hi,
I'd like to add one more question to this topic. Do you think it is a blocker 
for PRs [0] & [1] as by testing [2] this features I haven't run into real world 
example where two really same name packages appears.
I think this is a 'must have' feature but until we solve/decide it we can have 
two features working may with warning in docs for users that can happen in some 
'special' repositories.

To follow topic directly I like proposed move to 'RepositoryContent' and add it 
to its uniqueness constraint (if I understand well).

[0] https://github.com/pulp/pulp_rpm/pull/1657
[1] https://github.com/pulp/pulp_rpm/pull/1642
[2] tested with centos 7, 8, opensuse and SLE repositories

On Wed, Apr 1, 2020 at 3:22 PM Daniel Alley wrote:
We'd like to start a discussion on the "relative path problem" identified 
recently.
Problem:

Currently, a relative_path is tied to content in Pulp. This means that if a 
content unit exists in two places within a repository or across repositories, 
it has to be stored as two separate content units. This creates redundant data 
and potential confusion for users.

As a specific example, we need to support mirroring content in 
pulp_rpm<https://pulp.plan.io/issues/6353>. Currently, for each location at 
which a single package is stored, we’ll need to create a content unit. We could 
end up with several records representing a single package. Users may be 
confused about why they see multiple records for a package and they may have 
trouble for example deciding which content unit to copy.

Proposed Solution:

Move “relative_path” from its current location on ContentArtifact, to 
RepositoryContent. This will require a sizable data migration. In rare cases,
repository versions may change slightly due to deduplication.

A repository-version-wide uniqueness constraint will be present on 
“relative_path”, independently of any other repository uniqueness constraints
(repo_key_fields) defined by the plugin writer.
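
As a rough illustration of what such a constraint would reject, here is a
minimal sketch in plain Python. The helper name `duplicate_paths` is
hypothetical, and in practice the check would presumably be enforced by the
database rather than in application code:

```python
from collections import Counter


def duplicate_paths(relative_paths):
    """Return the relative paths that occur more than once in one repo version."""
    counts = Counter(relative_paths)
    return sorted(path for path, n in counts.items() if n > 1)


# The same file name under two directories is fine; the same full path twice is not.
ok_version = ["a/foo-1.0.rpm", "b/foo-1.0.rpm"]
bad_version = ["a/foo-1.0.rpm", "b/foo-1.0.rpm", "a/foo-1.0.rpm"]

print(duplicate_paths(ok_version))
print(duplicate_paths(bad_version))
```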

Modify the Stages API so that the relative_path can be processed in the correct 
location – instead of “DeclarativeArtifact” it will likely need to go on 
“DeclarativeContent”

Remove “location_href” from the RPM Package content model – it was never a true 
part of the RPM (file) metadata, it is derived from the repository metadata. So 
storing it as a part of the Content unit doesn’t entirely make sense.

Alternatives

In most cases, a content unit will have a single relative path. Creating a
general solution to solve a one-off problem is usually not a good idea. As an
alternative, we could look at another solution for mirroring
content. One example might be to create a new object (e.g. 
RpmRepoMirrorContentMapping) that maps content to specific paths within a repo 
or repo version.

Questions

  *   How do we handle this in pulp_file? How are content units identified in
      pulp_file without relative_path?
      *   Checksum?
  *   How was this problem handled in Pulp 2?

Please weigh in if you have any input on potential problems with the proposal, 
potential alternate solutions, or other insights or questions!
___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


--
Pavel Picka
Red Hat


Re: [Pulp-dev] the "relative path" problem

2020-04-21 Thread Ina Panova
Daniel,

how about setting up a meeting and brainstorming the alternatives and
pros/cons there?



Regards,

Ina Panova
Senior Software Engineer| Pulp| Red Hat Inc.

"Do not go where the path may lead,
 go instead where there is no path and leave a trail."


On Fri, Apr 17, 2020 at 5:57 PM Daniel Alley  wrote:

> Bump, this item needs to move forwards soon.  Does anyone have any
> thoughts?
>
> On Wed, Apr 1, 2020 at 9:40 AM Pavel Picka  wrote:
>
>> Hi,
>> I'd like to add one more question to this topic. Do you think it is a
>> blocker for PRs [0] & [1] as by testing [2] this features I haven't run
>> into real world example where two really same name packages appears.
>> I think this is a 'must have' feature but until we solve/decide it we can
>> have two features working may with warning in docs for users that can
>> happen in some 'special' repositories.
>>
>> To follow topic directly I like proposed move to 'RepositoryContent' and
>> add it to its uniqueness constraint (if I understand well).
>>
>> [0] https://github.com/pulp/pulp_rpm/pull/1657
>> [1] https://github.com/pulp/pulp_rpm/pull/1642
>> [2] tested with centos 7, 8, opensuse and SLE repositories
>>
>> On Wed, Apr 1, 2020 at 3:22 PM Daniel Alley  wrote:
>>
>>> We'd like to start a discussion on the "relative path problem"
>>> identified recently.
>>> Problem:
>>>
>>> Currently, a relative_path is tied to content in Pulp. This means that
>>> if a content unit exists in two places within a repository or across
>>> repositories, it has to be stored as two separate content units. This
>>> creates redundant data and potential confusion for users.
>>>
>>> As a specific example, we need to support mirroring content in pulp_rpm
>>> <https://pulp.plan.io/issues/6353>. Currently, for each location at
>>> which a single package is stored, we’ll need to create a content unit. We
>>> could end up with several records representing a single package. Users may
>>> be confused about why they see multiple records for a package and they may
>>> have trouble for example deciding which content unit to copy.
>>> Proposed Solution:
>>>
>>> Move “relative_path” from its current location on ContentArtifact, to
>>> RepositoryContent. This will require a sizable data migration. It is
>>> possibly the case that in rare cases, repository versions may change
>>> slightly due to deduplication.
>>>
>>> A repository-version-wide uniqueness constraint will be present on
>>> “relative_path”, independently of any other repository uniqueness
>>> constraints (repo_key_fields) defined by the plugin writer.
>>>
>>> Modify the Stages API so that the relative_path can be processed in the
>>> correct location – instead of “DeclarativeArtifact” it will likely need to
>>> go on “DeclarativeContent”
>>>
>>> Remove “location_href” from the RPM Package content model – it was never
>>> a true part of the RPM (file) metadata, it is derived from the repository
>>> metadata. So storing it as a part of the Content unit doesn’t entirely make
>>> sense.
>>> Alternatives
>>>
>>> In most cases, a content unit will have a single relative path for a
>>> content unit. Creating a general solution to solve a one-off problem is
>>> usually not a good idea. As an alternative, we could look at another
>>> solution for mirroring content. One example might be to create a new object
>>> (e.g. RpmRepoMirrorContentMapping) that maps content to specific paths
>>> within a repo or repo version.
>>> Questions
>>>
>>> - How do we handle this in pulp_file? How are content units
>>>   identified in pulp_file without relative_path?
>>>   - Checksum?
>>> - How was this problem handled in Pulp 2?
>>>
>>>
>>> Please weigh in if you have any input on potential problems with the
>>> proposal, potential alternate solutions, or other insights or questions!
>>> ___
>>> Pulp-dev mailing list
>>> Pulp-dev@redhat.com
>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>
>>
>>
>> --
>> Pavel Picka
>> Red Hat
>>
> ___
> Pulp-dev mailing list
> Pulp-dev@redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
>


Re: [Pulp-dev] the "relative path" problem

2020-04-17 Thread Daniel Alley
Bump, this item needs to move forwards soon.  Does anyone have any thoughts?

On Wed, Apr 1, 2020 at 9:40 AM Pavel Picka  wrote:

> Hi,
> I'd like to add one more question to this topic. Do you think it is a
> blocker for PRs [0] & [1] as by testing [2] this features I haven't run
> into real world example where two really same name packages appears.
> I think this is a 'must have' feature but until we solve/decide it we can
> have two features working may with warning in docs for users that can
> happen in some 'special' repositories.
>
> To follow topic directly I like proposed move to 'RepositoryContent' and
> add it to its uniqueness constraint (if I understand well).
>
> [0] https://github.com/pulp/pulp_rpm/pull/1657
> [1] https://github.com/pulp/pulp_rpm/pull/1642
> [2] tested with centos 7, 8, opensuse and SLE repositories
>
> On Wed, Apr 1, 2020 at 3:22 PM Daniel Alley  wrote:
>
>> We'd like to start a discussion on the "relative path problem" identified
>> recently.
>> Problem:
>>
>> Currently, a relative_path is tied to content in Pulp. This means that if
>> a content unit exists in two places within a repository or across
>> repositories, it has to be stored as two separate content units. This
>> creates redundant data and potential confusion for users.
>>
>> As a specific example, we need to support mirroring content in pulp_rpm
>> <https://pulp.plan.io/issues/6353>. Currently, for each location at
>> which a single package is stored, we’ll need to create a content unit. We
>> could end up with several records representing a single package. Users may
>> be confused about why they see multiple records for a package and they may
>> have trouble for example deciding which content unit to copy.
>> Proposed Solution:
>>
>> Move “relative_path” from its current location on ContentArtifact, to
>> RepositoryContent. This will require a sizable data migration. It is
>> possibly the case that in rare cases, repository versions may change
>> slightly due to deduplication.
>>
>> A repository-version-wide uniqueness constraint will be present on
>> “relative_path”, independently of any other repository uniqueness
>> constraints (repo_key_fields) defined by the plugin writer.
>>
>> Modify the Stages API so that the relative_path can be processed in the
>> correct location – instead of “DeclarativeArtifact” it will likely need to
>> go on “DeclarativeContent”
>>
>> Remove “location_href” from the RPM Package content model – it was never
>> a true part of the RPM (file) metadata, it is derived from the repository
>> metadata. So storing it as a part of the Content unit doesn’t entirely make
>> sense.
>> Alternatives
>>
>> In most cases, a content unit will have a single relative path for a
>> content unit. Creating a general solution to solve a one-off problem is
>> usually not a good idea. As an alternative, we could look at another
>> solution for mirroring content. One example might be to create a new object
>> (e.g. RpmRepoMirrorContentMapping) that maps content to specific paths
>> within a repo or repo version.
>> Questions
>>
>> - How do we handle this in pulp_file? How are content units
>>   identified in pulp_file without relative_path?
>>   - Checksum?
>> - How was this problem handled in Pulp 2?
>>
>>
>> Please weigh in if you have any input on potential problems with the
>> proposal, potential alternate solutions, or other insights or questions!
>> ___
>> Pulp-dev mailing list
>> Pulp-dev@redhat.com
>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>
>
>
> --
> Pavel Picka
> Red Hat
>


Re: [Pulp-dev] the "relative path" problem

2020-04-01 Thread Pavel Picka
Hi,
I'd like to add one more question to this topic. Do you think it is a
blocker for PRs [0] & [1]? While testing [2] these features, I haven't run
into a real-world example where two packages with exactly the same name
appear. I think this is a 'must have' feature, but until we decide on a
solution we can have the two features working, maybe with a warning in the
docs for users that this can happen in some 'special' repositories.

On the topic itself, I like the proposed move to 'RepositoryContent' and
adding it to its uniqueness constraint (if I understand correctly).

[0] https://github.com/pulp/pulp_rpm/pull/1657
[1] https://github.com/pulp/pulp_rpm/pull/1642
[2] tested with centos 7, 8, opensuse and SLE repositories

On Wed, Apr 1, 2020 at 3:22 PM Daniel Alley  wrote:

> We'd like to start a discussion on the "relative path problem" identified
> recently.
> Problem:
>
> Currently, a relative_path is tied to content in Pulp. This means that if
> a content unit exists in two places within a repository or across
> repositories, it has to be stored as two separate content units. This
> creates redundant data and potential confusion for users.
>
> As a specific example, we need to support mirroring content in pulp_rpm
> <https://pulp.plan.io/issues/6353>. Currently, for each location at which
> a single package is stored, we’ll need to create a content unit. We could
> end up with several records representing a single package. Users may be
> confused about why they see multiple records for a package and they may
> have trouble for example deciding which content unit to copy.
> Proposed Solution:
>
> Move “relative_path” from its current location on ContentArtifact, to
> RepositoryContent. This will require a sizable data migration. It is
> possibly the case that in rare cases, repository versions may change
> slightly due to deduplication.
>
> A repository-version-wide uniqueness constraint will be present on
> “relative_path”, independently of any other repository uniqueness
> constraints (repo_key_fields) defined by the plugin writer.
>
> Modify the Stages API so that the relative_path can be processed in the
> correct location – instead of “DeclarativeArtifact” it will likely need to
> go on “DeclarativeContent”
>
> Remove “location_href” from the RPM Package content model – it was never a
> true part of the RPM (file) metadata, it is derived from the repository
> metadata. So storing it as a part of the Content unit doesn’t entirely make
> sense.
> Alternatives
>
> In most cases, a content unit will have a single relative path for a
> content unit. Creating a general solution to solve a one-off problem is
> usually not a good idea. As an alternative, we could look at another
> solution for mirroring content. One example might be to create a new object
> (e.g. RpmRepoMirrorContentMapping) that maps content to specific paths
> within a repo or repo version.
> Questions
>
> - How do we handle this in pulp_file? How are content units identified
>   in pulp_file without relative_path?
>   - Checksum?
> - How was this problem handled in Pulp 2?
>
>
> Please weigh in if you have any input on potential problems with the
> proposal, potential alternate solutions, or other insights or questions!
> ___
> Pulp-dev mailing list
> Pulp-dev@redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
>


-- 
Pavel Picka
Red Hat