Re: [Pulp-dev] Disabling merge by commit

2020-09-23 Thread Justin Sherrill


On 9/23/20 7:18 AM, David Davis wrote:
I think the two main things for me are (1) it makes git history more 
linear and (2) it cuts down on the number of commits. Both of these 
make git history more readable.


The 'rebase and merge' option provides a nice balance: it lets you 
provide multiple commits and maintain commit history while creating 
neither a merge commit nor a hard-to-read commit history.  Sometimes it 
is more expressive to have the two (or three) commits that make up one 
PR land in the source tree.





David


On Wed, Sep 23, 2020 at 6:48 AM Ina Panova wrote:


Hi Quirin,



Regards,

Ina Panova
Senior Software Engineer| Pulp| Red Hat Inc.

"Do not go where the path may lead,
 go instead where there is no path and leave a trail."


On Wed, Sep 23, 2020 at 10:47 AM Quirin Pamp <p...@atix.de> wrote:

"I'd encourage plugins to consider disabling merge by commit
as well."

In order to evaluate this, it would be great if you could
explain why this was decided for pulpcore and pulp_file.
You have posted a lot of general information about the
different merge types (the "What?"), but not so much on the
"Why?".

As far as I can tell, the main advantage of squash and rebase
is that it leads to a tidier history in master, at the cost
of losing some information on how the sausage was made.
As a result, squash and rebase becomes increasingly
advantageous with increasing PR volume.
However, I fail to see an advantage for pulp_deb, which does
not have a large PR volume.

Or am I missing some relevant part of the argument?


I think your understanding is correct. From my perspective it is
important to have a tidy history in master no matter how high or low
your PR traffic is.

pulp_container has disabled merge by commit as well.


Quirin

*From:* pulp-dev-boun...@redhat.com on behalf of David Davis
<davidda...@redhat.com>
*Sent:* 22 September 2020 17:16
*To:* Pulp-dev <pulp-dev@redhat.com>
*Subject:* Re: [Pulp-dev] Disabling merge by commit
Here's some more information about PR merges as well:


https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-request-merges

David


On Tue, Sep 22, 2020 at 11:11 AM David Davis
<davidda...@redhat.com> wrote:

Today at open floor, we decided to disable merging by
commit for pulpcore and pulp_file PRs. Instead, developers
will rebase or squash PRs to merge them. This adds the
changes to HEAD instead of interspersing commits and
creating a merge commit. This picture of git history
comparing pulpcore to foreman (which doesn't merge by
commit) illustrates the differences:

https://imgur.com/a/uiIa0Mr

I'd encourage plugins to consider disabling merge by
commit as well. To do so, go to the settings page for your
github repo and look under the Merge Button section.
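
If you'd rather script this than click through the settings UI, the same
toggles are exposed via the GitHub REST API. A minimal sketch using Python's
requests library (the repo name and token are placeholders, and the allow_*
fields are GitHub's documented repository settings):

    import requests

    # PATCH the repository settings; requires a token with admin rights
    resp = requests.patch(
        "https://api.github.com/repos/pulp/pulp_file",
        headers={"Authorization": "token <your-token>"},
        json={
            "allow_merge_commit": False,  # disables "merge by commit"
            "allow_squash_merge": True,
            "allow_rebase_merge": True,
        },
    )
    resp.raise_for_status()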

David

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] pulpcore 3.6.1 and 3.6.2 are Generally Available

2020-09-03 Thread Justin Sherrill
I discovered this issue with 3.6.2 (and prior) that prevents proper 
operation: https://pulp.plan.io/issues/7450


Would it be possible to get a 3.6.3 in the near term?

Justin

On 9/2/20 5:13 PM, Dennis Kliban wrote:
Pulpcore 3.6.1 was released yesterday, but a bug with the OpenAPI 
schema was discovered before this announcement could be sent out. This 
bug has been fixed and pulpcore 3.6.2 is now available as well[0]. These 
releases contain three bug fixes and improvements to documentation. 
For a list of all changes, please check the changelog for pulpcore[1].


# Installation and Upgrade

Users should use the 3.6.2 release of pulp_installer[2] to install or 
upgrade their installations. This version of the installer will check 
compatibility of all installed plugins with pulpcore 3.6. The 
installer will abort if any plugin is incompatible.


The pulp_installer collection can be installed from Ansible Galaxy 
with the following command:


    ansible-galaxy collection install --force pulp.pulp_installer

The --force flag will upgrade the collection if you had a previous 
version installed.


# Plugin API

The plugin API changes were all related to problems with the OpenAPI 
schema. These bugs were all introduced in 3.6.0 when pulpcore upgraded 
to OpenAPI version 3. The OpenAPI schema improvements are also evident 
in both the Python[3] and Ruby[4] clients. For the full list of 
changes, please check the plugin API changelog[5].


[0] https://pypi.org/project/pulpcore/3.6.2/
[1] https://docs.pulpproject.org/pulpcore/en/3.6.2/changes.html#id1
[2] https://galaxy.ansible.com/pulp/pulp_installer
[3] https://pypi.org/project/pulpcore-client/3.6.2/
[4] https://rubygems.org/gems/pulpcore_client/versions/3.6.2
[5] https://docs.pulpproject.org/en/3.6.2/changes.html#plugin-api

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Pulp3 Concurrency testing

2020-07-28 Thread Justin Sherrill


On 7/28/20 11:44 AM, David Davis wrote:
Today we discussed this at triage. We're leaning towards changing the 
default from 20 to 10, as it seems like 10 only incurs an extra 30% 
time penalty while appearing to fix the problem[0].


One question though is how we should treat existing data because most 
Remotes at this point probably have a value of 20 for 
download_concurrency. We came up with two options that we would like 
some feedback on.


It seems kinda strange that the 'default' value is recorded in the db on 
the remote at creation time.  Any reason not to just leave it 'nil' and use 
the 'app default'?  This doesn't really help with past values, but it 
seems like a better model going forward.
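
For illustration, the 'nil + app default' idea might look something like this
sketch (field and setting names here are hypothetical, not Pulp's actual API):

    from django.conf import settings
    from django.db import models

    class Remote(models.Model):
        # NULL means "the user never chose a value; use the app default"
        download_concurrency = models.PositiveIntegerField(null=True, blank=True)

        @property
        def effective_concurrency(self):
            if self.download_concurrency is not None:
                return self.download_concurrency
            # hypothetical setting; the default can change without touching rows
            return getattr(settings, "DEFAULT_DOWNLOAD_CONCURRENCY", 10)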





# Option 1: Migrate 20 to 10

This would be a migration in pulpcore that would update 
download_concurrency to 10 for all Remotes whose download_concurrency 
is set to 20. Something like:


  
    Remote.objects.all().filter(download_concurrency=20).update(download_concurrency=10)
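
A minimal sketch of what that data migration could look like (the app label
and dependency below are assumptions for illustration, not the actual file):

    from django.db import migrations

    def lower_default_concurrency(apps, schema_editor):
        Remote = apps.get_model("core", "Remote")
        # only touch Remotes still at the old default; explicitly chosen
        # values other than 20 are left alone
        Remote.objects.filter(download_concurrency=20).update(download_concurrency=10)

    class Migration(migrations.Migration):
        dependencies = [("core", "0001_initial")]
        operations = [
            migrations.RunPython(lower_default_concurrency, migrations.RunPython.noop),
        ]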


My vote is for #1 because:

a) I imagine ~99% of people want the recommended default, so 99% of 
people will either need to update this manually (if doing #2) or face 
sync failures


b) I'd rather 99% of people not have to do anything than cater to the 1% 
who for some reason actually wanted 20 and weren't just going with the 
'recommended default' at that time.  The number of people in this case 
could very well be zero.





# Option 2: Documentation

This would be similar to the migration approach but instead of 
modifying our users' data, we'd document how they could do it 
themselves. So something like:


    pulpcore-manager shell_plus -c "Remote.objects.all().filter(download_concurrency=20).update(download_concurrency=10)"



Any feedback is welcome.

[0] https://pulp.plan.io/issues/7186#note-2

David


On Mon, Jul 27, 2020 at 2:57 PM Grant Gainey wrote:


Hey folks,

Looking into issue 7212, over
the weekend I did some ad-hoc evaluations of sync-performance at
various concurrency settings. I wrote up my observations here:

https://hackmd.io/@ggainey/pulp3_sync_concurrency

Just thought folk might be interested.

G
-- 
Grant Gainey

Principal Software Engineer, Red Hat System Management Engineering


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] pulpcore 3.4.0 release scheduled for May 27th

2020-05-19 Thread Justin Sherrill
Is it normal for things added/fixed between 3.3 and 3.4 to not show up 
in that version tracker?


The item I'm thinking about is https://pulp.plan.io/issues/6591

On 5/19/20 8:55 PM, Dennis Kliban wrote:
Pulpcore 3.4.0 is scheduled to be released on May 27th. There are 
currently 3 issues that have been proposed as blockers for this 
release[0]. Please respond to this thread with any other issues that 
should potentially be addressed in this release.


[0] https://pulp.plan.io/versions/88

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] No-Retry Behaviour on Network Errors: call for feedback and concerns

2020-05-12 Thread Justin Sherrill

I apparently replied directly instead of replying to the list!  Resending:

I don't have known concerns (more worry about what happens as more and 
more people use pulp3 in the real world), but I do think after reading 
the investigation in https://pulp.plan.io/issues/6589 that the resulting 
RFE, https://pulp.plan.io/issues/6699, is hugely important.


Justin

On 5/12/20 4:41 PM, Brian Bouterse wrote:
tl;dr: pulp does not retry when there are network errors or the server 
hangs up. We are going to document this as part of this issue: 
https://pulp.plan.io/issues/6624. Please share concerns or feedback 
with this plan before open floor on May 15th.


# Background

At open floor today we touched on how Pulp 3 downloading does not have 
retry logic in these cases:

* the server hanging up the TCP connection
* network errors which cause TCP hangups
* http errors other than [429, 502, 503, 504]
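
Since Pulp itself won't retry in these cases, a caller can layer retries on
top of its own client calls. A minimal client-side sketch (the wrapped
function stands in for whatever call you are making; names are illustrative):

    import time

    def with_retries(fn, attempts=3, delay=5):
        """Re-run fn() on network errors, backing off between tries."""
        for attempt in range(1, attempts + 1):
            try:
                return fn()
            except (ConnectionError, OSError):
                # TCP hangups and network errors: Pulp will not retry these
                if attempt == attempts:
                    raise
                time.sleep(delay * attempt)

Because Pulp does not redownload content it already has, re-driving a failed
operation this way effectively resumes where it left off.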

# The Documentation Plan

The current plan is to document this on docs.pulpproject.org 
as part of this ticket: 
https://pulp.plan.io/issues/6624. It will document:

* the no-retry behavior cases
* the reasons why Pulp does not retry
* that users can retry and Pulp will effectively resume due to not 
having to redownload content it already downloaded


# Feedback

Do you have concerns or other feedback with this plan? Is this the 
best thing Pulp can do? If you are interested in sharing, please do 
before open floor on Friday May 15th.


Thanks!
Brian



___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] redmine process for katello-integration-related issues

2020-04-08 Thread Justin Sherrill

+1 to all of this!

On 4/8/20 12:35 PM, Brian Bouterse wrote:
Thanks for writing this up and sending! My only addition would be to 
also remove the P1, P2, P3 tags entirely after setting all tagged 
issues with 'katello' and setting their priorities based on the 
previous P1/P2/P3 label.


Thank you!

On Wed, Apr 8, 2020 at 12:32 PM Grant Gainey wrote:


Hey folks,

As part of working with the katello upstream, we have been using a
mechanism for prioritizing pulp-issues in order to help keep the
Katello Gang unblocked. We have been using the 'Tags' field in an
issue, and marking things as Katello-P1/2/3, with P1 being
"blocker for the next release".

As we move through releases, this is starting to break down - last
release's P2 is this release's P1. This was brought up for
discussion in today's integration meeting.

In order to continue being able to prioritize work, we're
proposing a change to the process to make it more sustainable as
releases go on. I *think* I have captured the proposal effectively
below - if I've missed something vital, I'm sure someone who was
in the meeting will expand on it:

  * tag katello-related issues as 'katello'
  * use the milestone field to define the planned-pulp-release-version
  * use the Priority field to mark how important it is, *to
katello*, to fix a bug NOW, as opposed to 'the day before the
release is cut' (which in practice is likely to be 'blockers
are critical, everything else is normal')

This will make it easy to query redmine in a way that returns a
properly-ordered list, without some human having to go through and
group-change tags on multiple issues at once.

Would appreciate more eyes on this, and especially input on what I
might have missed. We'd like to switch 'soon', so feedback before,
say next Wednesday 15-APR would be great!

Thanks,
G
-- 
Grant Gainey

Principal Software Engineer, Red Hat System Management Engineering
___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] SUSE repositories in Pulp

2020-03-24 Thread Justin Sherrill
I much prefer this solution (a single RPM Repository type), and I think 
just using 'location_href' for RPM uniqueness within a repo version 
makes a lot of sense.  Overall +1.


Justin

On 3/23/20 4:27 PM, Daniel Alley wrote:
I think, as long as the metadata is correct, using just the 
location_href would be OK.  It should contain all the other bits of 
information.


On Mon, Mar 23, 2020 at 3:57 PM David Davis wrote:


A couple questions below.

On Mon, Mar 23, 2020 at 3:47 PM Tatiana Tereshchenko
<ttere...@redhat.com> wrote:

Clarification:
The proposal is to add the 'location_href' attribute to
repo_key_fields, the uniqueness constraint within a
repository version, so two packages with the same NEVRA but
different locations can be present in one repo.
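
As a sketch, the proposal might look roughly like this on the pulp_rpm
Package model (field list trimmed to the NEVRA fields; names here are
illustrative, not the final code):

    from pulpcore.plugin.models import Content

    class Package(Content):
        TYPE = "package"

        # with location_href included, two packages that differ only in
        # their location can coexist in one repository version
        repo_key_fields = ("name", "epoch", "version", "release", "arch",
                           "location_href")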


Why have nevra+relative_path instead of just relative_path? I.e.,
would it be possible for two packages in a repo version to have
the same relative_path but different nevras?

An RPM package is still uniquely identified in Pulp by NEVRA + 
checksum (aka pkgId) + checksum type.


What if a user has the same package in a repo at two different
locations, or the same package in two different repos at
different locations? Since relative_path is attached to the
content unit, I think this would prevent that from happening? I
wonder if uniqueness in Pulp should also include
location_href/relative_path?


On Mon, Mar 23, 2020 at 7:33 PM Grant Gainey
<ggai...@redhat.com> wrote:

On Mon, Mar 23, 2020 at 2:01 PM Dennis Kliban
<dkli...@redhat.com> wrote:

During last week's RPM team meeting, a concern was
raised about using the same repository type for both
Red Hat and SUSE repositories. Since that meeting I
have only been able to identify a single difference
between the two repositories. SUSE repos can contain
the same package in two different locations in the
same repository. Even though I just referred to this
as a difference, I don't actually believe that to be
true. All RPM repositories should be able to support
this.


If I'm reading the discussion w/the RPM folks correctly,
this is 'odd but legal' for rpm-repositories. That means
that, while SUSE may be the only current example, there's
nothing to keep some other distro/thirdparty from doing
the exact same thing, and we'd have to handle it.

I propose that we not add a separate repository type
for SUSE and simply add the 'location' attribute of an
RPM to its uniqueness constraint.  What do you all
think?


Yeah, concur. It feels messy - but only because the
problem-domain itself is messy :(

G

___
Pulp-dev mailing list
Pulp-dev@redhat.com 
https://www.redhat.com/mailman/listinfo/pulp-dev



-- 
Grant Gainey

Principal Software Engineer, Red Hat System Management
Engineering
___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Moving Content Guard Authorization to Webserver and out of pulp-content

2020-03-11 Thread Justin Sherrill
Thanks, and it does appear that we use certs that could be too large for 
default header sizes (by several multiples).


Could you elaborate on the design a bit more?  I'm curious what 
the requirements of the web service layer are (will it need to talk to the 
pulp3 api? the db?).  Will it just add some header after reading the cert (and 
validating the path) and then pass it on to the reverse proxy with apache?


Thanks!

Justin

On 3/11/20 2:58 PM, Brian Bouterse wrote:



On Wed, Mar 11, 2020 at 2:34 PM Justin Sherrill <jsher...@redhat.com> wrote:


We had discussed base64 encoding the cert in the webserver on the
way in and then letting cert guard decode it.  While that's not
ideal I think it has some advantages over moving the full auth
into the webserver.  What was your motivation for going with that
approach over the base64 encoding approach?

Thank you for this question! I ended up with a few different concerns 
about the base64 encode-and-forward idea. Architecturally the concern 
with it is that it's frowned upon to forward the client's TLS cert 
beyond the TLS termination point because that is what MITM software 
does. There are also some practical concerns: first, I don't think 
nginx can provide a similar runtime base64 encoding feature; second, I 
was concerned with header length truncation and what happens when the 
certificates get longer.


Overall having auth that is based on TLS certificates brought me to 
the conclusion that we need to auth where the TLS is terminated. What 
do you think?
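
For reference, the encode-and-forward idea being weighed here would need a
decode step roughly like this on the pulp-certguard side (the header name and
framing are assumptions for illustration only):

    import base64

    def client_cert_from_header(headers):
        encoded = headers.get("X-Client-Cert")
        if encoded is None:
            return None
        # the webserver would base64-encode the PEM so its newlines survive
        # the HTTP header; reversing that recovers the original PEM
        return base64.b64decode(encoded).decode("utf-8")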


More thoughts and questions are welcome. This is a good discussion.

On 3/11/20 2:11 PM, Brian Bouterse wrote:

tl;dr: What we have today cannot work with rhsm certificates
which Katello uses. To resolve, we need to have content guard
checking moved to the webserver configs for apache and nginx and
not done in pulp-content as it is today.
https://pulp.plan.io/issues/6323

We need to bring the auth to where TLS is terminated because we
can't bring the client certs to pulp-content due to invalid
header characters. As is, pulp-certguard cannot work with
Katello's cert types (rhsm certs) so that is driving my changes.

If anyone has major concerns or other ideas please let me know.
In the meantime I'm proceeding moving the authorization to the
webserver and then updating pulp-certguard to work with that.
This will make pulp-certguard's GA tied to pulpcore 3.3.0.
Feedback is welcome.

[0]: https://pulp.plan.io/issues/6323

Thanks,
Brian


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Moving Content Guard Authorization to Webserver and out of pulp-content

2020-03-11 Thread Justin Sherrill
We had discussed base64 encoding the cert in the webserver on the way in 
and then letting cert guard decode it.  While that's not ideal I think 
it has some advantages over moving the full auth into the webserver.  
What was your motivation for going with that approach over the base64 
encoding approach?


On 3/11/20 2:11 PM, Brian Bouterse wrote:
tl;dr: What we have today cannot work with rhsm certificates which 
Katello uses. To resolve, we need to have content guard checking moved 
to the webserver configs for apache and nginx and not done in 
pulp-content as it is today. https://pulp.plan.io/issues/6323


We need to bring the auth to where TLS is terminated because we can't 
bring the client certs to pulp-content due to invalid header 
characters. As is, pulp-certguard cannot work with Katello's cert 
types (rhsm certs) so that is driving my changes.


If anyone has major concerns or other ideas please let me know. In the 
meantime I'm proceeding moving the authorization to the webserver and 
then updating pulp-certguard to work with that. This will make 
pulp-certguard's GA tied to pulpcore 3.3.0. Feedback is welcome.


[0]: https://pulp.plan.io/issues/6323

Thanks,
Brian


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Importers/Exporters

2020-02-20 Thread Justin Sherrill

There are two different forms of export today in katello:

Legacy version:

  * Uses pulp2's export functionality

  * Takes the tarball as is

"New" Version

   * Just copies published repository as is (following symlinks)

   * Adds own 'katello' metadata to existing tarball


I would imagine that with pulp3 we would somewhat combine these two 
approaches and take the pulp3 generated export file and add in a 
metadata file of some sort.


Justin

On 2/19/20 2:28 PM, Dennis Kliban wrote:

Thank you for the details. More questions inline.

On Wed, Feb 19, 2020 at 2:04 PM Justin Sherrill <jsher...@redhat.com> wrote:


the goal from our side is to have a very similar experience to the
user.  Today the user would:

* run a command (for example, something similar to hammer
content-view version export --content-view-name=foobar --version=1.0)

* this creates a tarball on disk

What all is in the tarball? Is this just a repository export created 
by Pulp or is there extra information from the Katello db?


* they copy the tarball to external media

* they move the external media to the disconnected katello

* they run 'hammer content-view version import
--export-tar=/path/to/tarball'

Does katello untar this archive, create a repository in pulp, sync 
from the directory containing the unarchived content, and then publish?


I don't see this changing much for the user; anything additional
that needs to be done in pulp can be done behind the cli/api in
katello.  Thanks!


Justin

On 2/19/20 12:52 PM, Dennis Kliban wrote:

In Katello that uses Pulp 2, what steps does the user need to
take when importing an export into an air gapped environment? I
am concerned about making the process more complicated than what
the user is already used to.

On Wed, Feb 19, 2020 at 11:20 AM David Davis
<davidda...@redhat.com> wrote:

Thanks for the responses so far. I think we could export
publications along with the repo version by exporting any
publication that points to a repo version.

My concern with exporting repositories is that users will
probably get a bunch of content they don't care about if they
want to export a single repo version. That said, if users do
want to export entire repos, we could add this feature later
I think?

    David


    On Wed, Feb 19, 2020 at 10:30 AM Justin Sherrill
    <jsher...@redhat.com> wrote:


On 2/14/20 1:09 PM, David Davis wrote:

Grant and I met today to discuss importers and
exporters[0] and we'd like some feedback before we
proceed with the design. To sum up this feature briefly:
users can export a repository version from one Pulp
instance and import it to another.

# Master/Detail vs Core

So one fundamental question is whether we should use a
Master/Detail approach or just have core control the
flow but call out to plugins to get export formats.

To give some background: we currently define Exporters
(ie FileSystemExporter) in core as Master models.
Plugins extend this model which allows them to configure
or customize the Exporter. This was necessary because
some plugins need to export Publications (along with
repository metadata) while other plugins who don't have
Publications or metadata export RepositoryVersions.

The other option is to have core handle the workflow.
The user would call a core endpoint and provide a
RepositoryVersion. This would work because for
importing/exporting, you wouldn't ever use Publications
because metadata won't be used for importing back into
Pulp. If needed, core could provide a way for plugin
writers to write custom handlers/exporters for content
types.

If we go with the second option, the question then
becomes whether we should divorce the concept of
Exporters and import/export. Or do we also switch
Exporters from Master/Detail to core only?

# Foreign Keys

Content can be distributed across multiple tables (eg
UpdateRecord has UpdateCollection, etc). In our export,
we could either use primary keys (UUIDs) or natural keys
to relate records. The former assumes that UUIDs are
unique across Pulp instances. The safer but more complex
alternative is to use natural keys. This would involve
storing a set of fields on a record that would be used
to identify a related record.

# Incremental Exports

There are two big pieces of data con

Re: [Pulp-dev] Importers/Exporters

2020-02-19 Thread Justin Sherrill
the goal from our side is to have a very similar experience to the 
user.  Today the user would:


* run a command (for example, something similar to hammer content-view 
version export --content-view-name=foobar --version=1.0)


* this creates a tarball on disk

* they copy the tarball to external media

* they move the external media to the disconnected katello

* they run 'hammer content-view version import --export-tar=/path/to/tarball'

I don't see this changing much for the user; anything additional that 
needs to be done in pulp can be done behind the cli/api in katello.  Thanks!


Justin

On 2/19/20 12:52 PM, Dennis Kliban wrote:
In Katello that uses Pulp 2, what steps does the user need to take 
when importing an export into an air gapped environment? I am 
concerned about making the process more complicated than what the user 
is already used to.


On Wed, Feb 19, 2020 at 11:20 AM David Davis <davidda...@redhat.com> wrote:


Thanks for the responses so far. I think we could export
publications along with the repo version by exporting any
publication that points to a repo version.

My concern with exporting repositories is that users will probably
get a bunch of content they don't care about if they want to
export a single repo version. That said, if users do want to
export entire repos, we could add this feature later I think?

David


On Wed, Feb 19, 2020 at 10:30 AM Justin Sherrill
<jsher...@redhat.com> wrote:


On 2/14/20 1:09 PM, David Davis wrote:

Grant and I met today to discuss importers and exporters[0]
and we'd like some feedback before we proceed with the
design. To sum up this feature briefly: users can export a
repository version from one Pulp instance and import it to
another.

# Master/Detail vs Core

So one fundamental question is whether we should use a
Master/Detail approach or just have core control the flow but
call out to plugins to get export formats.

To give some background: we currently define Exporters (ie
FileSystemExporter) in core as Master models. Plugins extend
this model which allows them to configure or customize the
Exporter. This was necessary because some plugins need to
export Publications (along with repository metadata) while
other plugins who don't have Publications or metadata export
RepositoryVersions.

The other option is to have core handle the workflow. The
user would call a core endpoint and provide a
RepositoryVersion. This would work because for
importing/exporting, you wouldn't ever use Publications
because metadata won't be used for importing back into Pulp.
If needed, core could provide a way for plugin writers to
write custom handlers/exporters for content types.

If we go with the second option, the question then becomes
whether we should divorce the concept of Exporters and
import/export. Or do we also switch Exporters from
Master/Detail to core only?

# Foreign Keys

Content can be distributed across multiple tables (eg
UpdateRecord has UpdateCollection, etc). In our export, we
could either use primary keys (UUIDs) or natural keys to
relate records. The former assumes that UUIDs are unique
across Pulp instances. The safer but more complex alternative
is to use natural keys. This would involve storing a set of
fields on a record that would be used to identify a related
record.

# Incremental Exports

There are two big pieces of data contained in an export: the
dataset of Content from the database and the artifact files.
An incremental export cuts down on the size of an export by
only exporting the differences. However, when performing an
incremental export, we could still export the complete
dataset instead of just a set of differences
(additions/removals/updates). This approach would be simpler
and it would allow us to ensure that the new repo version
matches the exported repo version exactly. It would however
increase the export size but not by much I think--probably
some number of megabytes at most.


If it's simpler, I would go with that.  Saving even ~100-200 MB
isn't that big of a deal IMO.  The biggest savings is in the
RPM content.




[0] https://pulp.plan.io/issues/6134

David

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev

Re: [Pulp-dev] Importers/Exporters

2020-02-19 Thread Justin Sherrill


On 2/14/20 1:09 PM, David Davis wrote:
Grant and I met today to discuss importers and exporters[0] and we'd 
like some feedback before we proceed with the design. To sum up this 
feature briefly: users can export a repository version from one Pulp 
instance and import it to another.


# Master/Detail vs Core

So one fundamental question is whether we should use a Master/Detail 
approach or just have core control the flow but call out to plugins to 
get export formats.


To give some background: we currently define Exporters (ie 
FileSystemExporter) in core as Master models. Plugins extend this 
model which allows them to configure or customize the Exporter. This 
was necessary because some plugins need to export Publications (along 
with repository metadata) while other plugins who don't have 
Publications or metadata export RepositoryVersions.


The other option is to have core handle the workflow. The user would 
call a core endpoint and provide a RepositoryVersion. This would work 
because for importing/exporting, you wouldn't ever use Publications 
because metadata won't be used for importing back into Pulp. If 
needed, core could provide a way for plugin writers to write custom 
handlers/exporters for content types.


If we go with the second option, the question then becomes whether we 
should divorce the concept of Exporters and import/export. Or do we 
also switch Exporters from Master/Detail to core only?


# Foreign Keys

Content can be distributed across multiple tables (eg UpdateRecord has 
UpdateCollection, etc). In our export, we could either use primary 
keys (UUIDs) or natural keys to relate records. The former assumes 
that UUIDs are unique across Pulp instances. The safer but more 
complex alternative is to use natural keys. This would involve storing 
a set of fields on a record that would be used to identify a related 
record.
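
To make that trade-off concrete, here is a sketch of the natural-key idea
(model and field names are hypothetical, not the actual pulp_rpm schema):

    def export_update_record(record):
        # serialize the fields that identify the record, not its UUID,
        # so the export stays portable across Pulp instances
        return {
            "errata_id": record.errata_id,
            "collections": [c.name for c in record.collections.all()],
        }

    def import_update_record(UpdateRecord, data):
        # resolve by natural key on the destination, never by UUID
        record, _ = UpdateRecord.objects.get_or_create(errata_id=data["errata_id"])
        return record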


# Incremental Exports

There are two big pieces of data contained in an export: the dataset 
of Content from the database and the artifact files. An incremental 
export cuts down on the size of an export by only exporting the 
differences. However, when performing an incremental export, we could 
still export the complete dataset instead of just a set of differences 
(additions/removals/updates). This approach would be simpler and it 
would allow us to ensure that the new repo version matches the 
exported repo version exactly. It would however increase the export 
size but not by much I think--probably some number of megabytes at most.


If it's simpler, I would go with that.  Saving even ~100-200 MB isn't that 
big of a deal IMO.  The biggest savings is in the RPM content.





[0] https://pulp.plan.io/issues/6134

David

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Importers/Exporters

2020-02-19 Thread Justin Sherrill
Distributions definitely do not need to be migrated (I see them more as 
configuration similar to importers than I do as something that can be 
exported/imported).


Publications, I think, would be a nice-to-have, but they can be re-generated on 
the other side based on the repository version. While the publication 
won't be 100% exact (due to timestamps and ordering differences), I'm 
not sure that it matters.  The biggest drawback will just be that the 
importing side needs to be smarter, and it will take a little longer.  
I could see import/export of publications being an optimization or 
extension later on, but not necessary for a first pass.  If I'm missing 
some issue or use case, please do speak up!


Justin

On 2/17/20 1:05 PM, David Davis wrote:
The user stories that Katello gave us (which I've entered into redmine 
here[0]) don't mention publications or distributions. I will follow up 
with Katello though.


[0] https://pulp.plan.io/issues/6134

David


On Mon, Feb 17, 2020 at 12:49 PM Dennis Kliban wrote:


On Fri, Feb 14, 2020 at 1:11 PM David Davis <davidda...@redhat.com> wrote:

Grant and I met today to discuss importers and exporters[0]
and we'd like some feedback before we proceed with the design.
To sum up this feature briefly: users can export a repository
version from one Pulp instance and import it to another.


Exporting just repository versions is not sufficient for
reproducing a Pulp instance in an air gapped environment. Users
need to be able to use the "export" to populate a Pulp instance
without needing to create any publications and/or distributions
afterwards. What about letting users specify a repository to
export and then have pulpcore figure out which repository
versions, publications, distributions, content, metadata, and
artifacts need to be exported?


# Master/Detail vs Core

So one fundamental question is whether we should use a
Master/Detail approach or just have core control the flow but
call out to plugins to get export formats.

To give some background: we currently define Exporters (ie
FileSystemExporter) in core as Master models. Plugins extend
this model which allows them to configure or customize the
Exporter. This was necessary because some plugins need to
export Publications (along with repository metadata) while
other plugins who don't have Publications or metadata export
RepositoryVersions.

The other option is to have core handle the workflow. The user
would call a core endpoint and provide a RepositoryVersion.
This would work because for importing/exporting, you wouldn't
ever use Publications because metadata won't be used for
importing back into Pulp. If needed, core could provide a way
for plugin writers to write custom handlers/exporters for
content types.

If we go with the second option, the question then becomes
whether we should divorce the concept of Exporters and
import/export. Or do we also switch Exporters from
Master/Detail to core only?

# Foreign Keys

Content can be distributed across multiple tables (eg
UpdateRecord has UpdateCollection, etc). In our export, we
could either use primary keys (UUIDs) or natural keys to
relate records. The former assumes that UUIDs are unique
across Pulp instances. The safer but more complex alternative
is to use natural keys. This would involve storing a set of
fields on a record that would be used to identify a related
record.

# Incremental Exports

There are two big pieces of data contained in an export: the
dataset of Content from the database and the artifact files.
An incremental export cuts down on the size of an export by
only exporting the differences. However, when performing an
incremental export, we could still export the complete dataset
instead of just a set of differences
(additions/removals/updates). This approach would be simpler
and it would allow us to ensure that the new repo version
matches the exported repo version exactly. It would however
increase the export size but not by much I think--probably
some number of megabytes at most.

[0] https://pulp.plan.io/issues/6134

David
___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev

Re: [Pulp-dev] RPM stories, feedback is needed

2020-01-27 Thread Justin Sherrill

Both of these look good to me

Justin

On 1/27/20 2:44 PM, Tatiana Tereshchenko wrote:

I'm looking for feedback on the stories below.  For both:
 - is the logic correct, or is anything missing?
 - where to store info (from a remote) which is needed later?

1. https://pulp.plan.io/issues/4458 As a user, I can configure which 
checksum algorithm to use when creating metadata
 - the suggestion is to store the checksum type of a remote on the RPM 
repo version model.


2. https://pulp.plan.io/issues/6055 As a user, my sync is not 
operational if there are no changes
  - the suggestion is to store all the necessary details on an 
RPM repository model.


Please comment on the issues (preferred) or discuss here.

Thank you,
Tanya

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Pulp3 Applicability Design thoughts (and Katello)

2020-01-22 Thread Justin Sherrill
To wrap this discussion up, it seems that there is pretty wide agreement 
that this functionality best fits into Katello.


We'll likely start an RFC discussion over at the Foreman Discourse 
server soon: https://community.theforeman.org/.


Thanks All!

Justin

On 1/17/20 8:30 AM, Justin Sherrill wrote:
There have been some design discussions going on around applicability 
(https://hackmd.io/ydvHuzXNRA6T9eXx6cqy5A) in pulp3.


There are some big changes compared to pulp2, including:

* Package profile, module profile, and repository list have to be 
uploaded on every applicability computation


* Calls for applicability are not asynchronous (which makes sense as 
they are one profile at a time).


Also keep in mind that due to the complexity of all the information 
needed, Katello has been the primary (and sometimes the only?) user of 
the service.


For an example of what this might look like, consider a repository 
that syncs some new packages.  For 10,000 clients, it has to send the 
full package profiles for all 10,000 clients, as well as the other 
information, in 10,000 api calls.  In addition, our task workers will 
have to wait around for all 10,000 clients to be calculated.  One last 
point is that katello already knows all the NVREAs for the rpms, 
which rpms are in which repositories, which artifacts are in which 
modules, and which packages are in what errata.


Given all this, does it make sense for pulp to calculate the 
applicability?  Or does it make sense for katello to?


This assumes that no one else wants to use applicability in pulp3 (and 
given the barrier to entry is even higher than it was in pulp2, that 
may be possible).


Justin

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


[Pulp-dev] Pulp3 Applicability Design thoughts (and Katello)

2020-01-17 Thread Justin Sherrill
There have been some design discussions going on around applicability 
(https://hackmd.io/ydvHuzXNRA6T9eXx6cqy5A) in pulp3.


There are some big changes compared to pulp2, including:

* Package profile, module profile, and repository list have to be 
uploaded on every applicability computation


* Calls for applicability are not asynchronous (which makes sense as 
they are one profile at a time).


Also keep in mind that due to the complexity of all the information 
needed, Katello has been the primary (and sometimes the only?) user of 
the service.


For an example of what this might look like, consider a repository that 
syncs some new packages.  For 10,000 clients, it has to send the full 
package profiles for all 10,000 clients, as well as the other 
information, in 10,000 api calls.  In addition, our task workers will have 
to wait around for all 10,000 clients to be calculated.  One last point 
is that katello already knows all the NVREAs for the rpms, which rpms 
are in which repositories, which artifacts are in which modules, and 
which packages are in what errata.


Given all this, does it make sense for pulp to calculate the 
applicability?  Or does it make sense for katello to?


This assumes that no one else wants to use applicability in pulp3 (and 
given the barrier to entry is even higher than it was in pulp2, that may 
be possible).


Justin


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


[Pulp-dev] Katello and Pulp 3.1

2019-11-04 Thread Justin Sherrill
A while back we had some discussion about pulp 3.1 (and somewhat beyond 
pulp 3.1), and I wanted to bring attention to some items that may need 
to be looked at in those time frames:


https://pulp.plan.io/issues/5613 (re-download corrupted files)
https://pulp.plan.io/issues/5200 (supporting mirrored metadata)
https://pulp.plan.io/issues/5199 (alternate content sources) - this one 
can probably be pushed out somewhat, and may need more of a feature 
design in both katello and pulp


Justin

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Pagination Requirements and Defaults?

2019-08-21 Thread Justin Sherrill


On 8/20/19 12:17 PM, Brian Bouterse wrote:
Recently with pulp_ansible, users were interested in using pagination 
with LimitOffsetPagination [0]. Pulp currently defaults to 
PageNumberPagination. I looked at our current DRF defaults, and I 
noticed two things.


1. We default to the not-as-common PageNumberPagination based on 
examples in the drf docs.

2. We customize it here [1] in various ways.
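
For reference, both styles are stock DRF paginators; the sketch below shows
how either would be selected in settings (illustrative only; Pulp's actual
configuration customizes its paginator in [1]):

    REST_FRAMEWORK = {
        # current default: clients page with ?page=2
        "DEFAULT_PAGINATION_CLASS": "rest_framework.pagination.PageNumberPagination",
        # the alternative under discussion: clients page with ?limit=100&offset=400
        # "DEFAULT_PAGINATION_CLASS": "rest_framework.pagination.LimitOffsetPagination",
        "PAGE_SIZE": 100,
    }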

Can someone help me remember why these pagination style choices were 
made or where the requirements came from?

Would our bindings work with a LimitOffsetPagination style?
What use cases drove the use and customization in this area?

Also, @katello how would a pagination style change (like switching to 
LimitOffsetPagination) affect you?


Speaking for katello, we'd have to change a few lines of code, but it 
seems like it would be minimal.




Thanks for any info you can provide. Maybe what we have right now is 
just what we need, but I'm not sure.


-Brian

[0]: 
https://www.django-rest-framework.org/api-guide/pagination/#setting-the-pagination-style
[1]: 
https://github.com/pulp/pulpcore/blob/master/pulpcore/app/pagination.py


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] pulp 3 bindings change proposal

2019-06-20 Thread Justin Sherrill


On 6/20/19 8:02 AM, Dennis Kliban wrote:
On Wed, Jun 19, 2019 at 1:57 PM Dennis Kliban <dkli...@redhat.com> wrote:


On Wed, Jun 19, 2019 at 11:34 AM Dennis Kliban <dkli...@redhat.com> wrote:

On Wed, Jun 19, 2019 at 11:20 AM Justin Sherrill
<jsher...@redhat.com> wrote:

If a plugin provided multiple remotes, for example, what
would that look like?

in your example:

-file_remote = fileremotes.remotes_file_file_create(remote_data)
+file_remote = fileremotes.create(remote_data)

Let's say the file plugin provided some other remote that still
synced file content?


The goal is to provide separate API objects for each remote or
content type that a plugin provides. So the code would look
like this:

file_remote = fileremote.create(remote_data)
file_fancy_remote = filefancyremote.create(fancy_remote_data)

My current implementation does not support this, but I am
working toward the above solution.


I was able to achieve this. I posted some screen shots of what the
docs look like here[0].

Docker has multiple content types. So docker bindings would
provide the following objects: ContentDockerBlobsApi,
ContentDockerManifestListTagsApi, ContentDockerManifestListsApi,
ContentDockerManifestTagsApi, and ContentDockerManifestsApi.



I updated my patch and removed the plugin name from the Api object 
names. So the above objects are now ContentBlobsApi, 
ContentManifestListTagsApi, ContentManifestListsApi, 
ContentManifestTagsApi, and ContentManifestsApi.


I like this all, and agree it improves readability.  I assume there's no 
concern about plugins implementing some model with the same name?  Or I 
guess this could already be a problem when it comes to model/db table 
names in the app itself?


Justin



I have 2 PRs for this change[0,1]. The use of the bindings can be seen 
in both of the PRs. I'd like to get this work merged today.


[0] https://github.com/pulp/pulpcore/pull/178
[1] https://github.com/pulp/pulp-openapi-generator/pull/18


Each of those objects would have a create(), read(), delete(),
list() methods.

Do others agree that this improves the usability of the bindings?


[0] https://imgur.com/a/Ag7gqmj

Justin

On 6/19/19 9:45 AM, Dennis Kliban wrote:


I didn't get a note in my email, but I did see one in
the list archive[0]. So here is my response to it:

I agree that we could use modified templates to achieve
the same results. However, that means that we will need
to modify templates for every language we want to
generate bindings in. In both cases the generated client
code will be exactly the same. From a maintenance
perspective, it is easier to add a feature to Pulp's REST
API that produces a modified version of the OpenAPI
schema. It also means that we can always use the latest
versions of the templates shipped with openapi-generator.

The documentation site would continue to distribute an
OpenAPI schema where each Operation Id is unique.

Pulp's OpenAPI schema does not currently pass validation
because the paths are not unique. In order to use the
'href' of each resource as the primary identifier, it was
necessary to template paths as {artifact_href},
{repository_href}, {file_content_href}, etc. This schema
cannot be used to generate server code. However, it works
well when generating client code. The non-unique
operation ids would be a problem for generating a server
also. However, they don't produce problems when
generating client code.

Does this address your concerns?

[0]
https://www.redhat.com/archives/pulp-dev/2019-June/msg00061.html

On Wed, Jun 19, 2019 at 8:54 AM Dennis Kliban
<dkli...@redhat.com> wrote:

As pointed out in a recent issue[0], the method names
in the bindings generated from Pulp's OpenAPI schema
are unnecessarily verbose. Each method name
corresponds to an Operation Id in the OpenAPI schema.
The Operation Id is also used as an HTML anchor in
the REST API docs[1].

It is possible to generate a schema where each
Operation Id is shorter, but then the Operation Ids
are not unique and all the linking in the REST API
documentation breaks. We can avoid this problem by
keeping the long Operation Id for the schema
genera

Re: [Pulp-dev] pulp 3 bindings change proposal

2019-06-19 Thread Justin Sherrill
If a plugin provided multiple remotes, for example, what would that look 
like?


in your example:

-file_remote = fileremotes.remotes_file_file_create(remote_data)
+file_remote = fileremotes.create(remote_data)

Let's say the file plugin provided some other remote that still synced file content?

Justin


On 6/19/19 9:45 AM, Dennis Kliban wrote:

I didn't get a note in my email, but I did see one in the list 
archive[0]. So here is my response to it:


I agree that we could use modified templates to achieve the same 
results. However, that means that we will need to modify templates for 
every language we want to generate bindings in. In both cases the 
generated client code will be exactly the same. From a maintenance 
perspective, it is easier to add a feature to Pulp's REST API that 
produces a modified version of the OpenAPI schema. It also means that 
we can always use the latest versions of the templates shipped with 
openapi-generator.


The documentation site would continue to distribute an OpenAPI schema 
where each Operation Id is unique.


Pulp's OpenAPI schema does not currently pass validation because the 
paths are not unique. In order to use the 'href' of each resource as 
the primary identifier, it was necessary to template paths as 
{artifact_href}, {repository_href}, {file_content_href}, etc. This 
schema cannot be used to generate server code. However, it works well 
when generating client code. The non-unique operation ids would be a 
problem for generating a server also. However, they don't produce 
problems when generating client code.


Does this address your concerns?

[0] https://www.redhat.com/archives/pulp-dev/2019-June/msg00061.html

On Wed, Jun 19, 2019 at 8:54 AM Dennis Kliban wrote:


As pointed out in a recent issue[0], the method names in the
bindings generated from Pulp's OpenAPI schema are unnecessarily
verbose. Each method name corresponds to an Operation Id in the
OpenAPI schema. The Operation Id is also used as an HTML anchor in
the REST API docs[1].

It is possible to generate a schema where each Operation Id is
shorter, but then the Operation Ids are not unique and all the
linking in the REST API documentation breaks. We can avoid this
problem by keeping the long Operation Id for the schema generated
for the docs and only using short Operation Ids when generating
the schema for the bindings.

The difference in usage of the bindings can be seen here[2].

Is there any objection to including such a change in time for RC 3?

[0] https://pulp.plan.io/issues/4989
[1] https://docs.pulpproject.org/en/3.0/nightly/restapi.html
[2] https://pulp.plan.io/issues/4989#note-1


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] ergonomics of providing Pulp with lists of items

2019-05-06 Thread Justin Sherrill
To me the API is the interface to pulp, not httpie, and I do not think 
you should corrupt the API to make it easier to use from httpie (i.e. 
switch to using comma-separated values when JSON provides a method for 
specifying multiple values).


If you want to support both on the server I think that would be fine, 
but I think if usability is a concern, a more usable CLI is needed!
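
If supporting both forms on the server were the route taken, one sketch of
how that could look as a DRF field (hypothetical, not anything Pulp ships):

    from rest_framework import serializers

    class FlexibleListField(serializers.ListField):
        """Accept either a JSON list or a comma-separated string."""

        def to_internal_value(self, data):
            if isinstance(data, str):
                data = [item.strip() for item in data.split(",") if item.strip()]
            return super().to_internal_value(data)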


my two cents :)

Justin

On 5/3/19 8:48 PM, Daniel Alley wrote:
Providing Pulp with lists of values from the command line is rather 
unwieldy.  There's a lot of unnecessary escaping going on.


http POST :24817${REPO_HREF}versions/ add_content_units:="[\"$CONTENT_HREF\",\"$CONTENT_2_HREF\"]"

http POST http://localhost:24817/pulp/api/v3/rpm/copy/ source_repo=${SRC_REPO_HREF} dest_repo=${DEST_REPO_HREF} types:="[\"errata\"]"


Tanya, Ina, and I thought it would be worth discussing the idea of 
using something more ergonomic, like a comma-separated string.  This 
would make the endpoints much easier to use manually.


http POST :24817${REPO_HREF}versions/ add_content_units="$CONTENT_HREF,$CONTENT_2_HREF"

http POST http://localhost:24817/pulp/api/v3/rpm/copy/ source_repo=${SRC_REPO_HREF} dest_repo=${DEST_REPO_HREF} types="errata"
On the other hand, if we're planning to have an actual CLI, then this 
probably isn't really an issue.  The way we're doing things now isn't 
wrong, it's just frustrating to do from a shell.  But I don't know 
exactly what our CLI plans are.


What are your thoughts?

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Content Copy (between repos)

2019-04-06 Thread Justin Sherrill


On 4/4/19 2:35 PM, Daniel Alley wrote:
Content copy between repositories is critically important to Katello 
integration and is something that we have not really addressed yet.  
It also needs to be completed before the RPM plugin can begin work on 
depsolving.  The story that results from this discussion should 
probably be put on one of the next 1-3 sprints and not wait any longer 
than that.


Repositories are generic to all types of content, but copy operations 
between repositories will need type-specific options defined by the 
plugin, e.g. for advanced copy w/ depsolving.  Therefore we need to 
find a design for this functionality that makes sense from a usage 
perspective and an implementation perspective.


To get this discussion started, here's some suggestions about how this 
could work from the user perspective (presented without much thought 
put into how it would be implemented).


Create a "Copier" object, like a Remote, that stores it's own settings 
and has one or many endpoints (like "/sync/") that can be POSTed to 
and return a task and create a new repository version.


POST /pulp/v3/api/copiers/rpm/$endpoint_name/ content_units=[...]  
[more params...]


The copier would store settings such as "recursive" and the "source" 
and "destination" repositories.  And let's say they can be overriden.


Or, create new endpoints without any associated DB models, like 
one-shot upload does:


variant 1:  POST /pulp/v3/api/content/rpm/packages/copy/ 
content_units=[...] source="..." destination="..."  [more params...]


variant 2: POST /pulp/v3/api/copy/rpm/package/ content_units=[...] 
source="..." destination="..."  [more params...]


Since basic copy support (just copying the units, no depsolving, etc.) 
could in theory be implemented completely generically, it would be 
nice if we could provide that for free somehow.  But I don't 
immediately see a way of doing so.


I welcome any suggestions or input.


A 'copier' object that has to be CRUD'd seems like unnecessary 
complication from a user perspective.  I would go with the 'one shot' 
approach at first; I think most users are going to desire that over an 
actual object.


Justin





___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Unified interface for plugin actions

2019-03-01 Thread Justin Sherrill


On 3/1/19 2:45 PM, Robin Chan wrote:

Justin,
Would such a change make a significant difference in the effort, 
complexity, or time to migrate existing (or support new) plugins in 
Katello?


It would be a very small and simple change.




Robin

On Fri, Mar 1, 2019 at 2:00 PM Justin Sherrill <jsher...@redhat.com> wrote:


To me this makes a lot of sense, allows for plugin flexibility,
and is
more consistent across plugins.

I feel like this will make differences between plugins more
understandable by reading the api docs, rather than scanning the
README's of the respective plugin and trying to work out what is
different.

Justin

On 2/28/19 1:42 PM, Austin Macdonald wrote:
> Now that we have a handful of plugins that have somewhat different
> workflows, surprising user-facing differences in the interface for
> plugin-related actions are becoming apparent.
>
> Example: Publish
>     File:
>     Create a publisher
>     v3/publishers/file/1/publish/ repository=$REPO
>     Ansible:
>     (no publisher)
>     v3/publications/ repository=$REPO
>
> The difference is not huge, having a different endpoint does defy
> expectations of a user who is familiar with one plugin, who then
moves
> to another plugin.
>
> Plugins can also implement other endpoints, like RPM's one-shot
> upload. The problem is that we have mixed idioms. Plugins are
> encouraged to create task endpoints for objects (remote's sync,
> publisher's publish), but they are also encouraged to create
arbitrary
> endpoints for any other actions. Users are not able to form
reasonable
> expectations for this part of the interface from plugin to plugin.
>
> Proposal:
> We could move all "actions" into a single area, namespaced by
plugin
> (by convention). This would allow the plugins the freedom to do
> whatever they need to do while keeping the interface consistent and
> predictable for users of multiple plugins. These "actions" could be
> synchronous or asynchronous. Importantly, this would also create a
> logical "group" of endpoints a user could look for in the REST
API docs.
>
> Examples:
> v3/actions/file/publish/ publisher=$PUB repository=$REPO
> v3/actions/ansible/publish/ repository=$REPO
> v3/actions/rpm/upload/ file@./foo-4.1-1.noarch.rpm repository=$REPO
>
> Will this push back the RC?
> No. The changes to the plugin API will be small, and the changes to
> each plugin would be moving sync and publish endpoints, leaving the
> logic almost identical. I anticipate the most time consuming
aspect of
> this will be adjusting the documentation of each plugin-- but since
> they will follow similar patterns, this shouldn't be too much
work either.
>
> To sum up:
> We should move sync and publish endpoints to
> /actions/<plugin>/<action>/ to be consistent with other
> plugin-defined actions like one-shot upload. This will look very
nice
> in swagger docs, and should provide more consistent workflows for
> users of multiple plugins.

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Unified interface for plugin actions

2019-03-01 Thread Justin Sherrill
To me this makes a lot of sense, allows for plugin flexibility, and is 
more consistent across plugins.


I feel like this will make differences between plugins more 
understandable by reading the api docs, rather than scanning the 
READMEs of the respective plugins and trying to work out what is different.


Justin

On 2/28/19 1:42 PM, Austin Macdonald wrote:
Now that we have a handful of plugins that have somewhat different 
workflows, surprising user-facing differences in the interface for 
plugin-related actions are becoming apparent.


Example: Publish
    File:
    Create a publisher
    v3/publishers/file/1/publish/ repository=$REPO
    Ansible:
    (no publisher)
    v3/publications/ repository=$REPO

The difference is not huge, having a different endpoint does defy 
expectations of a user who is familiar with one plugin, who then moves 
to another plugin.


Plugins can also implement other endpoints, like RPM's one-shot 
upload. The problem is that we have mixed idioms. Plugins are 
encouraged to create task endpoints for objects (remote's sync, 
publisher's publish), but they are also encouraged to create arbitrary 
endpoints for any other actions. Users are not able to form reasonable 
expectations for this part of the interface from plugin to plugin.


Proposal:
We could move all "actions" into a single area, namespaced by plugin 
(by convention). This would allow the plugins the freedom to do 
whatever they need to do while keeping the interface consistent and 
predictable for users of multiple plugins. These "actions" could be 
synchronous or asynchronous. Importantly, this would also create a 
logical "group" of endpoints a user could look for in the REST API docs.


Examples:
v3/actions/file/publish/ publisher=$PUB repository=$REPO
v3/actions/ansible/publish/ repository=$REPO
v3/actions/rpm/upload/ file@./foo-4.1-1.noarch.rpm repository=$REPO

Will this push back the RC?
No. The changes to the plugin API will be small, and the changes to 
each plugin would be moving sync and publish endpoints, leaving the 
logic almost identical. I anticipate the most time consuming aspect of 
this will be adjusting the documentation of each plugin-- but since 
they will follow similar patterns, this shouldn't be too much work either.


To sum up:
We should move sync and publish endpoints to 
/actions/<plugin>/<action>/ to be consistent with other 
plugin-defined actions like one-shot upload. This will look very nice 
in swagger docs, and should provide more consistent workflows for 
users of multiple plugins.
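As a rough sketch of what one of these namespaced actions could look 
like on the plugin side (the DRF viewset and the router registration 
below are illustrative assumptions, not the actual plugin API):

    from rest_framework import viewsets
    from rest_framework.response import Response

    class PublishActionViewSet(viewsets.ViewSet):
        # handles POST v3/actions/file/publish/ repository=$REPO
        def create(self, request):
            repository = request.data["repository"]
            # a real implementation would enqueue a publish task here
            return Response({"repository": repository}, status=202)

    # in the plugin's urls.py:
    # router.register(r"actions/file/publish", PublishActionViewSet,
    #                 basename="file-publish-action")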


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Changes in the Pulp 3 Upload story

2019-02-22 Thread Justin Sherrill


On 2/22/19 12:07 PM, Brian Bouterse wrote:



On Fri, Feb 22, 2019 at 9:36 AM Justin Sherrill <jsher...@redhat.com> wrote:



On 2/18/19 2:41 PM, Austin Macdonald wrote:

Originally, our upload story was as follows:
The user will upload a new file to Pulp via POST to /artifacts/
(provided by core)
The user will create a new plugin specific Content via POST to
/path/to/plugin/content/, referencing whatever artifacts that are
contained, and whatever fields are expected for the new content.
The user will add the new content to a repository via POST to
/repositories/1/versions/

However, this is somewhat cumbersome to the user with 3 API calls
to accomplish something that only took one call in Pulp 2.


How would you do this with one call in pulp2?

https://docs.pulpproject.org/dev-guide/integration/rest-api/content/upload.html
seems to suggest 3-4 calls.

Some plugins implemented the pulp2 equivalent of a one-shot uploader. 
Those docs are for pulp2's core, which doesn't include the plugins' docs.




There are a couple of different paths plugins have taken to
improve the user experience:
The Python plugin follows the above workflow, but reads the
Artifact file to determine the values for the fields. The RPM
plugin has gone even farther and created a new endpoint for "one
shot" upload that perform all of this in a single call. I think
it is likely that the Python plugin will move more in the "one
shot" direction, and other plugins will probably follow.


How does the RPM one shot api work?  Will it be compatible with
whatever solution https://pulp.plan.io/issues/4196 arrives at?

You would upload the Artifact as binary data along with what content 
type it is and what relative path it uses and Pulp creates the 
Artifact, Content unit, ContentArtifact. It should be compatible with 
issue 4196 because django's binary form data should allow for parallel 
uploading before calling the view handler. It may take 2 calls though. 
The issue to me isn't the number of calls so much as the client 
data payload complexity.
If I'm having to chunk up data, I already have quite a bit of client 
data payload complexity.  In pulp 2 this was most of the complexity!


I would hate for all our plugins to move to One shot methods which
users can't even rely on.

I don't think we're taking the "generic" uploading away. You can 
always rely on that. The issue w/ one-shot is that it's not possible 
(literally) for many content types, e.g. Artifact-less content. It's 
also hard for multi-artifact Content so that would probably still be 
something plugin writers would provide as a custom thing for their 
content type. Regardless it's just not possible to have consistency in 
this area.


Why is it not possible to create a one-shot upload for artifact-less 
content?  (Maybe we're defining what a one-shot upload actually is 
differently; I'm reading it as something that combines multiple steps 
into one.)


Why is consistency not possible? I guess I don't see a huge variation of 
upload scenarios beyond:


1.  Upload zero to many files as artifacts

2.  Provide some metadata about the zero or more artifacts, or let the 
plugin parse it out itself (or maybe even a combination of the two)


3.  Import that unit into a repository.

I can see it being difficult as a user to go through all of those steps 
(even if 2 & 3 were combined into one), and the desire is to simplify 
the process, but uploading arbitrary files is not simple.  Why do I 
need to give up the plugin's ability to parse the unit's details because 
I'm using the consistent API?


Keep in mind all my questions are coming from a very ignorant 
perspective with respect to pulp3 internals, and more from a user 
perspective.



My problem with single api calls to upload files is that we cannot
reliably use them due to limitations in request sizes.  We have to
be prepared to use multiple calls to upload files regardless. 
Maybe if a user is using some plugin that never has super large
files (ansible?) you could be confident you would never hit a
request size limitation.   But file, docker, and yum all would
require multiple calls to get the physical data to the server.

I believe arbitrarily large files can be uploaded either through 
multi-part form data or through the django-chunked interface. We'll 
see what happens with 4196, but I expect arbitrary payload size to be 
a requirement for Pulp users.


I care more about having a consistent method for uploading files
than having fewer api calls.  If we need some content-specific
api, that's fine, but please make it a consistent part of the
process.

It sounds like the 4-call interface is the only choice then if 
consistency is the priority.

Re: [Pulp-dev] Changes in the Pulp 3 Upload story

2019-02-22 Thread Justin Sherrill


On 2/19/19 12:30 PM, Brian Bouterse wrote:
Overall +1, we should make a 1-shot uploader. It will need a bit of 
code from plugin writers to make it work though.


I believe the "manual and generic" feature offerings we have now are 
pretty good. I don't think we should take them away.


To describe yet another related use case; consider the uploading of a 
tarball with a lot of units in it (maybe a whole repo). For instance 
bunch of Ansible roles in a tarball is what a user would want to 
upload to Pulp, so you would want to upload the whole tarball at once. 
I see three cases:


a) The generic method (what we already have). The user supplies all 
metadata because we can't know the right attributes without them 
telling it to us.
b) The one-shot uploader to create a single content unit from a single 
Artifact. I think this is what is being proposed here. +1 to this.
c) Uploading a bunch of content units in one binary, e.g. a tarball. 
We should leave this to plugin writers to implement for now.


Could b & c be the same api?  The api could (depending on what the 
plugin wants to implement) accept either one unit file or a tar file 
with a bunch of different units.




In terms of implementing (b) I believe it would be useful to have a 
Content unit provide an optional class method that will instantiate a 
content unit (but not save it) from an Artifact. Then core can offer 
an API that accepts a single Artifact binary and what content unit 
type it is and it should create+save the Artifact and then create+save 
the Content unit from that Artifact.
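A sketch of what that optional class method might look like, where 
parse_metadata() stands in for whatever plugin-specific parsing applies 
(both names are illustrative assumptions):

    from pulpcore.plugin.models import Content

    class ExamplePackage(Content):
        TYPE = "example"

        @classmethod
        def from_artifact(cls, artifact):
            # instantiate but do not save; the caller decides when to
            # persist it. parse_metadata() is a hypothetical plugin hook.
            fields = parse_metadata(artifact.file)
            return cls(**fields)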


This doesn't address adding it to/from repositories. That would be ok 
too as an additional option, but you probably want to do it to a whole 
bunch at once somehow. Also it will need to go through the tasking 
system for correctness.



On Tue, Feb 19, 2019 at 11:52 AM David Davis wrote:


What about the case of multi-artifact content? Don’t they require
separate artifact creation and content creation routes?

David


On Tue, Feb 19, 2019 at 11:40 AM Austin Macdonald <aus...@redhat.com> wrote:

I think the key question to ask is:
What circumstances would require the creation of Content that
would not be met by a one-shot upload?

On Tue, Feb 19, 2019 at 11:34 AM Daniel Alley <dal...@redhat.com> wrote:

@Matthias why would you prefer to keep the normal create? 
As you mention, the "orphan cleanup" issue applies to
both, so there's no advantage given to the former.

The normal "create" ( POST .../content/plugin/type/ ...)
is unidiomatic and inconsistent, because the fields needed
to upload a content are different from the fields on the
actual serializer.  Most content types keep metadata
inside the packages themselves, so you can't let the user
specify the content field values, you have to contort
everything so that instead of hydrating the serializer
from user input, it does so by parsing the content.

There's also the issue that the libraries we're using to
parse the (Python and RPM) packages do some validation on
the filename to see that it has the right extension and so
forth, and since artifacts are content-addressed and don't
store that information, with normal create you have to
pass the filename of the original artifact *back in* at
the time you create it, and then copy the file from Pulp
content storage into a temp directory under a filename
which will validate properly, which is highly unfortunate.

With one-shot upload, you avoid both of those problems,
because there's no broken expectations as to what fields
should be accepted, and because it should be possible
(though I haven't tried it) to parse the original file
*before* saving it as an artifact, thus avoiding a lot of
mess. And you also get the option to add it straight into
a repository. In my opinion, it's a straight upgrade.
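A sketch of that "parse before saving" ordering in a one-shot upload 
view, assuming a Django upload handler; parse_package() and 
ExamplePackage are illustrative stand-ins, not real plugin code:

    from pulpcore.plugin.models import Artifact

    def one_shot_upload(request):
        uploaded = request.FILES["file"]
        # parse first, while the original filename is still attached, so
        # extension-based validation in the parser libraries still works
        fields = parse_package(uploaded.name, uploaded)
        # only then persist the file as a content-addressed artifact
        artifact = Artifact.init_and_validate(uploaded)
        artifact.save()
        return ExamplePackage(artifact=artifact, **fields)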

On Tue, Feb 19, 2019 at 10:57 AM Matthias Dellweg <dell...@atix.de> wrote:

I have no objections to having the "one shot upload"
or even "one shot upload into repositoryversion". I
think I would like to keep the 'traditional' create
anyway.
The problem I see with create and one-shot upload is
that another thing could have triggered 'delete
orphans' at the wrong time, and your shiny new content
unit disappears before you can add it to a repository
version.

On Mon, 18 Feb 2019 14:41:54 -0500
Aust

Re: [Pulp-dev] Changes in the Pulp 3 Upload story

2019-02-22 Thread Justin Sherrill


On 2/18/19 2:41 PM, Austin Macdonald wrote:

Originally, our upload story was as follows:
The user will upload a new file to Pulp via POST to /artifacts/ 
(provided by core)
The user will create a new plugin specific Content via POST to 
/path/to/plugin/content/, referencing whatever artifacts that are 
contained, and whatever fields are expected for the new content.
The user will add the new content to a repository via POST to 
/repositories/1/versions/
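Spelled out with a plain HTTP client, the three calls look roughly like 
this (paths as described above; the relative_path field and the repo 
href are illustrative assumptions):

    import requests

    base = "http://localhost:8000"
    # 1) upload the file as an Artifact
    artifact = requests.post(
        base + "/pulp/api/v3/artifacts/",
        files={"file": open("foo-4.1-1.noarch.rpm", "rb")},
    ).json()
    # 2) create the plugin-specific content unit referencing the artifact
    content = requests.post(
        base + "/path/to/plugin/content/",
        data={"artifact": artifact["_href"],
              "relative_path": "foo-4.1-1.noarch.rpm"},
    ).json()
    # 3) add the new content to a repository via a new repository version
    requests.post(
        base + "/repositories/1/versions/",
        json={"add_content_units": [content["_href"]]},
    )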


However, this is somewhat cumbersome to the user with 3 API calls to 
accomplish something that only took one call in Pulp 2.


How would you do this with one call in pulp2? 
https://docs.pulpproject.org/dev-guide/integration/rest-api/content/upload.html 
seems to suggest 3-4 calls.




There are a couple of different paths plugins have taken to improve 
the user experience:
The Python plugin follows the above workflow, but reads the Artifact 
file to determine the values for the fields. The RPM plugin has gone 
even farther and created a new endpoint for "one shot" upload that 
perform all of this in a single call. I think it is likely that the 
Python plugin will move more in the "one shot" direction, and other 
plugins will probably follow.


How does the RPM one shot api work?  Will it be compatible with whatever 
solution https://pulp.plan.io/issues/4196 arrives at?


I would hate for all our plugins to move to One shot methods which users 
can't even rely on.


My problem with single api calls to upload files is that we cannot 
reliably use them due to limitations in request sizes.  We have to be 
prepared to use multiple calls to upload files regardless.  Maybe if a 
user is using some plugin that never has super large files (ansible?) 
you could be confident you would never hit a request size limitation.   
But file, docker, and yum all would require multiple calls to get the 
physical data to the server.


I care more about having a consistent method for uploading files than 
having fewer api calls.  If we need some content-specific api, that's 
fine, but please make it a consistent part of the process.


I feel like we may be chasing the wrong goal here (fewer calls vs a more 
consistent experience).




That said, I think we should discuss this as a community to encourage 
plugins to behave similarly, and because there may also be a 
possibility for sharing some of code. It is my hope that a "one shot 
upload" could do 2 things: 1) Upload and create Content. 2) Optionally 
add that content to repositories.


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Changes in the Pulp 3 Upload story

2019-02-22 Thread Justin Sherrill


On 2/19/19 6:38 AM, Ina Panova wrote:

+1 to facilitate the upload process.
At the conferences, there have been many users pointing out how 
inconvenient the current upload process is.


Is it so much more inconvenient because pulp3 doesn't have a cli?

Justin





Regards,

Ina Panova
Software Engineer| Pulp| Red Hat Inc.

"Do not go where the path may lead,
 go instead where there is no path and leave a trail."


On Mon, Feb 18, 2019 at 8:42 PM Austin Macdonald wrote:


Originally, our upload story was as follows:
The user will upload a new file to Pulp via POST to /artifacts/
(provided by core)
The user will create a new plugin specific Content via POST to
/path/to/plugin/content/, referencing whatever artifacts that are
contained, and whatever fields are expected for the new content.
The user will add the new content to a repository via POST to
/repositories/1/versions/

However, this is somewhat cumbersome to the user with 3 API calls
to accomplish something that only took one call in Pulp 2.

There are a couple of different paths plugins have taken to
improve the user experience:
The Python plugin follows the above workflow, but reads the
Artifact file to determine the values for the fields. The RPM
plugin has gone even farther and created a new endpoint for "one
shot" upload that perform all of this in a single call. I think it
is likely that the Python plugin will move more in the "one shot"
direction, and other plugins will probably follow.

That said, I think we should discuss this as a community to
encourage plugins to behave similarly, and because there may also
be a possibility for sharing some of code. It is my hope that a
"one shot upload" could do 2 things: 1) Upload and create Content.
2) Optionally add that content to repositories.
___
Pulp-dev mailing list
Pulp-dev@redhat.com 
https://www.redhat.com/mailman/listinfo/pulp-dev


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Replicate a directory structure of a remote repo or not?

2019-02-22 Thread Justin Sherrill
Speaking for our Katello users, I don't know that it's required for us to 
do, but it might be nice to have.  I think we might want to have the option 
to preserve or not, as alphabetically organized repositories are nice to 
have even when the upstream repository isn't laid out in such a way.


I do seem to remember in the past some desire around kickstarts to 
preserve the structure, but I do not remember the details around it.


On 2/18/19 12:34 PM, Tatiana Tereshchenko wrote:

+pulp-list

On Mon, Feb 18, 2019 at 6:14 PM Tatiana Tereshchenko <ttere...@redhat.com> wrote:


RPM plugin team discussed this question recently and we are
leaning towards a conclusion that by default Pulp is expected to
publish a repo with a directory structure of a remote repository.

E.g. At the moment if no base_path is configured for a
distribution, those two repositories [0][1] (same content,
different layout) result in a repo with the same flat structure,
all packages go into the root directory. Is there an expectation
that Pulp would generate two repositories with the directory
structure as in the original remote repo?

1. RPM plugin users, please, speak out, do you need/expect/want a
directory structure to be the same as in a remote repo you sync from?

2. It would be good to know if there is such a need for any other
plugin than RPM. It will help to answer the questions: Should we
handle this in pulpcore? or in every plugin since plugins might
have different needs for a default layout?

Thank you,
Tanya

[0] https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm-unsigned/
[1] https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm-alt-layout/


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] [Pulp-list] Pulp 3 RC info

2019-02-08 Thread Justin Sherrill
To expand on this, we're starting integration now with pulp3 and file 
repositories.  I don't have a good estimation for how long it will take, 
but my goal is ~2 months (so more likely 3 or 4 months).  This should 
give us a good estimation for how difficult it will be to add future 
content types as well.  Once we have 2 types integrated I think we 
will have pretty good confidence in the core APIs.


Justin

On 2/4/19 12:52 PM, David Davis wrote:
To give kind of an update on integrating Katello and Pulp 3, I don’t 
think there has been much to report since the fall. Since then, Pulp 
has been focusing on finishing up the Pulp 3 RC features (which 
included some things Katello needed like lazy sync) while Katello has 
been refactoring parts of its code to make integration easier.


Now that both those pieces are close to being done, we’re meeting 
every week with Katello starting Wednesday to work on the integration 
between Katello and Pulp 3.


David


On Mon, Feb 4, 2019 at 12:04 PM Brian Bouterse wrote:


The long RC is, in part, to ensure that Pulp 3.0.0 is fully usable
for Katello. Related to that, one question I keep wondering is:
when will Pulp3 land in an upstream Katello release and for which
plugins? I've lost touch with that conversation so I have no idea
what to expect. I believe Pulp3 being offered in Katello will
greatly inform confidence in the 3.0.0 release candidate going GA.

On Thu, Jan 31, 2019 at 8:45 AM Bryan Kearney <bkear...@redhat.com> wrote:

-- pulp-list

RC for a year? That seems way long. I get the idea of semantic
versioning, but is there any other path for that?

--bk

On 1/30/19 5:28 PM, David Davis wrote:
> We’re approaching an RC release of Pulp 3.0 and we’ve put
together a
> blog post with some information about this release:
>
> https://pulpproject.org/2019/01/30/pulp-3-rc-information/
>
> Please feel free to respond to this email with any
questions, comments,
> or concerns you have about the RC.
>
> David
>
> ___
> Pulp-list mailing list
> pulp-l...@redhat.com 
> https://www.redhat.com/mailman/listinfo/pulp-list
>


___
Pulp-dev mailing list
Pulp-dev@redhat.com 
https://www.redhat.com/mailman/listinfo/pulp-dev




Re: [Pulp-dev] SSL/OID content-guard

2018-11-26 Thread Justin Sherrill
Today we can use pulp 2 to fetch both yum and ostree protected content.  
Other content is protected, but we don't really have a way to 
auto-generate or configure client certificates to fetch the content behind 
the protection.  It's possible that some users are using certs to do 
more with ssl protected content, but those are the only two types where 
we provide workflows to the user (via subscription-manager).


My vote would be to make it generic, unless it is much more difficult to 
do so.


Justin Sherrill

On 11/26/18 11:56 AM, Jeff Ortel wrote:
To support content protection using an X.509 certificate containing 
OID extensions a concrete ContentGuard needs to be developed.  The 
question is, in which plugin does this belong?  Issue #4009 suggests 
the RPM plugin.  I'm not convinced that this is specific only to 
protecting RPMs.  Is it?



[1] https://pulp.plan.io/issues/4009
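For reference, a guard in pulp 3 is mostly a model plus a permit hook, 
so a generic version seems feasible; a minimal sketch, assuming 
pulpcore's ContentGuard API and that the web server forwards the client 
certificate in a header (both are assumptions):

    from pulpcore.plugin.models import ContentGuard

    class X509OIDGuard(ContentGuard):
        def permit(self, request):
            # the header name is an assumption about web server config
            cert_pem = request.headers.get("X-Client-Cert")
            if cert_pem is None:
                # raising PermissionError denies access in this API
                raise PermissionError("no client certificate presented")
            # parsing the cert and checking the expected OID extension
            # would happen here, and could be content-type agnostic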

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev




Re: [Pulp-dev] Pulp RPM dependency solver refactoring dilemma

2018-06-26 Thread Justin Sherrill



On 06/26/2018 11:30 AM, Milan Kovacik wrote:

Folks,

TL;DR should we support alternative solvers (configuration) during
recursive unit association?

I've been refactoring the current approach to RPM dependency solving
for e.g. the recursive copy to be able to handle rich dependencies[1].

While testing, I ran into a dependency issue that is caused by me not
processing file-provides records correctly[2].

No matter the current insufficiency in my coding, a user trying to
copy stuff from a repo with libsolv-unresolvable dependencies might
hit similar issues and consider them regressions from previous
behavior, hence the question:

Should the user be able to select a solver (configuration) for
a particular associate call through the REST API?
I commented on the PR, but I think the behavior we're seeing is okay and 
can be ignored (assuming we can still pull in the deps that are 
available).  Assuming we can, do we still need it to be configurable?


I would also like to point out this issue to keep in mind: 
https://pulp.plan.io/issues/2478


Justin




Cheers,
milan


[1] https://github.com/pulp/pulp_rpm/pull/1122
[2] https://github.com/pulp/pulp_rpm/pull/1122#issuecomment-400061802

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev




Re: [Pulp-dev] Lazy for Pulp3

2018-05-16 Thread Justin Sherrill



On 05/16/2018 01:02 PM, Brian Bouterse wrote:
A mini-team of @jortel, @ttereshc, @ipanova, @dkliban, and @bmbouter 
met today to discuss Lazy use-cases for Pulp3. The "initial" use cases 
would all be delivered together as the first implementation to be used 
with Pulp 3.0. The ones for "later" are blocked on other gaps in 
core's functionality (specifically content protection) which should 
come with 3.1+.


Feedback on these use cases is welcome. We are meeting again on this 
upcoming Monday, after which, we will writeup the work into Redmine. 
We'll email this thread with links to the Redmine plan when it's 
available for comment.


Initial use cases are:

  * pull-through caching of packages (squid)

  * parallel streaming of bits to multiple clients (squid)

  * Pulp redirects to squid when content is not already downloaded (pulp)

  * streaming data and headers (streamer)

  * After streamer downloads the content, the new Artifact is created
and associated with the correct ContentArtifact (downloader)

  * to use a configured downloader, configured by the correct remote.
This would correctly configure authentication, proxy, mirrorlists,
etc. when fetching content (streamer)


Use cases to be implemented later. Currently blocked because Pulp 
itself doesn't verify client entitlement for content currently.


  * authentication of the client to verify they are entitled to the
content




Could I suggest considering:

* The ability to delete all downloaded content in a repository 
(basically null out the content).  I've been using this rhel7 repo for 
years, and likely all the old content is not needed anymore.


I've seen this requested from time to time over the past couple years.






___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev




Re: [Pulp-dev] Replicate metadata

2018-05-14 Thread Justin Sherrill
I think it kind of depends on how pulp would do it.  Thinking of the yum 
example, all the information is there in an upstream yum repo for pulp 
to import the 'publication' as is.  If it can be done 'naturally' with a 
yum repo, there wouldn't be anything special around pulp -> pulp, it'd 
just be yum_repo -> pulp.  However, we'd need a pulp dev to chime in here :)


Justin


On 05/14/2018 12:08 PM, Bryan Kearney wrote:

@Justin, I think that makes sense for Pulp->Pulp. For Matthias, I think
he needs Native->Pulp which would not have a publication.

-- bk

On 05/14/2018 11:42 AM, Matthias Dellweg wrote:

Mirroring the metadata exactly is also very important for Debian
Repositories, because of the way the metadata is signed in lieu of the
whole content. So it would be very beneficial, if this could be
provided as a 'service' of the pulp core platform somehow.

Matthias

On Mon, 14 May 2018 11:31:52 -0400
Justin Sherrill wrote:


  From my understanding of pulp 3, this would maybe involve the
ability to 'import' a publication.  Would that make sense?

Justin


On 05/09/2018 08:22 AM, Bryan Kearney wrote:

One of the themes I heard yesterday at the Red Hat Summit was around
having a pulp server mirror the upstream RPM repo metadata exactly.
The use case is that two pulp servers are behind a load balancer
mirroring the same repo. The users would like to be able to flip a
yum client across the two servers. Running createrepo to make
unique repos causes issues for the clients that appear to be
errors. I assume this pattern would not be unique for other package
clients that cache metadata.

So, when looking ahead to pulp 3 I would ask that this be taken into
consideration. I can provide more info / use cases if necessary.

-- bk



___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev





Re: [Pulp-dev] Replicate metadata

2018-05-14 Thread Justin Sherrill
From my understanding of pulp 3, this would maybe involve the ability 
to 'import' a publication.  Would that make sense?


Justin


On 05/09/2018 08:22 AM, Bryan Kearney wrote:

One of the themes I heard yesterday at the Red Hat Summit was around
having a pulp server mirror the upstream RPM repo metadata exactly. The
use case is that two pulp servers are behind a load balancer mirroring
the same repo. The users would like to be able to flip a yum client
across the two servers. Running createrepo to make unique repos causes
issues for the clients that appear to be errors. I assume this pattern
would not be unique for other package clients that cache metadata.

So, when looking ahead to pulp 3 I would ask that this be taken into
consideration. I can provide more info / use cases if necessary.

-- bk



___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev




[Pulp-dev] permission on downloaded artifacts

2018-05-02 Thread Justin Sherrill

HI All!

I noticed while testing out pulp 3 that artifacts are downloaded as:


$ ls -l 
/var/lib/pulp/artifact/04/2c259d546331588e1dff83a46f62a27fb7cf3de4050924470d99fd8d2a046f 

-rw---. 1 root root 4358144 May  2 15:42 
/var/lib/pulp/artifact/04/2c259d546331588e1dff83a46f62a27fb7cf3de4050924470d99fd8d2a046f


while the directories are 755.

In my case my workers were running as root, but my web server was 
running as another user.   I know production deployment is a long way 
away, but it would make sense to allow for at least group read (740) 
so that I could run my web server as one user and my workers as another 
user for better isolation?
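For illustration, one way to get there would be a umask set in the 
worker processes, assuming the workers and the web server share a group 
(a sketch, not a proposal for a specific setting):

    import os

    # 0o027 masks group-write and all world bits, so newly created
    # artifact files come out 0640 and directories 0750
    os.umask(0o027)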


Justin

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Pulp api seemingly incompatible with generated bindings

2018-04-30 Thread Justin Sherrill



On 04/30/2018 10:46 AM, Jeremy Audet wrote:
> +1. Exposing UUIDs is definitely preferable to using hrefs as ids. 
 "The app just looks at the relative path"  -> what if pulp wants the 
flexibility to change the repositories endpoint (highly improbable but 
you never know).


Is it better, though? URIs were chosen specifically with immutability 
in mind. "Cool URIs don't change." This is reflected in the 
application's behaviour. If one changes an object's attributes (e.g. 
UUID), its href doesn't change.


And in what case are hostname and port changing? If that's a common 
deployment issue, I would contend that the deployment at hand is 
screwed up.


I somewhat agree with you, however enough users have requested (or 
should I say demanded) this feature that katello now ships support for 
it natively.  Satellite does as well: 
https://access.redhat.com/solutions/1232133


I wish it were not a common issue, but it is, enough so that this is a 
requirement of any integration that katello does with pulp 3.


Justin




___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev




Re: [Pulp-dev] Pulp api seemingly incompatible with generated bindings

2018-04-30 Thread Justin Sherrill



On 04/30/2018 10:05 AM, David Davis wrote:
So what I’d probably propose is exposing the UUIDs in the response and 
then extending HyperlinkedRelatedFields to accept UUID or href. Then 
third parties like Katello could store and just use UUIDs (and not 
worry about hrefs).


Regarding hrefs though, hostname and port don’t matter. The app just 
looks at the relative path. It looks like changing the deployment path 
causes problems though.
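A sketch of such a field, assuming DRF's HyperlinkedRelatedField as the 
base (the class name is made up):

    import uuid

    from django.core.exceptions import ObjectDoesNotExist
    from rest_framework import serializers

    class UUIDOrHrefRelatedField(serializers.HyperlinkedRelatedField):
        def to_internal_value(self, data):
            try:
                pk = uuid.UUID(str(data))  # is this a bare UUID?
            except ValueError:
                # not a UUID: fall back to normal href resolution
                return super().to_internal_value(data)
            try:
                return self.get_queryset().get(pk=pk)
            except ObjectDoesNotExist:
                self.fail("does_not_exist")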


It matters if you are a client and are fetching stored hrefs.

Justin




David

On Mon, Apr 30, 2018 at 9:58 AM, Justin Sherrill <jsher...@redhat.com> wrote:




On 04/27/2018 07:18 PM, David Davis wrote:

I’m not sure how returning UUIDs in our responses helps Katello.
In our previous conversation, it was concluded that Katello
should use the hrefs[0]. Why expose UUIDs if Katello is not going
to store them?


And that's fine, but bindings are pointless at that point, so pulp
shouldn't really advertise them as a feature.   This seemed to
have been 'talked up' quite a bit as a feature, but is completely
unusable.



Katello could store/use UUIDs but then it's going to run into
problems when dealing with parameters that are hrefs (such as
repository_version for publishing[1]).

[0]
https://www.redhat.com/archives/pulp-dev/2018-January/msg4.html
[1]

https://github.com/pulp/pulp_file/blob/5ffb33d8c70ffbb247aba8bf5b45633eba414b79/pulp_file/app/viewsets.py#L54



Could you explain a bit about this?

In order to use pulp 3 then, I'd guess we would either need to:

1) store ALL hrefs about all objects
2) fetch an object before we can do anything with it

Or am I missing an option 3?

On a side note, the hrefs seem to include
hostname/port/deployment path.  This seems incompatible with
things like hostname changes.  We can fairly easily just chomp off
only the path, but if I were a user and had stored all these
hrefs, I would be very unhappy if I had all the full hrefs stored.

Justin





David

On Fri, Apr 27, 2018 at 4:29 PM, Dennis Kliban <dkli...@redhat.com> wrote:

I can't remember why we decided to remove UUID from the
responses. It sounds like we should add them back.

On Fri, Apr 27, 2018 at 12:26 PM, Justin Sherrill <jsher...@redhat.com> wrote:

Hi All!

I started playing around with pulp 3 and generated
bindings via https://pulp.plan.io/issues/3580
and it results
somewhat in what you would expect.  Here's an example:

    # @param id A UUID string identifying this repository.
    # @param [Hash] opts the optional parameters
    # @return [Repository]
    def repositories_read(id, opts = {})
      data, _status_code, _headers = repositories_read_with_http_info(id, opts)
      return data
    end


Notice that the UUID is to be passed in.  When creating a
repository, I only get the _href:

{
    "_href": "http://localhost:8000/pulp/api/v3/repositories/bfc61565-89b1-4b7b-9c4a-2ec91f299aca/",
    "_latest_version_href": null,
    "_versions_href": "http://localhost:8000/pulp/api/v3/repositories/bfc61565-89b1-4b7b-9c4a-2ec91f299aca/versions/",
    "created": "2018-04-27T15:26:03.546956Z",
    "description": "",
    "name": "test",
    "notes": {}
}

Meaning, there's really no way to use this specific
binding with the return format for pulp.   I imagine most
binding generation would be expecting the user to know
the ID of the objects and not work off of _hrefs.  Any
reason to not include the IDs in the response?

Justin

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev




Re: [Pulp-dev] Pulp api seemingly incompatible with generated bindings

2018-04-30 Thread Justin Sherrill



On 04/27/2018 07:18 PM, David Davis wrote:
I’m not sure how returning UUIDs in our responses helps Katello. In 
our previous conversation, it was concluded that Katello should use 
the hrefs[0]. Why expose UUIDs if Katello is not going to store them?


And that's fine, but bindings are pointless at that point, so pulp 
shouldn't really advertise them as a feature.   This seemed to have been 
'talked up' quite a bit as a feature, but is completely unusable.




Katello could store/use UUIDs but then it's going to run into problems 
when dealing with parameters that are hrefs (such as 
repository_version for publishing[1]).


[0] 
https://www.redhat.com/archives/pulp-dev/2018-January/msg4.html 
[1] 
https://github.com/pulp/pulp_file/blob/5ffb33d8c70ffbb247aba8bf5b45633eba414b79/pulp_file/app/viewsets.py#L54 


Could you explain a bit about this?

In order to use pulp 3 then, I'd guess we would either need to:

1) store ALL hrefs about all objects
2) fetch an object before we can do anything with it

Or am I missing an option 3?

On a side note, the hrefs seem to include hostname/port/deployment 
path.  This seems incompatible with things like hostname changes.  We can 
fairly easily just chomp off only the path, but if I were a user and had 
stored all these hrefs, I would be very unhappy if I had all the full 
hrefs stored.
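The chomping itself is trivial; a sketch that keeps only the path, so 
stored references survive hostname or port changes:

    from urllib.parse import urlparse

    def href_to_path(href):
        # keep only the relative path portion of the href
        return urlparse(href).path

    href_to_path("http://localhost:8000/pulp/api/v3/"
                 "repositories/bfc61565-89b1-4b7b-9c4a-2ec91f299aca/")
    # -> "/pulp/api/v3/repositories/bfc61565-89b1-4b7b-9c4a-2ec91f299aca/"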


Justin




David

On Fri, Apr 27, 2018 at 4:29 PM, Dennis Kliban <dkli...@redhat.com> wrote:


I can't remember why we decided to remove UUID from the responses.
It sounds like we should add them back.

On Fri, Apr 27, 2018 at 12:26 PM, Justin Sherrill <jsher...@redhat.com> wrote:

Hi All!

I started playing around with pulp 3 and generated bindings
via https://pulp.plan.io/issues/3580
and it results somewhat in
what you would expect.  Here's an example:

    # @param id A UUID string identifying this repository.
    # @param [Hash] opts the optional parameters
    # @return [Repository]
    def repositories_read(id, opts = {})
      data, _status_code, _headers = repositories_read_with_http_info(id, opts)
      return data
    end


Notice that the UUID is to be passed in.  When creating a
repository, I only get the _href:

{
    "_href": "http://localhost:8000/pulp/api/v3/repositories/bfc61565-89b1-4b7b-9c4a-2ec91f299aca/",
    "_latest_version_href": null,
    "_versions_href": "http://localhost:8000/pulp/api/v3/repositories/bfc61565-89b1-4b7b-9c4a-2ec91f299aca/versions/",
    "created": "2018-04-27T15:26:03.546956Z",
    "description": "",
    "name": "test",
    "notes": {}
}

Meaning, there's really no way to use this specific binding
with the return format for pulp.   I imagine most binding
generation would be expecting the user to know the ID of the
objects and not work off of _hrefs.  Any reason to not include
the IDs in the response?

Justin





___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


[Pulp-dev] Pulp api seemingly incompatible with generated bindings

2018-04-27 Thread Justin Sherrill

Hi All!

I started playing around with pulp 3 and generated bindings via 
https://pulp.plan.io/issues/3580 and it results somewhat in what you 
would expect.  Here's an example:


    # @param id A UUID string identifying this repository.
    # @param [Hash] opts the optional parameters
    # @return [Repository]
    def repositories_read(id, opts = {})
      data, _status_code, _headers = repositories_read_with_http_info(id, opts)
      return data
    end


Notice that the UUID is to be passed in.  When creating a repository, I 
only get the _href:


{
    "_href": "http://localhost:8000/pulp/api/v3/repositories/bfc61565-89b1-4b7b-9c4a-2ec91f299aca/",
    "_latest_version_href": null,
    "_versions_href": "http://localhost:8000/pulp/api/v3/repositories/bfc61565-89b1-4b7b-9c4a-2ec91f299aca/versions/",
    "created": "2018-04-27T15:26:03.546956Z",
    "description": "",
    "name": "test",
    "notes": {}
}

Meaning, there's really no way to use this specific binding with the 
return format for pulp.   I imagine most binding generation would be 
expecting the user to know the ID of the objects and not work off of 
_hrefs.  Any reason to not include the IDs in the response?


Justin

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Importer Name

2018-03-12 Thread Justin Sherrill



On 03/08/2018 11:13 AM, Austin Macdonald wrote:

Motivation:
The name "importer" carries some inaccurate implications.
1) Importers should "import". Tasks like "sync" will do the actual 
importing. The object only holds the configuration that happens to be 
used by sync tasks.
2) Sync tasks on mirror mode remove content as well as add it, so 
"import" isn't quite right.


Proposed name: Remote

The inspiration for remote is "git remote". In git, remotes represent 
external repositories, which is almost exactly what our importers do.
I'm fairly apathetic to a name change.  It would be annoying to us in 
katello land, but not really a huge deal either way.  I don't think 
importer is a bad name, as it does hold configuration around 'importing'.  
The fact that it itself doesn't actually do the importing is a technical 
detail and really isn't a big deal IMO.  Users likely wouldn't care, but 
for developers I guess it's just weighing a more 'perfect' name against 
the work of changing everything (including documentation) at this stage.




---
Part 2: Trim the fields

Currently, Importers have settings that can be categorized in 2 ways. 
I am proposing removing the "sync settings" from the Remote model:


External Source information
    name
    feed_url
    validate
    ssl_ca_certificate
    ssl_client_certificate
    ssl_client_key
    ssl_validation
    proxy_url
    username
    password

Sync settings
    download_policy
    sync_mode

This had some advantages when Importers were related to Repositories. 
For example, having a repository.importer that always used the same 
sync mode made sense. However, the "how" to sync settings don't make 
much sense when importers and repositories are not linked. It seems 
very reasonable that a user might have 2 repositories that sync from 
the same source (ex EPEL). It does not make sense for them to have to 
create an Importer for the EPEL repository twice or more just to 
change sync_mode or download policy. Instead of modeling these fields, 
I propose that they should be POST body parameters.


example
POST v3/remotes/1234/sync/ repository=myrepo_href sync_mode=additive, 
dl_policy=immediate
POST v3/remotes/1234/sync/ repository=myother_href sync_mode=mirror, 
dl_policy=deferred


So as a user using some future CLI, I have to magically 'remember' these 
values?  If so, that seems like a bad user experience and kinda defeats 
the purpose of pulp holding configuration.  Could it be stored on the 
repository itself (or somewhere else) if it doesn't make sense to store 
on the importer/remote?
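One middle ground would be stored defaults that a caller can override 
per call; a sketch with hypothetical field names:

    def effective_sync_settings(remote, overrides):
        # values stored on the remote act as defaults, while anything
        # passed in the POST body wins for that one sync call
        return {
            "sync_mode": overrides.get("sync_mode", remote.sync_mode),
            "download_policy": overrides.get(
                "download_policy", remote.download_policy),
        }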


If you're serious about this, this would be something to ask current 
users of pulp as it seems kind of a big deal.




___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev

