On Thu, May 17, 2018 at 3:01 PM, Simon Baatz <gmbno...@gmail.com> wrote:
> On Wed, May 16, 2018 at 04:14:33PM -0400, David Davis wrote:
> > This is great. I had a chance to look at your plugin and am really
> > excited by having Chef support in Pulp. Some responses below.
> >
> > On Tue, May 15, 2018 at 2:31 PM, Simon Baatz <gmbno...@gmail.com> wrote:
> > >
> > > I created the beginnings of a Pulp 3 plugin to manage Chef cookbooks
> > > [1]. Currently, it supports creating repos, creating cookbook content
> > > units, and publishing repos. A published & distributed repo will offer
> > > a "universe" API endpoint for tools like Berkshelf.
> > >
> > > I have not implemented sync yet; I am waiting for "PendingVersion" to
> > > become available first.
> > >
> > > I ran into a couple of problems/uncertainties described below (sorry
> > > for the lengthy mail). I am new to Django, DRF, and, obviously,
> > > Pulp 3, so any remarks or suggestions are welcome:
> > >
> > > - Create Content: The plugin reads as much metadata as possible from
> > >   the actual cookbook Artifact when creating a content unit. The
> > >   motivation for this is:
> > >
> > >   - One doesn't need a special tool to upload content, which makes
> > >     uploading by e.g. a CI job easier.
> > >   - It ensures consistency between the metadata stored in Pulp and
> > >     the actual metadata in the cookbook.
> > >
> > >   However, this requires extracting a metadata file from a gzipped
> > >   tar archive. Content unit creation is synchronous, and doing this
> > >   work in a synchronous call might not be optimal (we already had a
> > >   discussion on this topic on the pulp-dev mailing list).
> >
> > I agree. Can you make this an async call in the plugin or is there a
> > need to also make this an async call in core/other plugins?
>
> Do you mean making the POST to the content/cookbook/ endpoint
> asynchronous? I have no idea how easy or hard this is, but wouldn't
> that look inconsistent (one plugin returns a direct 201 response,
> another a 202)?

It's quite easy. You've got code in your plugin that handles async
publishes. I hear you about inconsistency, and I don't think it's a
problem, but maybe we should make the other plugins return 202s too?
Interested to hear others' thoughts on this.

> > > - Publication/Distribution: The metadata file ("universe") for a
> > >   published cookbook repository contains absolute URLs for download
> > >   (i.e. these point to published artifacts in a distribution).
> > >
> > >   The current publication/distribution concept seems to have the
> > >   underlying assumption that a Publication is fully relocatable:
> > >   PublishedMetadata artifacts are created by the publishing task, and
> > >   creating a Distribution is a synchronous call that determines the
> > >   base path of the published artifacts.
> > >
> > >   This causes a problem with said "universe" file. Ideally, it could
> > >   be pre-computed (it lists all artifacts in the repo). However, this
> > >   can't be done AFAIK, since the base path is unknown at publication
> > >   time and one can't generate additional metadata artifacts for a
> > >   specific distribution later.
> > >
> > >   The best solution I came up with was to implement a dynamic API. To
> > >   reduce the amount of work to be done, the API does a simple string
> > >   replacement: during publication, the universe file is pre-computed
> > >   using placeholders. In the dynamic API these placeholders are
> > >   replaced with the actual base URL of the distribution.
> > >
> > >   However, I would prefer not to be forced to implement a dynamic API
> > >   for static information. Is there a way to solve this differently?
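For what it's worth, here is roughly how I picture the placeholder
approach you describe. This is only a sketch; the marker, helper names,
and example base URL below are made up and not taken from your plugin:

    import json

    # Hypothetical marker; the real plugin may use a different one.
    BASE_URL_PLACEHOLDER = "@@BASE_URL@@"

    def build_universe_entry(relative_path):
        # Only the download URL matters for this sketch; other universe
        # fields (location_type, dependencies, ...) are omitted.
        return {"download_url": f"{BASE_URL_PLACEHOLDER}/{relative_path}"}

    def publish_universe(cookbooks):
        # cookbooks: iterable of (name, version, relative_path) tuples.
        # Runs at publication time, before any distribution exists.
        universe = {}
        for name, version, relative_path in cookbooks:
            universe.setdefault(name, {})[version] = build_universe_entry(relative_path)
        return json.dumps(universe)

    def render_universe(precomputed_json, distribution_base_url):
        # Runs in the live endpoint: swap the marker for the base URL of
        # whichever distribution is being requested.
        return precomputed_json.replace(
            BASE_URL_PLACEHOLDER, distribution_base_url.rstrip("/")
        )

    precomputed = publish_universe(
        [("example_cookbook", "1.0.0", "example_cookbook-1.0.0.tar.gz")]
    )
    print(render_universe(precomputed, "http://localhost:8000/pulp_cookbook/market/"))

If that matches your "pre-compute with placeholders, substitute at
request time" idea, then I agree the string replacement itself should be
cheap even for a large universe.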
> > Are relative paths an option? If not, I don't think there's currently
> > an alternative to a live api. I probably wouldn't even use a published
> > metadata file TBH. I wonder though if there's maybe functionality we
> > could add to do this.
>
> Unfortunately, relative paths are not an option. Berkshelf just takes
> the URI as is and requests the content.
>
> Regarding the published metadata file: I am not sure what you are
> suggesting. Should the dynamic API be really dynamic? If someone is
> going to mirror the entire Chef Supermarket, this will result in a
> "universe" that contains around 22,000 cookbooks. Every time Berkshelf
> is called with "install" or "update", this resource will be requested.
> I don't think it makes sense to generate this data dynamically (it
> basically corresponds to the "primary.xml" file in the rpm repo case).
>
> Or are you suggesting storing the pre-computed result differently?
> (I don't like the metadata file solution either. OTOH, this object has
> exactly the required life cycle, and the publication logic takes care
> of it. And string.replace() should be pretty fast.)
>
> For this particular plugin, it would be nice to be able to associate
> metadata files with the distribution, not the publication. But that's
> probably a little bit too much to hope for...

I was suggesting a completely live api, but I see your point about
having a lot of cookbooks and calling this endpoint repeatedly. I think
the solution you have works. I can't think of how to better support
your use case ATM.

> > > - Content-Type Header: The "universe" file is JSON and must have a
> > >   corresponding "Content-Type" HTTP header. However, the content type
> > >   of the development server seems to be "text/html" by default for
> > >   all artifacts. Apparently, I can't set the content type of a
> > >   (metadata) artifact(?)
> >
> > I think this goes back to not using a published metadata file to serve
> > up your api. However, I could see how it might be useful.
>
> Sure, in my case it is no problem, since I set the content type in the
> dynamic API. The question is more generic, as content types should be
> correct for both artifacts and metadata files in general.
>
> In Pulp 2 it is determined based on the mime type associated with the
> file path (in the ContentView). How is this going to work in Pulp 3?
> Maybe we can read the mime type of the artifact before we serve it?

> > > - Getting the base url of a distribution in the dynamic API is
> > >   surprisingly complicated and depends on the inner structure of pulp
> > >   core (I took the implementation from 'pulp_ansible'). IMHO, a well
> > >   defined way to obtain it should be part of the plugin API.
> >
> > I agree. Opened: https://pulp.plan.io/issues/3677

> > > - "Content" class: The way to use only a single artifact in Content
> > >   (like done in pulp_file) seems to require in-depth knowledge of the
> > >   Content/ContentSerializer class and its inner workings. The downside
> > >   of this can already be experienced in the "pulp_file" plugin: the
> > >   fields "id" and "created" are missing, since the implementation
> > >   there just overrides the 'fields' in the serializer.
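A side note on that pulp_file observation: I suspect the missing "id"
and "created" fields are just a matter of extending the base
serializer's field list instead of replacing it. Very roughly (this is
not the actual pulp_file code; the import path, field names, and the
plugin model are only illustrative):

    from rest_framework import serializers

    # Import path may differ; ContentSerializer stands for the core class
    # whose Meta.fields already contains fields like "id" and "created".
    from pulpcore.plugin.serializers import ContentSerializer

    class CookbookContentSerializer(ContentSerializer):
        name = serializers.CharField()
        version = serializers.CharField()

        class Meta(ContentSerializer.Meta):
            # Extend the parent's fields instead of overriding them, so
            # the core fields stay visible in the API.
            fields = tuple(ContentSerializer.Meta.fields) + ("name", "version")
            # model = CookbookContent  # plus the plugin's model (name hypothetical)

Anyway, back to your suggestion: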
> > > I think two Content types should be part of the plugin API: one with
> > > multiple artifacts, and a simpler one with a single artifact.
> >
> > I began working on this: https://github.com/pulp/pulp/pull/3476
> >
> > But there was opposition around having the Content model be
> > responsible for generating the relative path on the Content Artifact.
> > I've opened an issue to see if there's another way to do this (e.g.
> > using serializers): https://pulp.plan.io/issues/3678
>
> Makes sense. I will follow this.

> > > - Uploading an Artifact that already exists returns an error, which
> > >   is annoying if you use http/curl to import artifacts. Suppose some
> > >   other user uploaded an artifact in the past. You won't get useful
> > >   information from the POST request uploading the same artifact:
> > >
> > >       HTTP/1.1 400 Bad Request
> > >       Allow: GET, POST, HEAD, OPTIONS
> > >       Content-Type: application/json
> > >       Date: Sat, 12 May 2018 17:50:54 GMT
> > >       Server: WSGIServer/0.2 CPython/3.6.2
> > >       Vary: Accept
> > >
> > >       {
> > >           "non_field_errors": [
> > >               "sha512 checksum must be unique."
> > >           ]
> > >       }
> > >
> > >   This forced me to do something like:
> > >
> > >       ...
> > >       sha256=$(sha256sum "$targz" | awk '{print $1}')
> > >       ARTIFACT_HREF=$(http :8000/pulp/api/v3/artifacts/?sha256=$sha256 | jq -r '.results[0]._href')
> > >       if [[ $ARTIFACT_HREF == "null" ]]; then
> > >           echo uploading artifact $cookbook_name sha256: $sha256
> > >           http --form POST http://localhost:8000/pulp/api/v3/artifacts/ file@$targz
> > >           ARTIFACT_HREF=$(http :8000/pulp/api/v3/artifacts/?sha256=$sha256 | jq -r '.results[0]._href')
> > >       ...
> > >
> > >   Perhaps a "303 See Other" to the existing artifact would help here.
> >
> > Why not just do something like:
> >
> >     http --form POST http://localhost:8000/pulp/api/v3/artifacts/ file@$targz || true
> >     ARTIFACT_HREF=$(http :8000/pulp/api/v3/artifacts/?sha256=$sha256 | jq -r '.results[0]._href')
> >     if [[ $ARTIFACT_HREF == "null" ]]; then exit 1; fi
> >
> > The error message could be more helpful though. It should probably
> > contain the existing artifact's href. I looked at 303 and am a bit
> > ambivalent toward using it.
>
> Sure, 303 is not really clear-cut. Including the href of the existing
> artifact should already help.
>
> Opened https://pulp.plan.io/issues/3681
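To sketch what I mean about the error message (purely illustrative, not
pulp's actual artifact serializer; the model import, field list, and
route name are guesses):

    from rest_framework import serializers
    from rest_framework.reverse import reverse

    from pulpcore.app.models import Artifact  # import path may differ

    class ArtifactSerializer(serializers.ModelSerializer):
        class Meta:
            model = Artifact
            fields = ("file", "sha512")  # abbreviated

        def validate(self, data):
            digest = data.get("sha512")
            if digest:
                existing = Artifact.objects.filter(sha512=digest).first()
                if existing is not None:
                    raise serializers.ValidationError({
                        "non_field_errors": ["sha512 checksum must be unique."],
                        # "artifacts-detail" is a guessed route name.
                        "_href": reverse(
                            "artifacts-detail",
                            kwargs={"pk": existing.pk},
                            request=self.context.get("request"),
                        ),
                    })
            return data

With something along those lines, the 400 body would at least tell the
caller where the existing artifact lives, so a client script could skip
the extra GET by checksum.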
_______________________________________________
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev