I have been doing some thinking about pulp3 publishing with the following goals in mind:
- Eliminate symlinks.
- Eliminate the need for each plugin to have its own Apache conf.
- Prevent orphaned content that is still published from being deleted.
The main concept is to store the relationship between an artifact and a URL in
the DB instead of using the filesystem. A `Publication` is created (and owned)
by a publisher. Each `Publication` is composed of (linked to) many `artifacts`.
The linkage contains the path component of the URL, which is used to locate
the artifact referenced by that URL.
This covers artifacts as we know them today. But what about files generated
during publishing, a.k.a. metadata? I propose that these files be stored as
artifacts as well. This requires the definition of `Artifact` to be broadened
slightly. It would read more like:
"A file associated with either stored or published content".
Or it could be even more generic, like:
"A file contained within the pulp inventory that may be associated with a
content (unit) or publication."
In any case, the relationship to a content (unit) becomes optional.
Publications are not user-facing. I think we can keep this as an internal core
concept, at least for the MVP.
The /var/lib/pulp/published directory goes away.
General Flows:
Publishing: "The publisher will compose a publication"
1. Publisher creates a publication using the plugin API.
2. Publisher adds content artifacts to the publication.
3. Publisher generates some metadata files in the working dir.
4. Publisher adds the metadata files to the publication using the plugin API.
   The artifacts can likely be created behind the scenes by the plugin API.
5. Publisher commits (publishes) the publication. The plugin API ensures this
   is atomic.
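
The publishing steps above could be sketched roughly as follows. This is only
an illustration, not a proposed plugin API: the `PublicationBuilder` class, its
method names, the simplified sqlite3 tables, and the UNIQUE constraint are all
hypothetical. Moving the generated metadata files from the working dir into
artifact storage is left out.

```python
import sqlite3

class PublicationBuilder:
    """Hypothetical plugin-API helper: stage everything, then write it in one transaction."""

    def __init__(self, conn, publisher_id):
        self.conn = conn
        self.publisher_id = publisher_id
        self.content = []   # (artifact_id, url) for existing content artifacts
        self.metadata = []  # (storage_path, url) for files generated while publishing

    def add_artifact(self, artifact_id, url):
        self.content.append((artifact_id, url))

    def add_metadata(self, storage_path, url):
        self.metadata.append((storage_path, url))

    def commit(self):
        # "with conn" commits on success and rolls back if any statement raises,
        # so a failed publish leaves no partial publication behind.
        with self.conn:
            cur = self.conn.execute(
                "INSERT INTO publication (publisher_id) VALUES (?)",
                (self.publisher_id,))
            publication_id = cur.lastrowid
            links = list(self.content)
            for storage_path, url in self.metadata:
                # Metadata artifacts are created behind the scenes.
                cur = self.conn.execute(
                    "INSERT INTO artifact (storage_path) VALUES (?)",
                    (storage_path,))
                links.append((cur.lastrowid, url))
            self.conn.executemany(
                "INSERT INTO linked_artifact (publication_id, artifact_id, url) "
                "VALUES (?, ?, ?)",
                [(publication_id, artifact_id, url) for artifact_id, url in links])
        return publication_id

# Simplified, assumed tables (in-memory) so the sketch can run on its own.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artifact (id INTEGER PRIMARY KEY, storage_path TEXT NOT NULL);
CREATE TABLE publication (id INTEGER PRIMARY KEY, publisher_id INTEGER NOT NULL);
CREATE TABLE linked_artifact (
    id INTEGER PRIMARY KEY, publication_id INTEGER NOT NULL,
    artifact_id INTEGER NOT NULL, url TEXT NOT NULL,
    UNIQUE (publication_id, url));
""")
```

Staging the links in memory and writing them in a single transaction is one
way to get the atomic commit in step 5; if any insert fails, nothing from the
publication becomes visible.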
Client makes a GET request for content (or metadata):
1. Request is routed to the content (WSGI) application (just like in pulp2 for
   RPM).
2. Query the `LinkedArtifact` table by URL path component to get the artifact.
3. Forward the artifact storage path to:
   - the streamer, if the artifact is not stored locally
   - x-send, if the artifact is stored locally
4. Done.
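
A rough sketch of what the content application could look like for this flow.
Everything named here is an assumption for illustration: the `make_content_app`
factory, the dict standing in for the `LinkedArtifact` query, the `X-Sendfile`
response header for the x-send case, and the use of a redirect (with a
placeholder streamer URL) to hand off to the streamer.

```python
import os

def make_content_app(links, streamer_url, is_local=os.path.exists):
    """Build a WSGI app. `links` maps URL path -> artifact storage path and
    stands in for a query against the LinkedArtifact table."""

    def application(environ, start_response):
        # Assumes the full URL path arrives in PATH_INFO (app mounted at the root).
        url_path = environ.get("PATH_INFO", "")
        storage_path = links.get(url_path)
        if storage_path is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not Found"]
        if is_local(storage_path):
            # Stored locally: let the web server send the file (assumed header name).
            start_response("200 OK", [("X-Sendfile", storage_path)])
            return [b""]
        # Not stored locally: forward the storage path to the streamer.
        # Using a redirect for this is an assumption of the sketch.
        start_response("302 Found", [("Location", streamer_url + storage_path)])
        return [b""]

    return application
```

Passing `is_local` in as a parameter keeps the local/remote decision separate
from the lookup, so it could later be answered from the DB instead of the
filesystem.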
Tables:
=============================
Publication
    id [PK]
    publisher_id [FK]
    created
    schemes

LinkedArtifact
    id [PK]
    publication_id [FK]
    artifact_id [FK]
    URL
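
To make the two tables concrete, here is a minimal sketch using Python's
built-in sqlite3 module. It is an illustration under assumptions only: the
column types, the stand-in `publisher` and `artifact` tables, the UNIQUE
constraint, and the `find_artifact_path` helper are hypothetical and not part
of the proposal. The `schemes` value is left NULL because its format is not
specified here.

```python
import sqlite3

# Column types and the UNIQUE constraint are assumptions; the proposal only
# names the columns.
SCHEMA = """
CREATE TABLE publisher (id INTEGER PRIMARY KEY);
CREATE TABLE artifact (id INTEGER PRIMARY KEY, storage_path TEXT NOT NULL);
CREATE TABLE publication (
    id INTEGER PRIMARY KEY,
    publisher_id INTEGER NOT NULL REFERENCES publisher(id),
    created TEXT NOT NULL,
    schemes TEXT
);
CREATE TABLE linked_artifact (
    id INTEGER PRIMARY KEY,
    publication_id INTEGER NOT NULL REFERENCES publication(id),
    artifact_id INTEGER NOT NULL REFERENCES artifact(id),
    url TEXT NOT NULL,
    UNIQUE (publication_id, url)
);
"""

def find_artifact_path(conn, url):
    """Return the storage path linked to a URL path, or None if nothing is linked."""
    row = conn.execute(
        "SELECT a.storage_path FROM linked_artifact AS la "
        "JOIN artifact AS a ON a.id = la.artifact_id WHERE la.url = ?",
        (url,),
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO publisher (id) VALUES (1)")
conn.execute(
    "INSERT INTO artifact VALUES (1, '/var/lib/pulp/artifact/ff/9f373839d0/manifest')")
conn.execute("INSERT INTO publication VALUES (1, 1, '6-1-2017', NULL)")
conn.execute(
    "INSERT INTO linked_artifact VALUES (1, 1, 1, '/pulp/published/http/zoo/md/manifest')")
```

With these rows loaded, `find_artifact_path(conn, "/pulp/published/http/zoo/md/manifest")`
returns the manifest's storage path, which is the lookup the content
application would perform on each request.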
Example Data:
==============================
Publisher:
----------------
publisher-1, ...
Artifact:
----------------
artifact-1, /var/lib/pulp/artifact/ff/9f373839d0/manifest
artifact-2, /var/lib/pulp/artifact/b1/37b64a8c83/tiger.img
Publication:
----------------
publication-1, publisher-1, 6-1-2017,..
LinkedArtifact:
----------------
<id>, publication-1, artifact-1, /pulp/published/http/zoo/md/manifest
<id>, publication-1, artifact-2, /pulp/published/http/zoo/images/tiger.img
URLs would be: /pulp/published/(http|https)/<path>
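
As a small illustration of that URL layout, the sketch below splits a
published URL path into its scheme directory and the remaining path. The
`split_published_url` helper and the exact regular expression are assumptions
made for the example.

```python
import re

# Matches /pulp/published/(http|https)/<path>; <path> must be non-empty.
PUBLISHED_URL = re.compile(r"^/pulp/published/(?P<scheme>https?)/(?P<path>.+)$")

def split_published_url(url_path):
    """Return (scheme, path) for a published URL path, or None if it does not match."""
    match = PUBLISHED_URL.match(url_path)
    if match is None:
        return None
    return match.group("scheme"), match.group("path")
```

For the example data above, `/pulp/published/http/zoo/md/manifest` splits into
`("http", "zoo/md/manifest")`.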
I think the core can have a single Apache configuration that defines 2
directories: one for HTTPS, protected by SSL/entitlement, and the other for
plain HTTP.
Thoughts/Comments?
-jeff
