On Wed, Jun 28, 2017 at 1:10 PM, Jeff Ortel <[email protected]> wrote:
> On 06/28/2017 11:44 AM, Brian Bouterse wrote:
> > For a file to be received and saved in the right place once, we need the
> > view saving the file to have all the info to form the complete path. After
> > talking w/ @jortel, I think we should store Artifacts at the following path:
> >
> > MEDIA_ROOT/content/units/digest[0:2]/digest[2:]/<rel_path>
>
> Consider:
>
> MEDIA_ROOT/artifact/digest[0:2]/digest[2:]/<rel_path>
>
> since an Artifact would have an optional association with content. And,
> given the many-to-many relationship, the content_id FK would no longer exist
> in the Artifact table. Also, I have more plans for Artifacts in a
> "Publishing" proposal I'm writing to pulp-dev (spoiler alert).
>
> We would also want to enforce the same CAS (content addressed storage)
> uniqueness in the DB using a unique constraint on the Artifact. E.g.:
> unique (sha256, rel_path). This ensures that each unique artifact (file) has
> exactly 1 DB record.

I don't think it makes sense for an Artifact to have a rel_path. It is just a
file. A ContentUnit should have the rel_path that will be used at publish time
to make the file backing the Artifact available at that rel_path. Is my
understanding of the rel_path correct? In that case the only thing that should
be unique is the sha256 digest. I've written a story that outlines this use
case: https://pulp.plan.io/issues/2843

> > Note that digest is the Artifact's sha256 digest. This is different from
> > pulp2, which used the digest of the content unit. Note that <rel_path>
> > would be provided by the user along with <size> and/or <checksum_digest>.
> >
> > Note that this will cause an Artifact to live in exactly one place, which
> > means Artifacts are now unique by digest and would need to be able to be
> > associated with multiple content units. I'm not sure why we didn't do this
> > before, so I'm interested in exploring issues associated with this.
> >
> > It would be a good workflow.
> > For a single-file content unit (e.g. an rpm), upload would be a two-step
> > process:
> >
> > 1. POST/PUT the file's binary data and the <relative_path> and <size>
> >    and/or <checksum_digest> as GET parameters
> > 2. Create a content unit with the unit metadata, and 0 .. n Artifacts
> >    referred to by ID. This could optionally associate the new unit with
> >    one repository as part of the atomic unit creation.
> >
> > Thoughts/Ideas?
> >
> > -Brian
> >
> > On Tue, Jun 27, 2017 at 4:16 PM, Dennis Kliban <[email protected]> wrote:
> > > On Tue, Jun 27, 2017 at 3:31 PM, Michael Hrivnak <[email protected]> wrote:
> > > > Could you re-summarize what problem would be solved by not having a
> > > > FileUpload model, and giving the Artifact model the ability to have
> > > > partial data and no Content foreign key?
> > > >
> > > > I understand the concern about where on the filesystem the data gets
> > > > written and how many times, but I'm not seeing how that's related to
> > > > whether we have a FileUpload model or not. Are we discussing two
> > > > separate issues? 1) filesystem locations and copy efficiency, and
> > > > 2) API design? Or is this discussion trying to connect them in a way
> > > > I'm not seeing?
> > >
> > > There were two concerns: 1) filesystem location and copy efficiency,
> > > 2) API design.
> > >
> > > The first one has been addressed. Thank you for pointing out that a
> > > second write will be a move operation.
> > >
> > > However, I am still concerned about the complexity of the API. A
> > > relatively small file should not require an upload session to be
> > > uploaded. A single API call to the Artifacts API should be enough to
> > > upload a file and create an Artifact from it. In Pulp 3.1+ we can
> > > introduce the FileUpload model to support chunked uploads. At the same
> > > time we would extend the Artifact API to accept a FileUpload id for
> > > creating an Artifact.
> > > >
> > > > On Tue, Jun 27, 2017 at 3:20 PM, Dennis Kliban <[email protected]> wrote:
> > > > > On Tue, Jun 27, 2017 at 2:56 PM, Brian Bouterse <[email protected]> wrote:
> > > > > > Picking up from @jortel's observations...
> > > > > >
> > > > > > +1 to allowing Artifacts to have an optional FK.
> > > > > >
> > > > > > If we have an Artifacts endpoint then we can allow for the deleting
> > > > > > of a single artifact if it has no FK. I think we want to disallow
> > > > > > the removal of an Artifact that has a foreign key. Also filtering
> > > > > > should allow a single operation to clean up all unassociated
> > > > > > artifacts by searching for FK=None or similar.
> > > > > >
> > > > > > Yes, we will need to allow the single call delivering a file to
> > > > > > also specify the relative path, size, checksums etc. Since the POST
> > > > > > body contains binary data we either need to accept this data as GET
> > > > > > style params or use a multi-part MIME upload [0]. Note that this
> > > > > > creation of an Artifact does not change the repository contents and
> > > > > > therefore can be handled synchronously outside of the tasking
> > > > > > system.
> > > > > >
> > > > > > +1 to the saving of an Artifact to perform validation
> > > > > >
> > > > > > [0]: https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html
> > > > > >
> > > > > > -Brian
> > > > >
> > > > > I also support this optional FK for Artifacts and validation on save.
> > > > > We should probably stick with accepting GET parameters for the MVP.
> > > > > Though multi-part MIME support would be good to consider for 3.1+.
> > > > > >
> > > > > > On Tue, Jun 27, 2017 at 2:44 PM, Dennis Kliban <[email protected]> wrote:
> > > > > > > On Tue, Jun 27, 2017 at 1:24 PM, Michael Hrivnak <[email protected]> wrote:
> > > > > > > > On Tue, Jun 27, 2017 at 11:27 AM, Jeff Ortel <[email protected]> wrote:
> > > > > > > > > - The artifact FK to a content unit would need to become
> > > > > > > > >   optional.
> > > > > > > > > - Need to add use cases for cleaning up artifacts not
> > > > > > > > >   associated with a content unit.
> > > > > > > > > - The upload API would need additional information needed to
> > > > > > > > >   create an artifact. Like relative path, size, checksums etc.
> > > > > > > > > - Since (I assume) you are proposing uploading/writing
> > > > > > > > >   directly to artifact storage (not staging in a working
> > > > > > > > >   dir), the flow would need to involve (optional) validation.
> > > > > > > > >   If validation fails, the artifact must not be inserted into
> > > > > > > > >   the DB.
> > > > > > > >
> > > > > > > > Perhaps a decent middle ground would be to stick with the plan
> > > > > > > > of keeping uploaded (or partially uploaded) files as a separate
> > > > > > > > model until they are ready to be turned into a Content instance
> > > > > > > > plus artifacts, and save their file data directly to somewhere
> > > > > > > > within /var/lib/pulp/. It would be some path distinct from
> > > > > > > > where Artifacts are stored. That's what I had imagined we would
> > > > > > > > do anyway. Then as Dennis pointed out, turning that into an
> > > > > > > > Artifact would only require a move operation on the same
> > > > > > > > filesystem, which is super-cheap.
> > > > > > > >
> > > > > > > > Would that address all the concerns? We'd write the data just
> > > > > > > > once, and then move it once on the same filesystem. I haven't
> > > > > > > > looked at django's support for this recently, but it seems like
> > > > > > > > it should be doable.
> > > > > > >
> > > > > > > I was just looking at the dropbox API and noticed that they
> > > > > > > provide two separate API endpoints for regular file uploads[0]
> > > > > > > (< 150mb) and large file uploads[1].
> > > > > > > It is the latter that supports chunking and requires using an
> > > > > > > upload id. For the most common case they support uploading a file
> > > > > > > with one API call. Our original proposal requires 2 for the same
> > > > > > > use case. Pulp API users would appreciate having to only make one
> > > > > > > API call to upload a file.
> > > > > > >
> > > > > > > [0] https://www.dropbox.com/developers-v1/core/docs#files_put
> > > > > > > [1] https://www.dropbox.com/developers-v1/core/docs#chunked-upload
> > > > > > > >
> > > > > > > > --
> > > > > > > > Michael Hrivnak
> > > > > > > > Principal Software Engineer, RHCE
> > > > > > > > Red Hat
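
To make the storage layout and the "write once, move once" idea from this thread concrete, here is a minimal sketch in plain Python. It assumes the path is MEDIA_ROOT/artifact/digest[0:2]/digest[2:] with no <rel_path> component (following the reply at the top of the thread), and the names artifact_path and store_artifact are hypothetical, not part of Pulp. The data is streamed to a temporary file under MEDIA_ROOT while the sha256 is computed, then renamed into place on the same filesystem.

```python
import hashlib
import os
import tempfile


def artifact_path(media_root, digest):
    """Build MEDIA_ROOT/artifact/digest[0:2]/digest[2:] from a sha256 hex digest."""
    return os.path.join(media_root, "artifact", digest[0:2], digest[2:])


def store_artifact(media_root, chunks):
    """Write the uploaded data once, then move it to its digest-based path.

    `chunks` is any iterable of bytes objects (hypothetical stand-in for the
    request body). Returns (digest, final_path).
    """
    sha256 = hashlib.sha256()
    os.makedirs(media_root, exist_ok=True)
    # The temp file is created under media_root so the final step is a
    # rename on the same filesystem, not a second copy of the data.
    fd, tmp_path = tempfile.mkstemp(dir=media_root)
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                sha256.update(chunk)
                tmp.write(chunk)
        digest = sha256.hexdigest()
        final_path = artifact_path(media_root, digest)
        os.makedirs(os.path.dirname(final_path), exist_ok=True)
        # Identical content always maps to the same path, so replacing an
        # existing file here only swaps in the same bytes.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Do not leave a partial upload behind if anything fails.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return digest, final_path
```

Validation against a user-supplied <size> or <checksum_digest> could be added just before the os.replace call, so that a failed check removes the temp file and nothing reaches artifact storage.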
_______________________________________________
Pulp-dev mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/pulp-dev
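
The data model discussed in the thread (Artifacts unique by sha256, a many-to-many association with content units, and cleanup of unassociated Artifacts) can be sketched with the standard library's sqlite3 module. This is only an illustration under assumptions: the table and column names are hypothetical, rel_path is placed on the association table as one possible reading of "a ContentUnit should have the rel_path", and the real implementation would presumably use Django models instead.

```python
import sqlite3

# Hypothetical schema: one row per unique file, linked to content units
# through an association table that carries the publish-time rel_path.
SCHEMA = """
CREATE TABLE artifact (
    id INTEGER PRIMARY KEY,
    sha256 TEXT NOT NULL UNIQUE,
    size INTEGER NOT NULL
);
CREATE TABLE content_unit (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE content_artifact (
    content_id INTEGER NOT NULL REFERENCES content_unit(id),
    artifact_id INTEGER NOT NULL REFERENCES artifact(id),
    rel_path TEXT NOT NULL,
    UNIQUE (content_id, rel_path)
);
"""


def orphan_artifacts(conn):
    """Return ids of Artifacts that no content unit refers to."""
    rows = conn.execute(
        "SELECT a.id FROM artifact a "
        "LEFT JOIN content_artifact ca ON ca.artifact_id = a.id "
        "WHERE ca.artifact_id IS NULL ORDER BY a.id"
    )
    return [row[0] for row in rows]


def delete_orphans(conn):
    """Delete all unassociated Artifacts in one operation; return the count."""
    cur = conn.execute(
        "DELETE FROM artifact "
        "WHERE id NOT IN (SELECT artifact_id FROM content_artifact)"
    )
    return cur.rowcount
```

With this shape, inserting a second artifact row with an existing sha256 raises sqlite3.IntegrityError, which is the "exactly 1 DB record per file" property, and an Artifact that is still associated with a content unit is never selected for cleanup.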
