On Tue, Jun 27, 2017 at 3:31 PM, Michael Hrivnak <[email protected]> wrote:
> Could you re-summarize what problem would be solved by not having a
> FileUpload model, and giving the Artifact model the ability to have
> partial data and no Content foreign key?
>
> I understand the concern about where on the filesystem the data gets
> written and how many times, but I'm not seeing how that's related to
> whether we have a FileUpload model or not. Are we discussing two
> separate issues? 1) filesystem locations and copy efficiency, and 2)
> API design? Or is this discussion trying to connect them in a way I'm
> not seeing?

There were two concerns:

1) Filesystem location and copy efficiency
2) API design

The first one has been addressed. Thank you for pointing out that the
second write will be a move operation.

However, I am still concerned about the complexity of the API. Uploading
a relatively small file should not require an upload session. A single
API call to the Artifacts API should be enough to upload a file and
create an Artifact from it; I've put a sketch of what that could look
like below, after Jeff's quoted points. In Pulp 3.1+ we can introduce
the FileUpload model to support chunked uploads and, at the same time,
extend the Artifact API to accept a FileUpload id when creating an
Artifact.

> On Tue, Jun 27, 2017 at 3:20 PM, Dennis Kliban <[email protected]>
> wrote:
>
>> On Tue, Jun 27, 2017 at 2:56 PM, Brian Bouterse <[email protected]>
>> wrote:
>>
>>> Picking up from @jortel's observations...
>>>
>>> +1 to allowing Artifacts to have an optional FK.
>>>
>>> If we have an Artifacts endpoint then we can allow the deletion of a
>>> single Artifact if it has no FK. I think we want to disallow the
>>> removal of an Artifact that has a foreign key. Also, filtering should
>>> allow a single operation to clean up all unassociated Artifacts by
>>> searching for FK=None or similar.
>>>
>>> Yes, we will need to allow the single call delivering a file to also
>>> specify the relative path, size, checksums, etc. Since the POST body
>>> contains binary data, we either need to accept this data as GET-style
>>> params or use a multipart MIME upload [0]. Note that this creation of
>>> an Artifact does not change the repository contents and therefore can
>>> be handled synchronously, outside of the tasking system.
>>>
>>> +1 to the saving of an Artifact performing validation
>>>
>>> [0]: https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html
>>>
>>> -Brian
>>
>> I also support this optional FK for Artifacts and validation on save.
>> We should probably stick with accepting GET parameters for the MVP,
>> though multipart MIME support would be good to consider for 3.1+.
>>
>>> On Tue, Jun 27, 2017 at 2:44 PM, Dennis Kliban <[email protected]>
>>> wrote:
>>>
>>>> On Tue, Jun 27, 2017 at 1:24 PM, Michael Hrivnak
>>>> <[email protected]> wrote:
>>>>
>>>>> On Tue, Jun 27, 2017 at 11:27 AM, Jeff Ortel <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> - The Artifact FK to a content unit would need to become optional.
>>>>>>
>>>>>> - Need to add use cases for cleaning up Artifacts not associated
>>>>>> with a content unit.
>>>>>>
>>>>>> - The upload API would need the additional information needed to
>>>>>> create an Artifact, like relative path, size, checksums, etc.
>>>>>>
>>>>>> - Since (I assume) you are proposing uploading/writing directly to
>>>>>> artifact storage (not staging in a working dir), the flow would
>>>>>> need to involve (optional) validation. If validation fails, the
>>>>>> Artifact must not be inserted into the DB.
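To make this concrete, here is a rough sketch of an Artifact with an
optional FK and validation on save, along the lines of Jeff's points
above. It is illustrative only: the model name, field names, FK
behavior, and the sha256-on-save check are my assumptions, not a
settled design.

    # Sketch only -- names and behavior are assumptions, not settled API.
    import hashlib

    from django.db import models

    class Artifact(models.Model):
        # Optional FK: an Artifact may exist with no content unit attached.
        # Deleting a content unit orphans its Artifacts for later cleanup.
        content = models.ForeignKey('Content', null=True, blank=True,
                                    on_delete=models.SET_NULL)
        file = models.FileField(upload_to='artifacts/')
        relative_path = models.TextField()
        size = models.BigIntegerField()
        sha256 = models.CharField(max_length=64)

        def save(self, *args, **kwargs):
            # Validate before the row is inserted: if the checksum does
            # not match, the Artifact never reaches the DB.
            hasher = hashlib.sha256()
            for chunk in self.file.chunks():
                hasher.update(chunk)
            if hasher.hexdigest() != self.sha256:
                raise ValueError('checksum mismatch; Artifact not saved')
            super().save(*args, **kwargs)

With the FK nullable, cleaning up unassociated Artifacts becomes a
single filtered delete, e.g.
Artifact.objects.filter(content__isnull=True).delete(), and the endpoint
can refuse DELETE for any Artifact whose FK is set. The one-call upload
itself could then be a multipart POST along these lines (hypothetical
endpoint and parameter names, shown with python-requests):

    import requests

    with open('foo-1.0-1.noarch.rpm', 'rb') as f:
        response = requests.post(
            'https://pulp.example.com/api/v3/artifacts/',
            # Metadata as query params for the MVP, per the discussion
            # above; multipart form fields could carry it in 3.1+.
            params={'relative_path': 'foo-1.0-1.noarch.rpm',
                    'sha256': '<expected hex digest>'},
            files={'file': f},  # binary data as a multipart MIME part
        )
    response.raise_for_status()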
>>>>> Perhaps a decent middle ground would be to stick with the plan of
>>>>> keeping uploaded (or partially uploaded) files as a separate model
>>>>> until they are ready to be turned into a Content instance plus
>>>>> Artifacts, and save their file data directly to somewhere within
>>>>> /var/lib/pulp/. It would be some path distinct from where Artifacts
>>>>> are stored. That's what I had imagined we would do anyway. Then, as
>>>>> Dennis pointed out, turning that into an Artifact would only require
>>>>> a move operation on the same filesystem, which is super-cheap.
>>>>>
>>>>> Would that address all the concerns? We'd write the data just once,
>>>>> and then move it once on the same filesystem. I haven't looked at
>>>>> Django's support for this recently, but it seems like it should be
>>>>> doable.
>>>>
>>>> I was just looking at the Dropbox API and noticed that they provide
>>>> two separate endpoints: one for regular file uploads [0] (< 150 MB)
>>>> and one for large file uploads [1]. It is the latter that supports
>>>> chunking and requires using an upload id. For the most common case,
>>>> they support uploading a file with one API call. Our original
>>>> proposal requires two calls for the same use case. Pulp API users
>>>> would appreciate only having to make one API call to upload a file.
>>>>
>>>> [0] https://www.dropbox.com/developers-v1/core/docs#files_put
>>>> [1] https://www.dropbox.com/developers-v1/core/docs#chunked-upload
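To illustrate how cheap the promotion step in Michael's middle-ground
flow is: as long as the upload area and artifact storage live on the
same filesystem, turning a finished upload into an Artifact's file is a
single rename with no data copy. A minimal sketch, where the paths and
the checksum-based directory layout are assumptions for illustration:

    import os

    UPLOAD_DIR = '/var/lib/pulp/uploads'      # assumed staging area
    ARTIFACT_DIR = '/var/lib/pulp/artifacts'  # assumed artifact storage

    def promote_upload(upload_id, sha256):
        """Turn a completed upload into an Artifact's file via rename."""
        src = os.path.join(UPLOAD_DIR, upload_id)
        dst = os.path.join(ARTIFACT_DIR, sha256[:2], sha256[2:])
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        # On the same filesystem, rename(2) just relinks the inode; no
        # file data is copied. Across filesystems it raises OSError.
        os.rename(src, dst)
        return dst

Django should let us point the FileField at the already-renamed path (or
do the rename in a custom storage backend), so the file data really is
written to disk only once.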
_______________________________________________
Pulp-dev mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/pulp-dev
