On Wed, Jun 28, 2017 at 12:44 PM, Brian Bouterse <[email protected]> wrote:
> For a file to be received and saved in the right place once, we need the
> view saving the file to have all the info to form the complete path. After
> talking w/ @jortel, I think we should store Artifacts at the following path:
>
> MEDIA_ROOT/content/units/digest[0:2]/digest[2:]/<rel_path>
>
> Note that digest is the Artifact's sha256 digest. This is different from
> pulp2 which used the digest of the content unit. Note that <rel_path> would
> be provided by the user along with <size> and/or <checksum_digest>.
>
> Note that this will cause an Artifact to live in exactly one place which
> means Artifacts are now unique by digest and would need to be able to be
> associated with multiple content units. I'm not sure why we didn't do this
> before, so I'm interested in exploring issues associated with this.
>

If my memory serves me correctly, we wanted to be able to have multiple
copies of an Artifact when that Artifact can be a Content Unit by itself and
also be one part of a unit, e.g. an RPM that belongs to a distribution. I am
not sure what benefit we would derive from this, but I was hoping to jog
someone's memory.

> It would be a good workflow. For a single file content unit (e.g.) rpm
> upload would be a two step process.
>
> 1. POST/PUT the file's binary data and the <relative_path> and <size>
>    and/or <checksum_digest> as GET parameters
> 2. Create a content unit with the unit metadata, and 0 .. n Artifacts
>    referred to by ID. This could optionally associate the new unit with one
>    repository as part of the atomic unit creation.
>
> Thoughts/Ideas?
>

If we provide an option to combine content unit creation with repo
association, this option should allow specifying multiple repositories. For
the MVP, though, I think we should support neither. Uploading a content unit
to a particular repository would then involve 3 steps:

1. POST to the Artifact API endpoint with <relative_path> and <size> and/or
   <checksum_digest> as GET parameters.
2. POST to the Content Unit API endpoint with the unit metadata, and 0 .. n
   Artifacts referred to by ID.
3. POST to the Repository Content Unit API endpoint to associate the unit
   with the repository.

Step 3 would be repeated for each repository the content unit should belong
to.

> -Brian
>
>
> On Tue, Jun 27, 2017 at 4:16 PM, Dennis Kliban <[email protected]> wrote:
>
>> On Tue, Jun 27, 2017 at 3:31 PM, Michael Hrivnak <[email protected]>
>> wrote:
>>
>>> Could you re-summarize what problem would be solved by not having a
>>> FileUpload model, and giving the Artifact model the ability to have partial
>>> data and no Content foreign key?
>>>
>>> I understand the concern about where on the filesystem the data gets
>>> written and how many times, but I'm not seeing how that's related to
>>> whether we have a FileUpload model or not. Are we discussing two separate
>>> issues? 1) filesystem locations and copy efficiency, and 2) API design? Or
>>> is this discussion trying to connect them in a way I'm not seeing?
>>>
>>
>> There were two concerns: 1) Filesystem location and copy efficiency 2)
>> API design
>>
>> The first one has been addressed. Thank you for pointing out that a
>> second write will be a move operation.
>>
>> However, I am still concerned about the complexity of the API. A
>> relatively small file should not require an upload session to be uploaded.
>> A single API call to the Artifacts API should be enough to upload a file
>> and create an Artifact from it. In Pulp 3.1+ we can introduce the
>> FileUpload model to support chunked uploads. At the same time we would
>> extend the Artifact API to accept a FileUpload id for creating an Artifact.
>>
>>
>>> On Tue, Jun 27, 2017 at 3:20 PM, Dennis Kliban <[email protected]>
>>> wrote:
>>>
>>>> On Tue, Jun 27, 2017 at 2:56 PM, Brian Bouterse <[email protected]>
>>>> wrote:
>>>>
>>>>> Picking up from @jortel's observations...
>>>>>
>>>>> +1 to allowing Artifacts to have an optional FK.
>>>>>
>>>>> If we have an Artifacts endpoint then we can allow for the deleting of
>>>>> a single artifact if it has no FK. I think we want to disallow the removal
>>>>> of an Artifact that has a foreign key. Also filtering should allow a
>>>>> single operation to clean up all unassociated artifacts by searching
>>>>> for FK=None or similar.
>>>>>
>>>>> Yes, we will need to allow the single call delivering a file to also
>>>>> specify the relative path, size, checksums etc. Since the POST body
>>>>> contains binary data we either need to accept this data as GET style
>>>>> params or use a multi-part MIME upload [0]. Note that this creation of
>>>>> an Artifact does not change the repository contents and therefore can
>>>>> be handled synchronously outside of the tasking system.
>>>>>
>>>>> +1 to the saving of an Artifact to perform validation
>>>>>
>>>>> [0]: https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html
>>>>>
>>>>> -Brian
>>>>>
>>>>
>>>> I also support this optional FK for Artifacts and validation on save.
>>>> We should probably stick with accepting GET parameters for the MVP. Though
>>>> multi-part MIME support would be good to consider for 3.1+.
>>>>
>>>>
>>>>> On Tue, Jun 27, 2017 at 2:44 PM, Dennis Kliban <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> On Tue, Jun 27, 2017 at 1:24 PM, Michael Hrivnak <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> On Tue, Jun 27, 2017 at 11:27 AM, Jeff Ortel <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> - The artifact FK to a content unit would need to become optional.
>>>>>>>>
>>>>>>>> - Need to add use cases for cleaning up artifacts not associated
>>>>>>>>   with a content unit.
>>>>>>>>
>>>>>>>> - The upload API would need additional information needed to create
>>>>>>>>   an artifact. Like relative path, size, checksums etc.
>>>>>>>>
>>>>>>>> - Since (I assume) you are proposing uploading/writing directly to
>>>>>>>>   artifact storage (not staging in a working dir), the flow would need
>>>>>>>>   to involve (optional) validation. If validation fails, the artifact
>>>>>>>>   must not be inserted into the DB.
>>>>>>>
>>>>>>>
>>>>>>> Perhaps a decent middle ground would be to stick with the plan of
>>>>>>> keeping uploaded (or partially uploaded) files as a separate model until
>>>>>>> they are ready to be turned into a Content instance plus artifacts, and
>>>>>>> save their file data directly to somewhere within /var/lib/pulp/. It
>>>>>>> would be some path distinct from where Artifacts are stored. That's
>>>>>>> what I had imagined we would do anyway. Then as Dennis pointed out,
>>>>>>> turning that into an Artifact would only require a move operation on
>>>>>>> the same filesystem, which is super-cheap.
>>>>>>>
>>>>>>> Would that address all the concerns? We'd write the data just once,
>>>>>>> and then move it once on the same filesystem. I haven't looked at
>>>>>>> django's support for this recently, but it seems like it should be
>>>>>>> doable.
>>>>>>>
>>>>>>
>>>>>> I was just looking at the dropbox API and noticed that they provide
>>>>>> two separate API endpoints for regular file uploads[0] (< 150mb) and
>>>>>> large file uploads[1]. It is the latter that supports chunking and
>>>>>> requires using an upload id. For the most common case they support
>>>>>> uploading a file with one API call. Our original proposal requires 2
>>>>>> for the same use case. Pulp API users would appreciate having to only
>>>>>> make one API call to upload a file.
>>>>>>
>>>>>> [0] https://www.dropbox.com/developers-v1/core/docs#files_put
>>>>>> [1] https://www.dropbox.com/developers-v1/core/docs#chunked-upload
>>>>>>
>>>>>>> --
>>>>>>> Michael Hrivnak
>>>>>>> Principal Software Engineer, RHCE
>>>>>>> Red Hat
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Pulp-dev mailing list
>>>>>>> [email protected]
>>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
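
For illustration only, here is a minimal Python sketch of the storage layout
proposed in this thread, MEDIA_ROOT/content/units/digest[0:2]/digest[2:]/<rel_path>,
where digest is the Artifact's sha256 digest. The function names, the
validation rules, and the MEDIA_ROOT value are hypothetical and are not taken
from the Pulp code base; a real view would presumably read MEDIA_ROOT from
the Django settings.

```python
import hashlib
import posixpath
import string

# Assumption: a stand-in for Django's settings.MEDIA_ROOT.
MEDIA_ROOT = "/var/lib/pulp"


def sha256_of(data):
    """Return the hex sha256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def artifact_storage_path(digest, rel_path, media_root=MEDIA_ROOT):
    """Build MEDIA_ROOT/content/units/digest[0:2]/digest[2:]/<rel_path>.

    Hypothetical helper: the checks below are illustrative, not Pulp's.
    """
    # A sha256 hex digest is exactly 64 hexadecimal characters.
    if len(digest) != 64 or any(c not in string.hexdigits for c in digest):
        raise ValueError("digest must be a 64-character sha256 hex digest")
    # Refuse paths that could escape the artifact's directory.
    if posixpath.isabs(rel_path) or ".." in rel_path.split("/"):
        raise ValueError("rel_path must be relative and must not contain '..'")
    return posixpath.join(
        media_root, "content", "units", digest[0:2], digest[2:], rel_path
    )


if __name__ == "__main__":
    digest = sha256_of(b"example file contents")
    print(artifact_storage_path(digest, "foo.rpm"))
```

Because the path depends only on the digest and the relative path, the view
receiving the upload has everything it needs to write the file to its final
location in one step, as long as the digest is known before the write starts.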
_______________________________________________
Pulp-dev mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/pulp-dev
