Re: [Pulp-dev] proposing changes to pulp 3 upload API

Jeff Ortel Wed, 28 Jun 2017 10:11:21 -0700


On 06/28/2017 11:44 AM, Brian Bouterse wrote:
> For a file to be received and saved in the right place once, we need the view 
> saving the file to have all the
> info to form the complete path. After talking w/ @jortel, I think we should 
> store Artifacts at the following path:
> 
> MEDIA_ROOT/content/units/digest[0:2]/digest[2:]/<rel_path>


Consider:
MEDIA_ROOT/artifact/digest[0:2]/digest[2:]/<rel_path>

Since artifact would have an optional association with content.  And, given the 
many-to-many relationship, the
content_id FK would not longer exist in the Artifact table.  Also, I have more 
plans for Artifacts in a
"Publishing" proposal I'm writing to pulp-dev (spoiler alert).

We would also want to enforce the same CAS (content addressed storage) 
uniqueness in the DB using a unique
constraint on the Artifact.  Eg: unique (sha256, rel_path).  This ensure that 
each unique artifact (file) has
exactly 1 DB record.

> 
> Note that digest is the Artifact's sha256 digest. This is different from 
> pulp2 which used the digest of the
> content unit. Note that <rel_path> would be provided by the user along with 
> <size> and/or <checksum_digest>.
> 
> Note that this will cause an Artifact to live in exactly one place which 
> means Artifacts are now unique by
> digest and would need to be able to be associated with multiple content 
> units. I'm not sure why we didn't do
> this before, so I'm interested in exploring issues associated with this.
> 
> It would be a good workflow. For a single file content unit (e.g.) rpm upload 
> would be a two step process.
> 
> 1. POST/PUT the file's binary data and the <relative_path> and <size> and/or 
> <checksum_digest> as GET parameters
> 2. Create a content unit with the unit metadata, and 0 .. n Artifacts 
> referred to by ID. This could optionally
> associate the new unit with one repository as part of the atomic unit 
> creation.
> 
> Thoughts/Ideas?
> 
> -Brian
> 
> 
> On Tue, Jun 27, 2017 at 4:16 PM, Dennis Kliban <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>     On Tue, Jun 27, 2017 at 3:31 PM, Michael Hrivnak <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>         Could you re-summarize what problem would be solved by not having a 
> FileUpload model, and giving the
>         Artifact model the ability to have partial data and no Content 
> foreign key?
> 
>         I understand the concern about where on the filesystem the data gets 
> written and how many times, but
>         I'm not seeing how that's related to whether we have a FileUpload 
> model or not. Are we discussing two
>         separate issues? 1) filesystem locations and copy efficiency, and 2) 
> API design? Or is this discussion
>         trying to connect them in a way I'm not seeing?
> 
> 
>     There were two concerns: 1) Filesystem  location and copy efficiency 2) 
> API design
> 
>     The first one has been addressed. Thank you for pointing out that a 
> second write will be a move operation.
> 
>     However, I am still concerned about the complexity of the API. A 
> relatively small file should not require
>     an upload session to be uploaded. A single API call to the Artifacts API 
> should be enough to upload a file
>     and create an Artifact from it. In Pulp 3.1+ we can introduce the 
> FileUpload model to support chunked
>     uploads. At the same time we would extend the Artifact API to accept a 
> FileUpload id for creating an
>     Artifact.
> 
> 
>         On Tue, Jun 27, 2017 at 3:20 PM, Dennis Kliban <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>             On Tue, Jun 27, 2017 at 2:56 PM, Brian Bouterse 
> <[email protected] <mailto:[email protected]>>
>             wrote:
> 
>                 Picking up from @jortel's observations...
> 
>                 +1 to allowing Artifacts to have an optional FK.
> 
>                 If we have an Artifacts endpoint then we can allow for the 
> deleting of a single artifact if it
>                 has no FK. I think we want to disallow the removal of an 
> Artifact that has a foreign key. Also
>                 filtering should allow a single operation to clean up all 
> unassociated artifacts by searching
>                 for FK=None or similar.
> 
>                 Yes, we will need to allow the single call delivering a file 
> to also specify the relative
>                 path, size, checksums etc. Since the POST body contains 
> binary data we either need to accept
>                 this data as GET style params or use a multi-part MIME upload 
> [0]. Note that this creation of
>                 an Artifact does not change the repository contents and 
> therefore can be handled synchronously
>                 outside of the tasking system.
> 
>                 +1 to the saving of an Artifact to perform validation
> 
>                 [0]: https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html
>                 <https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html>
>                  
> 
> 
>                 -Brian
> 
> 
>             I also support this optional FK for Artifacts and validation on 
> save.  We should probably stick
>             with accepting GET parameters for the MVP. Though multi-part MIME 
> support would be good to
>             consider for 3.1+.
>              
> 
> 
>                 On Tue, Jun 27, 2017 at 2:44 PM, Dennis Kliban 
> <[email protected]
>                 <mailto:[email protected]>> wrote:
> 
>                     On Tue, Jun 27, 2017 at 1:24 PM, Michael Hrivnak 
> <[email protected]
>                     <mailto:[email protected]>> wrote:
> 
> 
>                         On Tue, Jun 27, 2017 at 11:27 AM, Jeff Ortel 
> <[email protected]
>                         <mailto:[email protected]>> wrote:
> 
> 
>                             - The artifact FK to a content unit would need to 
> become optional.
> 
>                             - Need to add use cases for cleaning up artifacts 
> not associated with a content unit.
> 
>                             - The upload API would need additional 
> information needed to create an artifact. 
>                             Like relative path, size,
>                             checksums etc.
> 
>                             - Since (I assume) you are proposing 
> uploading/writing directly to artifact
>                             storage (not staging in a working
>                             dir), the flow would need to involve (optional) 
> validation.  If validation fails,
>                             the artifact must not be
>                             inserted into the DB.
> 
> 
>                         Perhaps a decent middle ground would be to stick with 
> the plan of keeping uploaded (or
>                         partially uploaded) files as a separate model until 
> they are ready to be turned into a
>                         Content instance plus artifacts, and save their file 
> data directly to somewhere within
>                         /var/lib/pulp/. It would be some path distinct from 
> where Artifacts are stored. That's
>                         what I had imagined we would do anyway. Then as 
> Dennis pointed out, turning that into
>                         an Artifact would only require a move operation on 
> the same filesystem, which is
>                         super-cheap.
>                          
> 
>                         Would that address all the concerns? We'd write the 
> data just once, and then move it
>                         once on the same filesystem. I haven't looked at 
> django's support for this recently,
>                         but it seems like it should be doable.
> 
>                     I was just looking at the dropbox API and noticed that 
> they provide two separate API
>                     endpoints for regular file uploads[0] (< 150mb) and large 
> file uploads[1]. It is the
>                     latter that supports chunking and requires using an 
> upload id. For the most common case
>                     they support uploading a file with one API call. Our 
> original proposal requires 2 for the
>                     same use case. Pulp API users would appreciate having to 
> only make one API call to upload
>                     a file. 
> 
>                     [0] 
> https://www.dropbox.com/developers-v1/core/docs#files_put
>                     
> <https://www.dropbox.com/developers-v1/core/docs#files_put>
>                     [1] 
> https://www.dropbox.com/developers-v1/core/docs#chunked-upload
>                     
> <https://www.dropbox.com/developers-v1/core/docs#chunked-upload>
> 
>                      
> 
>                         -- 
> 
>                         Michael Hrivnak
> 
>                         Principal Software Engineer, RHCE 
> 
>                         Red Hat
> 
> 
>                         _______________________________________________
>                         Pulp-dev mailing list
>                         [email protected] <mailto:[email protected]>
>                         https://www.redhat.com/mailman/listinfo/pulp-dev
>                         <https://www.redhat.com/mailman/listinfo/pulp-dev>
> 
> 
> 
>                     _______________________________________________
>                     Pulp-dev mailing list
>                     [email protected] <mailto:[email protected]>
>                     https://www.redhat.com/mailman/listinfo/pulp-dev
>                     <https://www.redhat.com/mailman/listinfo/pulp-dev>
> 
> 
> 
> 
> 
> 
>         -- 
> 
>         Michael Hrivnak
> 
>         Principal Software Engineer, RHCE 
> 
>         Red Hat
> 
> 
> 
> 
> 
> _______________________________________________
> Pulp-dev mailing list
> [email protected]
> https://www.redhat.com/mailman/listinfo/pulp-dev
>

signature.asc
Description: OpenPGP digital signature

_______________________________________________
Pulp-dev mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/pulp-dev

Re: [Pulp-dev] proposing changes to pulp 3 upload API

Reply via email to