Perhaps I missed something: when did the "random path" get proposed?  In
pulp2 and (currently) in pulp3, the
artifact's path is deterministic.

MEDIA_ROOT/content/units/<type>/digest[0:2]/digest[2:]/<relative-path>

where the digest is the sha256 hex-digest of the content unit's natural key.
For example, the natural key for
RPM content is the NEVRA and checksum.
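
A rough sketch of how a path like that can be derived, in case it helps the
discussion. This is not the actual Pulp code: the function name, the ':'
separator used to serialize the natural key, and the sample values are all
assumptions made up for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def content_unit_path(media_root, unit_type, natural_key, relative_path):
    """Build a deterministic storage path from a content unit's natural key.

    Hypothetical helper: joining the key fields with ':' is an assumed
    serialization, not necessarily what Pulp does.
    """
    serialized = ":".join(str(field) for field in natural_key)
    digest = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
    # MEDIA_ROOT/content/units/<type>/digest[0:2]/digest[2:]/<relative-path>
    return str(
        PurePosixPath(
            media_root,
            "content",
            "units",
            unit_type,
            digest[0:2],
            digest[2:],
            relative_path,
        )
    )
```

The same natural key always yields the same path, so nothing random is
involved and the location can be recomputed from the unit's metadata alone.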

On 06/29/2017 03:51 PM, Brian Bouterse wrote:
> There is really one practical issue that is driving this convo (I think):  
> Django's file upload handling wants
> to save a file when we receive it. We also don't want to be moving around 
> files. Therefore we must save the
> file in the right place on the first save().
> 
> So given ^, the question reduces to: "Where do we want to save a file that 
> backs an Artifact?" We can do that
> one of two ways: randomly or orderly. Randomly would be inventing a uuid for 
> each file and having that make
> the path to the file unique. An orderly way of doing it would be to have an 
> digest be used instead of a uuid.
> Here are some path examples:
> 
> random_path_example (random uuid):    MEDIA_ROOT/artifact/uuid[0:2]/uuid[2:]
> orderly_path_example (sha256 is the binary's digest):    
> MEDIA_ROOT/artifact/digest[0:2]/digest[2:]
> 
> Random assignment is straightforward, and it also allows one Artifact to 
> serve exactly one content unit
> allowing CASCADE deletes to handle cleanup easily. The problem with random
> assignment is that it prevents an
> important down-the-road use case:  "as a user who has a file backup but not a 
> database backup, I can recover
> my data without having to re-download all of my content from remotes". 
> Specifically, if Artifacts' paths are
> randomly chosen at upload time then if someone hands you a disk of Artifacts 
> and asks you to sync EPEL, there
> is no way Pulp can reasonably recognize content it has on disk as already 
> existing there.
> 
> This is where content addressable storage comes in. If the remoteArtifact has 
> the sha256 hash value set from
> the remote metadata that was fetched, Pulp's changesets could recognize data 
> on disk as already downloaded. A
> random layout can never do that. A tertiary outcome of using Content
> Addressable Store is that now each file
> backing an Artifact can only be stored once on the filesystem. I say "tertiary
> outcome" and not "downside" because
> even though it's harder for us to implement, users would definitely see it as
> a benefit that Pulp can't
> duplicate content at an architectural level.
> 
> Please send thoughts/ideas.
> 
> -Brian
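
To make the two layouts from the quoted message concrete, here is a minimal
sketch of both, plus the "is it already on disk?" check that only the
digest-based layout allows. The function names and the verify flag are
hypothetical and are not taken from Pulp.

```python
import hashlib
import uuid
from pathlib import Path


def random_path(media_root):
    """MEDIA_ROOT/artifact/uuid[0:2]/uuid[2:], a new location on every call."""
    name = uuid.uuid4().hex
    return Path(media_root, "artifact", name[0:2], name[2:])


def orderly_path(media_root, sha256_digest):
    """MEDIA_ROOT/artifact/digest[0:2]/digest[2:], derived from the content."""
    return Path(media_root, "artifact", sha256_digest[0:2], sha256_digest[2:])


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large artifacts are not read into memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def already_on_disk(media_root, expected_sha256, verify=False):
    """Return True if a file for this digest exists under the orderly layout.

    With verify=True the file is re-hashed, which guards against a
    corrupted or truncated file sitting at the expected location.
    """
    path = orderly_path(media_root, expected_sha256)
    if not path.is_file():
        return False
    return (not verify) or file_sha256(path) == expected_sha256
```

With the random layout there is nothing to look up, because the path carries
no information about the file's content. With the orderly layout, a sha256
taken from remote metadata is enough to find the file without a database.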

_______________________________________________
Pulp-dev mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/pulp-dev
