Perhaps I missed something... when did the "random path" get proposed? In pulp2 and in pulp3 (currently), the artifact's path is deterministic:
    MEDIA_ROOT/content/units/<type>/digest[0:2]/digest[2:]/<relative-path>

where the digest is the sha256 hex-digest of the content unit's natural key. For example, the natural key for RPM content is the NEVRA and checksum.

On 06/29/2017 03:51 PM, Brian Bouterse wrote:
> There is really one practical issue that is driving this convo (I think): Django's file upload handling
> wants to save a file when we receive it. We also don't want to be moving around files. Therefore we must
> save the file in the right place on the first save().
>
> So given ^, the question reduces to: "Where do we want to save a file that backs an Artifact?" We can do
> that one of two ways: randomly or orderly. Randomly would be inventing a uuid for each file and having
> that make the path to the file unique. An orderly way of doing it would be to have a digest be used
> instead of a uuid. Here are some path examples:
>
> random_path_example (random uuid): MEDIA_ROOT/artifact/uuid[0:2]/uuid[2:]
> orderly_path_example (sha256 is the binary's digest): MEDIA_ROOT/artifact/digest[0:2]/digest[2:]
>
> Random assignment is straightforward, and it also allows one Artifact to serve exactly one content unit,
> allowing CASCADE deletes to handle cleanup easily. The problem with random assignment is that it prevents
> an important down-the-road use case: "as a user who has a file backup but not a database backup, I can
> recover my data without having to re-download all of my content from remotes". Specifically, if Artifact
> paths are randomly chosen at upload time, then if someone hands you a disk of Artifacts and asks you to
> sync EPEL, there is no way Pulp can reasonably recognize content it has on disk as already existing there.
>
> This is where content addressable storage comes in. If the remoteArtifact has the sha256 hash value set
> from the remote metadata that was fetched, Pulp's changesets could recognize data on disk as already
> downloaded. A random layout can never do that. A tertiary outcome of using Content Addressable Store is
> that now each file backing an Artifact can only be stored once on the filesystem. I say "tertiary outcome"
> and not "downside" because even though it's harder for us to implement, users would definitely see it as a
> benefit that Pulp can't duplicate content at an architectural level.
>
> Please send thoughts/ideas.
>
> -Brian
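
To make the two artifact layouts in the quoted message concrete, here is a rough Python sketch. It is not Pulp code: the function names, the MEDIA_ROOT value, and the already_on_disk() helper are all made up for illustration. The only parts taken from the thread are the two path patterns and the idea of looking a file up by a sha256 that came from remote metadata.

```python
import hashlib
import os
import uuid

# Assumption: placeholder location, not a real Pulp setting value.
MEDIA_ROOT = "/var/lib/pulp"


def random_path(media_root=MEDIA_ROOT):
    """Random layout: MEDIA_ROOT/artifact/uuid[0:2]/uuid[2:]."""
    name = uuid.uuid4().hex
    return os.path.join(media_root, "artifact", name[0:2], name[2:])


def orderly_path(data, media_root=MEDIA_ROOT):
    """Orderly layout: MEDIA_ROOT/artifact/digest[0:2]/digest[2:].

    The digest is the sha256 hex-digest of the file's bytes, so the
    same bytes always map to the same path.
    """
    digest = hashlib.sha256(data).hexdigest()
    return os.path.join(media_root, "artifact", digest[0:2], digest[2:])


def already_on_disk(sha256_hex, media_root=MEDIA_ROOT):
    """Hypothetical helper: given a sha256 from remote metadata, check
    whether a file already exists at the content-addressed path."""
    path = os.path.join(media_root, "artifact", sha256_hex[0:2], sha256_hex[2:])
    return os.path.isfile(path)
```

With the orderly layout, a sha256 taken from remote metadata is enough to compute where the file would live, so a disk of files can be matched without a database. With the random layout the uuid only exists in the database, so there is nothing to look up.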
_______________________________________________
Pulp-dev mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/pulp-dev
