Hi, I am investigating how I can implement snapshotting support for Pulp repos (not with a plugin, at least not for now, but as a client).
Essentially, I need to make a copy of the Pulp repo after each "logical write" operation (a set of updates/copies from other repos), before the publish action.

One way I've thought about it:

* before calling publish on repo-1, create a repo-1__<timestamp> repo. Here timestamp is down to the millisecond or less, so the chance of a clash with another snapshot operation is slim
* copy everything from repo-1 into repo-1__<timestamp>
* read the previous timestamp from the repo's notes section (if present)
* if present, compare the contents of the previous timestamped repo with the new one. If they are the same, delete the newly created repo-1__<timestamp> and do nothing else
* if not present, or if the contents are different, write repo-1__<timestamp> in the repo's notes section
* periodically clean up older repo-1__<timestamp> repos

There are clearly major concurrency issues/race conditions here.

* What happens if the contents of repo-1 change while I am performing the comparison? Nothing bad in this case; that's the reason I chose to do the copy first, so that I compare two timestamped copies rather than the live repo with the previous timestamped copy.
* What happens if another "snapshot" operation happens while I am doing those calculations? If I could guarantee that the updates to the notes section happen in the same order as the snapshots were started, nothing bad; at worst I end up with two identical timestamped copies, generated very shortly one after the other. But if snapshot 1 starts, snapshot 2 starts and updates the notes of repo-1, and then snapshot 1 updates the notes of repo-1, I have just overwritten the reference to a newer snapshot with an older one. If Pulp had ETag support and the PUT operation to update the notes accepted conditionals (like If-None-Match), then I'd be able to detect that case.

Has anyone tried this kind of thing before? It is, in a sense, like git/scm: each commit gets its own changeset id, and HEAD is always updated to point to the latest changeset.
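To make the steps and the race above concrete, here is a minimal sketch in Python. It does not call Pulp at all: FakeRepoStore is a hypothetical in-memory stand-in for the server, and set_note_if_version emulates the conditional PUT that I wish existed, using a plain counter in place of an ETag. All names here (FakeRepoStore, snapshot, the "last_snapshot" note key) are made up for illustration.

```python
import time
from typing import Dict, Optional, Set


class FakeRepoStore:
    """Hypothetical in-memory stand-in for a Pulp server (NOT the Pulp API)."""

    def __init__(self) -> None:
        self.units: Dict[str, Set[str]] = {}        # repo id -> content units
        self.notes: Dict[str, Dict[str, str]] = {}  # repo id -> notes section
        self.versions: Dict[str, int] = {}          # repo id -> notes version ("ETag")

    def create(self, repo_id: str) -> None:
        self.units[repo_id] = set()
        self.notes[repo_id] = {}
        self.versions[repo_id] = 0

    def copy(self, src: str, dst: str) -> None:
        self.units[dst] = set(self.units[src])

    def delete(self, repo_id: str) -> None:
        for table in (self.units, self.notes, self.versions):
            table.pop(repo_id, None)

    def set_note_if_version(self, repo_id: str, key: str, value: str,
                            expected_version: int) -> bool:
        """Emulates a conditional PUT: refuse if the notes changed meanwhile."""
        if self.versions[repo_id] != expected_version:
            return False
        self.notes[repo_id][key] = value
        self.versions[repo_id] += 1
        return True


def snapshot(store: FakeRepoStore, repo_id: str,
             now_ms: Optional[int] = None) -> Optional[str]:
    """Return the new snapshot id, or None if nothing changed or the race was lost."""
    stamp = now_ms if now_ms is not None else int(time.time() * 1000)
    snap_id = f"{repo_id}__{stamp}"

    # Remember the notes version before doing any work (the "ETag").
    version = store.versions[repo_id]

    # Copy first, so the comparison is between two frozen copies.
    store.create(snap_id)
    store.copy(repo_id, snap_id)

    previous = store.notes[repo_id].get("last_snapshot")
    if previous is not None and store.units.get(previous) == store.units[snap_id]:
        store.delete(snap_id)  # identical to the previous snapshot
        return None

    # Only move the pointer if nobody else moved it since we started.
    if not store.set_note_if_version(repo_id, "last_snapshot", snap_id, version):
        store.delete(snap_id)  # another snapshot won; do not overwrite it
        return None
    return snap_id
```

In this sketch a snapshot that loses the race simply discards its copy; a real client might prefer to retry from the top instead. Without the version check, the last writer would silently win, which is exactly the overwrite described above.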
This can probably be implemented much more efficiently as a distributor that doesn't create any distributable content, but only snapshots the state of the Pulp repo. Maybe such a thing already exists?

Thank you!
Mihai
_______________________________________________ Pulp-list mailing list [email protected] https://www.redhat.com/mailman/listinfo/pulp-list
