From: Jeff Hostetler
First draft of design document for partial clone feature.
Signed-off-by: Jeff Hostetler
---
Documentation/technical/partial-clone.txt | 259 ++
1 file changed, 259 insertions(+)
create mode 100644 Documentation/technical/partial-clone.txt
diff --git a/Documentation/technical/partial-clone.txt b/Documentation/technical/partial-clone.txt
new file mode 100644
index 000..731bd8c
--- /dev/null
+++ b/Documentation/technical/partial-clone.txt
@@ -0,0 +1,259 @@
+Partial Clone Design Notes
+==========================
+
+The "Partial Clone" feature is a performance optimization for git that
+allows git to function without having a complete copy of the repository.
+
+During clone and fetch operations, git normally downloads the complete
+contents and history of the repository. That is, during clone the client
+receives all of the commits, trees, and blobs in the repository into a
+local ODB. Subsequent fetches extend the local ODB with any new objects.
+For large repositories, this can take significant time to download and
+large amounts of disk space to store.
+
+The goal of this work is to allow git to better handle extremely large
+repositories. Often in these repositories there are many files that the
+user does not need, such as ancient versions of source files, files in
+portions of the worktree outside of the user's work area, or large binary
+assets. If we can avoid downloading such unneeded objects *in advance*
+during clone and fetch operations, we can decrease download times and
+reduce ODB disk usage.
+
+
+Non-Goals
+---------
+
+Partial clone is a mechanism to limit the number of blobs and trees downloaded
+*within* a given range of commits -- and is therefore independent of and not
+intended to conflict with existing DAG-level mechanisms to limit the set of
+requested commits (e.g., shallow clone, single branch, or fetch '<refspec>').
+
+
+Design Overview
+---------------
+
+Partial clone logically consists of the following parts:
+
+- A mechanism for the client to describe unneeded or unwanted objects to
+ the server.
+
+- A mechanism for the server to omit such unwanted objects from packfiles
+ sent to the client.
+
+- A mechanism for the client to gracefully handle missing objects (that
+ were previously omitted by the server).
+
+- A mechanism for the client to backfill missing objects as needed.
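The client-side half of this design (handling a missing object and backfilling it on demand) can be sketched in C as below. Everything here is hypothetical: the in-memory `struct object_store`, `lookup_local()`, `is_promisor_object()` and `fetch_from_promisor()` are illustrative stand-ins, not git's internal API, and object ids are plain strings. The sketch only shows the decision order: use a local copy if present, backfill an object the promisor remote is believed to have, and report anything else as corruption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory object store; object ids are plain strings. */
#define MAX_OBJECTS 16

struct object_store {
	const char *present[MAX_OBJECTS];  /* objects available locally */
	size_t nr_present;
	const char *promised[MAX_OBJECTS]; /* objects the promisor remote is believed to have */
	size_t nr_promised;
};

enum lookup_result { OBJ_FOUND, OBJ_BACKFILLED, OBJ_CORRUPT };

static bool contains(const char *const *list, size_t nr, const char *oid)
{
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(list[i], oid))
			return true;
	return false;
}

static bool lookup_local(const struct object_store *s, const char *oid)
{
	return contains(s->present, s->nr_present, oid);
}

static bool is_promisor_object(const struct object_store *s, const char *oid)
{
	return contains(s->promised, s->nr_promised, oid);
}

/* Hypothetical backfill: pretend the remote sent the object and record it locally. */
static bool fetch_from_promisor(struct object_store *s, const char *oid)
{
	if (s->nr_present >= MAX_OBJECTS)
		return false;
	s->present[s->nr_present++] = oid;
	return true;
}

static enum lookup_result read_object(struct object_store *s, const char *oid)
{
	if (lookup_local(s, oid))
		return OBJ_FOUND;
	/* Missing: only a promisor object may be fetched; anything else is corruption. */
	if (is_promisor_object(s, oid) && fetch_from_promisor(s, oid))
		return OBJ_BACKFILLED;
	return OBJ_CORRUPT;
}
```

Keeping the "is it promised?" test separate from the lookup is what lets a genuinely corrupt repository still be reported as corrupt rather than silently triggering a fetch.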
+
+
+Design Details
+--------------
+
+- A new pack-protocol capability "filter" is added to the fetch-pack and
+ upload-pack negotiation.
+
+ This uses the existing capability discovery mechanism.
+ See "filter" in Documentation/technical/pack-protocol.txt.
+
+- Clients pass a "filter-spec" to clone and fetch, which is passed to the
+ server to request filtering during packfile construction.
+
+ There are various filters available to accommodate different situations.
+ See "--filter=" in Documentation/rev-list-options.txt.
+
+- On the server, pack-objects applies the requested filter-spec as it
+ creates "filtered" packfiles for the client.
+
+ These filtered packfiles are incomplete in the traditional sense because
+ they may contain trees that reference blobs that the client does not have.
+
+- On the client, these incomplete packfiles are marked as "promisor packfiles"
+ and treated differently by various commands.
+
+- On the client, a repository extension is added to the local config to
+ prevent older versions of git from failing mid-operation because of
+ missing objects that they cannot handle.
+ See "extensions.partialClone" in Documentation/technical/repository-version.txt.
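The client should only send a filter-spec when the server has advertised the "filter" capability. A minimal sketch of that check, assuming the capability list has already been split off the first ref advertisement line (the text after the NUL) and is space-separated; `has_capability()` is a hypothetical helper, not git's own parser. Since some capabilities carry a value (e.g. "agent=..."), a token matches on the bare name or on "name=".

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `name` appears as a whole token in a space-separated
 * capability list such as "multi_ack thin-pack filter ofs-delta".
 * A token matches if it equals `name` or starts with "name=".
 */
static bool has_capability(const char *caps, const char *name)
{
	size_t len = strlen(name);
	const char *p = caps;

	while (*p) {
		const char *end = strchr(p, ' ');
		size_t tok = end ? (size_t)(end - p) : strlen(p);

		/* Require a full-token match so "filterx" does not match "filter". */
		if (tok >= len && !strncmp(p, name, len) &&
		    (tok == len || p[len] == '='))
			return true;
		if (!end)
			break;
		p = end + 1;
	}
	return false;
}
```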
+
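To illustrate what a filter-spec carries and how pack-objects might act on it, here is a sketch that handles only two forms, `blob:none` and `blob:limit=<n>[kmg]`. The types and function names are hypothetical. The unit suffixes are assumed to be case-insensitive powers of 1024, and this sketch omits blobs whose size is at least the limit; consult "--filter=" in Documentation/rev-list-options.txt for the authoritative semantics and the full list of filters.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical representation of two filter-specs. */
enum filter_kind { FILTER_INVALID, FILTER_BLOB_NONE, FILTER_BLOB_LIMIT };

struct filter_spec {
	enum filter_kind kind;
	unsigned long long limit; /* bytes; only used for FILTER_BLOB_LIMIT */
};

static struct filter_spec parse_filter_spec(const char *arg)
{
	static const char prefix[] = "blob:limit=";
	struct filter_spec f = { FILTER_INVALID, 0 };
	size_t plen = sizeof(prefix) - 1;

	if (!strcmp(arg, "blob:none")) {
		f.kind = FILTER_BLOB_NONE;
	} else if (!strncmp(arg, prefix, plen) &&
		   isdigit((unsigned char)arg[plen])) {
		char *end;
		unsigned long long n = strtoull(arg + plen, &end, 10);
		unsigned long long unit = 1;

		/* Optional k/m/g suffix, assumed to be powers of 1024. */
		switch (tolower((unsigned char)*end)) {
		case 'k': unit = 1024ULL; end++; break;
		case 'm': unit = 1024ULL * 1024; end++; break;
		case 'g': unit = 1024ULL * 1024 * 1024; end++; break;
		default: break;
		}
		if (*end == '\0') { /* reject trailing garbage */
			f.kind = FILTER_BLOB_LIMIT;
			f.limit = n * unit;
		}
	}
	return f;
}

/* Decide whether the server leaves a blob of `size` bytes out of the packfile. */
static bool omit_blob(const struct filter_spec *f, unsigned long long size)
{
	switch (f->kind) {
	case FILTER_BLOB_NONE:
		return true;
	case FILTER_BLOB_LIMIT:
		return size >= f->limit;
	default:
		return false;
	}
}
```

An omitted blob is still referenced by the trees that are sent, which is exactly what makes the resulting packfile incomplete in the traditional sense.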
+
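Recognizing a promisor packfile needs no list of missing objects, only a check for its marker file. A minimal sketch, assuming the marker sits beside the pack with the same basename and a ".promisor" extension, as described under "Handling Missing Objects"; its contents are ignored. `promisor_marker_path()` and `is_promisor_pack()` are hypothetical names, and existence is tested with a plain `fopen()` for portability.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the name of the ".promisor" marker next to a packfile, e.g.
 * "pack-1234.pack" -> "pack-1234.promisor". Returns false if
 * `pack_path` does not end in ".pack" or `out` is too small.
 */
static bool promisor_marker_path(const char *pack_path, char *out, size_t outlen)
{
	const char *ext = ".pack";
	size_t len = strlen(pack_path), elen = strlen(ext);

	if (len < elen || strcmp(pack_path + len - elen, ext))
		return false;
	int n = snprintf(out, outlen, "%.*s.promisor", (int)(len - elen), pack_path);
	return n >= 0 && (size_t)n < outlen;
}

/* A pack counts as a promisor pack when its ".promisor" file exists. */
static bool is_promisor_pack(const char *pack_path)
{
	char marker[4096];
	FILE *fp;

	if (!promisor_marker_path(pack_path, marker, sizeof(marker)))
		return false;
	fp = fopen(marker, "r");
	if (!fp)
		return false;
	fclose(fp);
	return true;
}
```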
+Handling Missing Objects
+------------------------
+
+- An object may be missing due to a partial clone or fetch, or missing due
+ to repository corruption. To differentiate these cases, the local
+ repository specially indicates packfiles obtained from the promisor
+ remote.
+
+ These "promisor packfiles" consist of a ".promisor" file with
+ arbitrary contents (like the ".keep" files), in addition to
+ their ".pack" and ".idx" files.
+
+ In the future, this ability may be extended to loose objects in case
+ a promisor packfile is accidentally unpacked.
+
+- The local repository considers a "promisor object" to be an object that
+ it knows (to the best of its ability) that the promisor remote has, either
+ because the local repository has that object in one of its promisor
+ packfiles, or because another promisor object refers to it.
+
+ When Git encounters a missing object, it can check whether it is a promisor
+ object and handle it appropriately. If not, Git can report corruption.
+
+ This means that there is no need for the client to explicitly maintain an
+ expensive-to-modify list of missing objects.
+
+- Since almost all Git code currently expects any referenced object to be
+ present locally and because we do not want to force every command to do
+ a dry-run first, a fallback mechanism is added