Re: [PATCH v7 21/22] gc: automatically write commit-graph files
Hi Derrick, > The commit-graph file is a very helpful feature for speeding up git > operations. In order to make it more useful, make it possible to > write the commit-graph file during standard garbage collection > operations. > > Add a 'gc.commitGraph' config setting that triggers writing a > commit-graph file after any non-trivial 'git gc' command. Defaults to > false while the commit-graph feature matures. We specifically do not > want to have this on by default until the commit-graph feature is fully > integrated with history-modifying features like shallow clones. > > Helped-by: Ævar Arnfjörð Bjarmason > Signed-off-by: Derrick Stolee > --- > Documentation/config.txt | 9 ++--- > Documentation/git-gc.txt | 4 > builtin/gc.c | 6 ++ > t/t5318-commit-graph.sh | 14 ++ > 4 files changed, 30 insertions(+), 3 deletions(-) > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index 1cc18a828c..978deecfee 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -904,9 +904,12 @@ core.notesRef:: > This setting defaults to "refs/notes/commits", and it can be overridden by > the `GIT_NOTES_REF` environment variable. See linkgit:git-notes[1]. > > -core.commitGraph:: > - Enable git commit graph feature. Allows reading from the > - commit-graph file. > +gc.commitGraph:: > + If true, then gc will rewrite the commit-graph file when > + linkgit:git-gc[1] is run. When using linkgit:git-gc[1] > + '--auto' the commit-graph will be updated if housekeeping is > + required. Default is false. See linkgit:git-commit-graph[1] > + for details. Sorry, I've lost track of this patch series, unfortunately, but after it hit master yesterday a merge conflict with one of my (unrelated) WIP patches drew my attention to this hunk above, which got me confused: - Why remove the description of 'core.commitGraph'? 'git grep -i core.commitGraph' shows that it's still used supported. - Why insert the description of 'gc.commitGraph' here instead of next to the other 'gc.*' config variables, breaking lexicographical order of confvars? - This hunk, the other documentation hunk right below, and indeed the commit message all speak about 'gc.commitGraph', but the new code and test added in this patch use 'gc.writeCommitGraph' (following my suggestion from a couple of weeks ago, I think?). What's going on? > > core.sparseCheckout:: > Enable "sparse checkout" feature. See section "Sparse checkout" in > diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt > index 24b2dd44fe..f5bc98ccb3 100644 > --- a/Documentation/git-gc.txt > +++ b/Documentation/git-gc.txt > @@ -136,6 +136,10 @@ The optional configuration variable `gc.packRefs` > determines if > it within all non-bare repos or it can be set to a boolean value. > This defaults to true. > > +The optional configuration variable `gc.commitGraph` determines if > +'git gc' should run 'git commit-graph write'. This can be set to a > +boolean value. This defaults to false. > + > The optional configuration variable `gc.aggressiveWindow` controls how > much time is spent optimizing the delta compression of the objects in > the repository when the --aggressive option is specified. The larger > diff --git a/builtin/gc.c b/builtin/gc.c > index ccfb1ceaeb..b7dd206f41 100644 > --- a/builtin/gc.c > +++ b/builtin/gc.c > @@ -20,6 +20,7 @@ > #include "sigchain.h" > #include "argv-array.h" > #include "commit.h" > +#include "commit-graph.h" > #include "packfile.h" > #include "object-store.h" > #include "pack.h" > @@ -40,6 +41,7 @@ static int aggressive_depth = 50; > static int aggressive_window = 250; > static int gc_auto_threshold = 6700; > static int gc_auto_pack_limit = 50; > +static int gc_write_commit_graph = 0; > static int detach_auto = 1; > static timestamp_t gc_log_expire_time; > static const char *gc_log_expire = "1.day.ago"; > @@ -129,6 +131,7 @@ static void gc_config(void) > git_config_get_int("gc.aggressivedepth", &aggressive_depth); > git_config_get_int("gc.auto", &gc_auto_threshold); > git_config_get_int("gc.autopacklimit", &gc_auto_pack_limit); > + git_config_get_bool("gc.writecommitgraph", &gc_write_commit_graph); > git_config_get_bool("gc.autodetach", &detach_auto); > git_config_get_expiry("gc.pruneexpire", &prune_expire); > git_config_get_expiry("gc.worktreepruneexpire", > &prune_worktrees_expire); > @@ -641,6 +644,9 @@ int cmd_gc(int argc, const char **argv, const char > *prefix) > if (pack_garbage.nr > 0) > clean_pack_garbage(); > > + if (gc_write_commit_graph) > + write_commit_graph_reachable(get_object_directory(), 0); > + > if (auto_gc && too_many_loose_objects()) > warning(_("There are too many unreachable loose objects; " > "run 'git prune' to remove them.")); > diff --git a/t/t5318-comm
Re: [PATCH v7 21/22] gc: automatically write commit-graph files
On 6/27/2018 2:09 PM, Junio C Hamano wrote: Derrick Stolee writes: @@ -40,6 +41,7 @@ static int aggressive_depth = 50; static int aggressive_window = 250; static int gc_auto_threshold = 6700; static int gc_auto_pack_limit = 50; +static int gc_write_commit_graph = 0; Please avoid unnecessary (and undesirable) explicit initialization to 0. Instead, let BSS to handle it by leaving " = 0" out. +test_expect_success 'check that gc computes commit-graph' ' + cd "$TRASH_DIRECTORY/full" && + git commit --allow-empty -m "blank" && + git commit-graph write --reachable && + cp $objdir/info/commit-graph commit-graph-before-gc && + git reset --hard HEAD~1 && + git config gc.writeCommitGraph true && + git gc && + cp $objdir/info/commit-graph commit-graph-after-gc && + ! test_cmp commit-graph-before-gc commit-graph-after-gc && The set of commits in the commit graph will chanbe by discarding the (old) tip commit, so the fact that the contents of the file changed across gc proves that "commit-graph write" was triggered during gc. Come to think of it, do we promise to end-users (in docs etc.) that commit-graph covers *ONLY* commits reachable from refs and HEAD? I am wondering what should happen if "git gc" here does not prune the reflog for HEAD---wouldn't we want to reuse the properties of the commit we are discarding when it comes back (e.g. you push, then reset back, and the next pull makes it reappear in your repository)? Today I learned that 'gc' keeps some of the reflog around. That makes sense, but I wouldn't optimize the commit-graph file for this scenario. I guess what I am really questioning is if it is sensible to define "--reachable" as "starting at all refs", unlike the usual connectivity rules "gc" uses, especially when this is run from inside "gc". It is sensible to me, especially because we only lose performance if we visit those other commits that are still in the object database. By writing the commit-graph on 'gc' and not during 'fetch', we are already assuming the commit-graph will usually be behind the set of commits that the user cares about, by design. An alternate view on the decision will need help answering from others who know more than me: In fetch negotiation, does the client report commits in the reflog as 'have's or do they get re-downloaded on a resulting 'git pull'? + git commit-graph write --reachable && + test_cmp commit-graph-after-gc $objdir/info/commit-graph This says that running "commit-graph write" twice without changing the topology MUST yield byte-for-byte identical commit-graph file. Is that a reasonable requirement on the future implementation? I am wondering if there will arise a situation where you need to store records in "some" fixed order but two records compare "the same" and tie-breaking them to give stable sort is expensive, or something like that, which would benefit if you leave an escape hatch to allow two logically identical graphs expressed bitwise differently. Since the file format allows flexibility in the order of the chunks, it is possible to have bitwise-different files that represent the same set of data. However, I would not want git to provide inconsistent output given the same set of commits covered by the file. Thanks, -Stolee
Re: [PATCH v7 21/22] gc: automatically write commit-graph files
Derrick Stolee writes: > @@ -40,6 +41,7 @@ static int aggressive_depth = 50; > static int aggressive_window = 250; > static int gc_auto_threshold = 6700; > static int gc_auto_pack_limit = 50; > +static int gc_write_commit_graph = 0; Please avoid unnecessary (and undesirable) explicit initialization to 0. Instead, let BSS to handle it by leaving " = 0" out. > +test_expect_success 'check that gc computes commit-graph' ' > + cd "$TRASH_DIRECTORY/full" && > + git commit --allow-empty -m "blank" && > + git commit-graph write --reachable && > + cp $objdir/info/commit-graph commit-graph-before-gc && > + git reset --hard HEAD~1 && > + git config gc.writeCommitGraph true && > + git gc && > + cp $objdir/info/commit-graph commit-graph-after-gc && > + ! test_cmp commit-graph-before-gc commit-graph-after-gc && The set of commits in the commit graph will chanbe by discarding the (old) tip commit, so the fact that the contents of the file changed across gc proves that "commit-graph write" was triggered during gc. Come to think of it, do we promise to end-users (in docs etc.) that commit-graph covers *ONLY* commits reachable from refs and HEAD? I am wondering what should happen if "git gc" here does not prune the reflog for HEAD---wouldn't we want to reuse the properties of the commit we are discarding when it comes back (e.g. you push, then reset back, and the next pull makes it reappear in your repository)? I guess what I am really questioning is if it is sensible to define "--reachable" as "starting at all refs", unlike the usual connectivity rules "gc" uses, especially when this is run from inside "gc". > + git commit-graph write --reachable && > + test_cmp commit-graph-after-gc $objdir/info/commit-graph This says that running "commit-graph write" twice without changing the topology MUST yield byte-for-byte identical commit-graph file. Is that a reasonable requirement on the future implementation? I am wondering if there will arise a situation where you need to store records in "some" fixed order but two records compare "the same" and tie-breaking them to give stable sort is expensive, or something like that, which would benefit if you leave an escape hatch to allow two logically identical graphs expressed bitwise differently. > +' > + > # the verify tests below expect the commit-graph to contain > # exactly the commits reachable from the commits/8 branch. > # If the file changes the set of commits in the list, then the
[PATCH v7 21/22] gc: automatically write commit-graph files
The commit-graph file is a very helpful feature for speeding up git operations. In order to make it more useful, make it possible to write the commit-graph file during standard garbage collection operations. Add a 'gc.commitGraph' config setting that triggers writing a commit-graph file after any non-trivial 'git gc' command. Defaults to false while the commit-graph feature matures. We specifically do not want to have this on by default until the commit-graph feature is fully integrated with history-modifying features like shallow clones. Helped-by: Ævar Arnfjörð Bjarmason Signed-off-by: Derrick Stolee --- Documentation/config.txt | 9 ++--- Documentation/git-gc.txt | 4 builtin/gc.c | 6 ++ t/t5318-commit-graph.sh | 14 ++ 4 files changed, 30 insertions(+), 3 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 1cc18a828c..978deecfee 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -904,9 +904,12 @@ core.notesRef:: This setting defaults to "refs/notes/commits", and it can be overridden by the `GIT_NOTES_REF` environment variable. See linkgit:git-notes[1]. -core.commitGraph:: - Enable git commit graph feature. Allows reading from the - commit-graph file. +gc.commitGraph:: + If true, then gc will rewrite the commit-graph file when + linkgit:git-gc[1] is run. When using linkgit:git-gc[1] + '--auto' the commit-graph will be updated if housekeeping is + required. Default is false. See linkgit:git-commit-graph[1] + for details. core.sparseCheckout:: Enable "sparse checkout" feature. See section "Sparse checkout" in diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt index 24b2dd44fe..f5bc98ccb3 100644 --- a/Documentation/git-gc.txt +++ b/Documentation/git-gc.txt @@ -136,6 +136,10 @@ The optional configuration variable `gc.packRefs` determines if it within all non-bare repos or it can be set to a boolean value. This defaults to true. +The optional configuration variable `gc.commitGraph` determines if +'git gc' should run 'git commit-graph write'. This can be set to a +boolean value. This defaults to false. + The optional configuration variable `gc.aggressiveWindow` controls how much time is spent optimizing the delta compression of the objects in the repository when the --aggressive option is specified. The larger diff --git a/builtin/gc.c b/builtin/gc.c index ccfb1ceaeb..b7dd206f41 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -20,6 +20,7 @@ #include "sigchain.h" #include "argv-array.h" #include "commit.h" +#include "commit-graph.h" #include "packfile.h" #include "object-store.h" #include "pack.h" @@ -40,6 +41,7 @@ static int aggressive_depth = 50; static int aggressive_window = 250; static int gc_auto_threshold = 6700; static int gc_auto_pack_limit = 50; +static int gc_write_commit_graph = 0; static int detach_auto = 1; static timestamp_t gc_log_expire_time; static const char *gc_log_expire = "1.day.ago"; @@ -129,6 +131,7 @@ static void gc_config(void) git_config_get_int("gc.aggressivedepth", &aggressive_depth); git_config_get_int("gc.auto", &gc_auto_threshold); git_config_get_int("gc.autopacklimit", &gc_auto_pack_limit); + git_config_get_bool("gc.writecommitgraph", &gc_write_commit_graph); git_config_get_bool("gc.autodetach", &detach_auto); git_config_get_expiry("gc.pruneexpire", &prune_expire); git_config_get_expiry("gc.worktreepruneexpire", &prune_worktrees_expire); @@ -641,6 +644,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix) if (pack_garbage.nr > 0) clean_pack_garbage(); + if (gc_write_commit_graph) + write_commit_graph_reachable(get_object_directory(), 0); + if (auto_gc && too_many_loose_objects()) warning(_("There are too many unreachable loose objects; " "run 'git prune' to remove them.")); diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 2208c266b5..5947de3d24 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -245,6 +245,20 @@ test_expect_success 'perform fast-forward merge in full repo' ' test_cmp expect output ' +test_expect_success 'check that gc computes commit-graph' ' + cd "$TRASH_DIRECTORY/full" && + git commit --allow-empty -m "blank" && + git commit-graph write --reachable && + cp $objdir/info/commit-graph commit-graph-before-gc && + git reset --hard HEAD~1 && + git config gc.writeCommitGraph true && + git gc && + cp $objdir/info/commit-graph commit-graph-after-gc && + ! test_cmp commit-graph-before-gc commit-graph-after-gc && + git commit-graph write --reachable && + test_cmp commit-graph-after-gc $objdir/info/commit-graph +' + # the verify tests below expect the commit-graph to contain # exactly the commits reachable