[PATCH 1/1] Docs: Add commit-graph tech docs to Makefile

2018-08-21 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Ensure that the commit-graph.txt and commit-graph-format.txt files are compiled to HTML using ASCIIDOC. Signed-off-by: Derrick Stolee --- Documentation/Makefile | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Documentation/Makefile b/Documentation/Makefile index

[PATCH 0/1] Docs: Add commit-graph tech docs to Makefile

2018-08-21 Thread Derrick Stolee via GitGitGadget
Similar to [1], add the commit-graph and commit-graph-format technical docs to Documentation/Makefile so they are automatically converted to HTML when needed. I compiled the docs and inspected the HTML manually in the browser. Nothing looked strange, so I don't think the docs themselves need any

[PATCH v2 1/2] Docs: Add commit-graph tech docs to Makefile

2018-08-21 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Ensure that the commit-graph.txt and commit-graph-format.txt files are compiled to HTML using ASCIIDOC. Signed-off-by: Derrick Stolee --- Documentation/Makefile | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Documentation/Makefile b/Documentation/Makefile index

[PATCH v2 2/2] commit-graph.txt: improve formatting for asciidoc

2018-08-21 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When viewing commit-graph.txt as a plain-text document, it makes sense to keep paragraphs left-padded between bullet points. However, asciidoc converts these left-padded paragraphs as monospace fonts, creating an unpleasant document. Remove the padding. The "Future Work"

[PATCH v2 0/2] Docs: Add commit-graph tech docs to Makefile

2018-08-21 Thread Derrick Stolee via GitGitGadget
Similar to [1], add the commit-graph and commit-graph-format technical docs to Documentation/Makefile so they are automatically converted to HTML when needed. I compiled the docs and inspected the HTML manually in the browser. As suggested, I modified the documents to format a bit better. See the

[PATCH 1/1] commit-graph: define GIT_TEST_COMMIT_GRAPH

2018-08-28 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The commit-graph feature is tested in isolation by t5318-commit-graph.sh and t6600-test-reach.sh, but there are many more interesting scenarios involving commit walks. Many of these scenarios are covered by the existing test suite, but we need to maintain coverage when the

[PATCH 4/6] revision.c: begin refactoring --topo-order logic

2018-08-27 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When running 'git rev-list --topo-order' and its kin, the topo_order setting in struct rev_info implies the limited setting. This means that the following things happen during prepare_revision_walk(): * revs->limited implies we run limit_list() to walk the entire

[PATCH 3/6] test-reach: add rev-list tests

2018-08-27 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The rev-list command is critical to Git's functionality. Ensure it works in the three commit-graph environments constructed in t6600-test-reach.sh. Here are a few important types of rev-list operations: * Basic: git rev-list --topo-order HEAD * Range: git rev-list

[PATCH 5/6] commit/revisions: bookkeeping before refactoring

2018-08-27 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee There are a few things that need to move around a little before making a big refactoring in the topo-order logic: 1. We need access to record_author_date() and compare_commits_by_author_date() in revision.c. These are used currently by sort_in_topological_order() in

[PATCH 0/6] Use generation numbers for --topo-order

2018-08-27 Thread Derrick Stolee via GitGitGadget
This patch series performs a decently-sized refactoring of the revision-walk machinery. Well, "refactoring" is probably the wrong word, as I don't actually remove the old code. Instead, when we see certain options in the 'rev_info' struct, we redirect the commit-walk logic to a new set of methods

[PATCH 6/6] revision.c: refactor basic topo-order logic

2018-08-27 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When running a command like 'git rev-list --topo-order HEAD', Git performed the following steps: 1. Run limit_list(), which parses all reachable commits, adds them to a linked list, and distributes UNINTERESTING flags. If all unprocessed commits are UNINTERESTING,

[PATCH 1/6] prio-queue: add 'peek' operation

2018-08-27 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When consuming a priority queue, it can be convenient to inspect the next object that will be dequeued without actually dequeueing it. Our existing library did not have such a 'peek' operation, so add it as prio_queue_peek(). Add a reference-level comparison in

[PATCH 2/6] test-reach: add run_three_modes method

2018-08-27 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The 'test_three_modes' method assumes we are using the 'test-tool reach' command for our test. However, we may want to use the data shape of our commit graph and the three modes (no commit-graph, full commit-graph, partial commit-graph) for other git commands. Split

[PATCH v2 1/1] commit-graph: define GIT_TEST_COMMIT_GRAPH

2018-08-29 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The commit-graph feature is tested in isolation by t5318-commit-graph.sh and t6600-test-reach.sh, but there are many more interesting scenarios involving commit walks. Many of these scenarios are covered by the existing test suite, but we need to maintain coverage when the

[PATCH v2 0/1] Define GIT_TEST_COMMIT_GRAPH for commit-graph test coverage

2018-08-29 Thread Derrick Stolee via GitGitGadget
The commit-graph (and multi-pack-index) features are optional data structures that can make Git operations faster. Since they are optional, we do not enable them in most Git tests. The commit-graph is tested in t5318-commit-graph.sh (and t6600-test-reach.sh in ds/reachable), but that one script

[PATCH 4/8] test-repository: properly init repo

2018-07-18 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Signed-off-by: Derrick Stolee --- t/helper/test-repository.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/t/helper/test-repository.c b/t/helper/test-repository.c index 2762ca656..6a84a53ef 100644 --- a/t/helper/test-repository.c +++

[PATCH 6/8] commit-graph: not compatible with grafts

2018-07-18 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Augment commit_graph_compatible(r) to return false when the given repository r has commit grafts or is a shallow clone. Test that in these situations we ignore existing commit-graph files and we do not write new commit-graph files. Signed-off-by: Derrick Stolee ---

[PATCH 5/8] commit-graph: not compatible with replace objects

2018-07-18 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Create new method commit_graph_compatible(r) to check if a given repository r is compatible with the commit-graph feature. Fill the method with a check to see if replace-objects exist. Test this interaction succeeds, including ignoring an existing commit-graph and failing to

[PATCH 7/8] commit-graph: not compatible with uninitialized repo

2018-07-18 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Signed-off-by: Derrick Stolee --- commit-graph.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/commit-graph.c b/commit-graph.c index 5097c7c12..233958e10 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -60,6 +60,9 @@ static struct commit_graph

[PATCH 8/8] commit-graph: close_commit_graph before shallow walk

2018-07-18 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Make close_commit_graph() work for arbitrary repositories. Call close_commit_graph() when about to start a rev-list walk that includes shallow commits. This is necessary in code paths that "fake" shallow commits for the sake of fetch. Specifically, test 351 in

[PATCH 0/8] Clarify commit-graph and grafts/replace/shallow incompatibilities

2018-07-18 Thread Derrick Stolee via GitGitGadget
One unresolved issue with the commit-graph feature is that it can cause issues when combined with replace objects, commit grafts, or shallow clones. These are not 100% incompatible, as one could be reasonably successful writing a commit-graph after replacing some objects and not have issues.

[PATCH 3/8] commit-graph: update design document

2018-07-18 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee As it exists right now, the commit-graph feature may provide inconsistent results when combined with commit grafts, replace objects, and shallow clones. Update the design document to discuss why these interactions are difficult to reconcile and how we will avoid errors by

[PATCH 01/16] commit-reach: move walk methods from commit.c

2018-07-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Signed-off-by: Derrick Stolee --- Makefile | 1 + commit-reach.c | 359 + commit-reach.h | 41 ++ commit.c | 358 4 files changed, 401 insertions(+), 358

[PATCH 00/16] Consolidate reachability logic

2018-07-16 Thread Derrick Stolee via GitGitGadget
There are many places in Git that use a commit walk to determine reachability between commits and/or refs. A lot of this logic is duplicated. I wanted to achieve the following: 1. Consolidate several different commit walks into one file 2. Reduce duplicate reachability logic 3. Increase

[PATCH 02/16] commit-reach: move ref_newer from remote.c

2018-07-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Signed-off-by: Derrick Stolee --- builtin/remote.c | 1 + commit-reach.c | 54 commit-reach.h | 2 ++ http-push.c | 1 + remote.c | 50 +--- remote.h | 1 -

[PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-07-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The can_all_from_reach_with_flags() algorithm is currently quadratic in the worst case, because it calls the reachable() method for every 'from' without tracking which commits have already been walked or which can already reach a commit in 'to'. Rewrite the algorithm to

[PATCH 11/16] test-reach: test get_merge_bases_many

2018-07-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The get_merge_bases_many method returns a list of merge bases for a single commit (A) against a list of commits (X). Some care is needed in constructing the expected behavior because the result is not the expected merge-base for an octopus merge with those parents but

[PATCH 12/16] test-reach: test reduce_heads

2018-07-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Signed-off-by: Derrick Stolee --- t/helper/test-reach.c | 7 +++ t/t6600-test-reach.sh | 22 ++ 2 files changed, 29 insertions(+) diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c index 97c726040..73cb55208 100644 ---

[PATCH 14/16] commit-reach: replace ref_newer logic

2018-07-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The ref_newer method is used by 'git push' to check if a force-push is required. This method does not use any kind of cutoff when walking, so in the case of a force-push will walk all reachable commits. The is_descendant_of method already uses paint_down_to_common along

[PATCH 08/16] test-reach: create new test tool for ref_newer

2018-07-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee As we prepare to change the behavior of the algorithms in commit-reach.c, create a new test-tool subcommand 'reach' to test these methods on interesting commit-graph shapes. To use the new test-tool, use 'test-tool reach ' and provide input to stdin that describes the

[PATCH 13/16] test-reach: test can_all_from_reach_with_flags

2018-07-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The can_all_from_reach_with_flags method is used by ok_to_give_up in upload-pack.c to see if we have done enough negotiation during a fetch. This method is intentionally created to preserve state between calls to assist with stateful negotiation, such as over SSH. To make

[PATCH 10/16] test-reach: test is_descendant_of

2018-07-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The is_descendant_of method takes a single commit as its first parameter and a list of commits as its second parameter. Extend the input of the 'test-tool reach' command to take multiple lines of the form "X:" to construct a list of commits. Pass these to is_descendant_of

[PATCH 03/16] commit-reach: move commit_contains from ref-filter

2018-07-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Signed-off-by: Derrick Stolee --- commit-reach.c | 120 commit-reach.h | 18 +- fast-import.c | 1 + ref-filter.c | 147 +++-- 4 files changed, 146 insertions(+), 140

[PATCH 05/16] upload-pack: refactor ok_to_give_up()

2018-07-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee In anticipation of consolidating all commit reachability algorithms, refactor ok_to_give_up() in order to allow splitting its logic into an external method. Signed-off-by: Derrick Stolee --- upload-pack.c | 33 ++--- 1 file changed, 22

[PATCH 04/16] upload-pack: make reachable() more generic

2018-07-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee In anticipation of moving the reachable() method to commit-reach.c, modify the prototype to be more generic to flags known outside of upload-pack.c. Also rename 'want' to 'from' to make the statement more clear outside of the context of haves/wants negotiation.

[PATCH 16/16] commit-reach: use can_all_from_reach

2018-07-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The is_descendant_of method previously used in_merge_bases() to check if the commit can reach any of the commits in the provided list. This had two performance problems: 1. The performance is quadratic in worst-case. 2. A single in_merge_bases() call requires walking

[PATCH 07/16] commit-reach: move can_all_from_reach_with_flags

2018-07-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Signed-off-by: Derrick Stolee --- commit-reach.c | 62 + commit-reach.h | 13 ++ upload-pack.c | 69 +- 3 files changed, 76 insertions(+), 68 deletions(-) diff --git

[PATCH 09/16] test-reach: test in_merge_bases

2018-07-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Signed-off-by: Derrick Stolee --- t/helper/test-reach.c | 6 ++ t/t6600-test-reach.sh | 18 ++ 2 files changed, 24 insertions(+) diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c index 8cc570f3b..29104d41a 100644 --- a/t/helper/test-reach.c

[PATCH 06/16] upload-pack: generalize commit date cutoff

2018-07-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The ok_to_give_up() method uses the commit date as a cutoff to avoid walking the entire reachble set of commits. Before moving the reachable() method to commit-reach.c, pull out the dependence on the global constant 'oldest_have' with a 'min_commit_date' parameter.

[PATCH 06/11] multi-pack-index: verify oid fanout order

2018-09-05 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Signed-off-by: Derrick Stolee --- midx.c | 9 + t/t5319-multi-pack-index.sh | 8 2 files changed, 17 insertions(+) diff --git a/midx.c b/midx.c index a02b19efc1..dfd26b4d74 100644 --- a/midx.c +++ b/midx.c @@ -950,5 +950,14 @@ int

[PATCH 01/11] multi-pack-index: add 'verify' verb

2018-09-05 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The multi-pack-index builtin writes multi-pack-index files, and uses a 'write' verb to do so. Add a 'verify' verb that checks this file matches the contents of the pack-indexes it replaces. The current implementation is a no-op, but will be extended in small increments in

[PATCH 07/11] multi-pack-index: verify oid lookup order

2018-09-05 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Signed-off-by: Derrick Stolee --- midx.c | 11 +++ t/t5319-multi-pack-index.sh | 8 2 files changed, 19 insertions(+) diff --git a/midx.c b/midx.c index dfd26b4d74..06d5cfc826 100644 --- a/midx.c +++ b/midx.c @@ -959,5 +959,16 @@ int

[PATCH 09/11] multi-pack-index: verify object offsets

2018-09-05 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The 'git multi-pack-index verify' command must verify the object offsets stored in the multi-pack-index are correct. There are two ways the offset chunk can be incorrect: the pack-int-id and the object offset. Replace the BUG() statement with a die() statement, now that we

[PATCH 08/11] multi-pack-index: fix 32-bit vs 64-bit size check

2018-09-05 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When loading a 64-bit offset, we intend to check that off_t can store the resulting offset. However, the condition accidentally checks the 32-bit offset to see if it is smaller than a 64-bit value. Fix it, and this will be covered by a test in the 'git multi-pack-index

[PATCH 04/11] multi-pack-index: verify packname order

2018-09-05 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The final check we make while loading a multi-pack-index is that the packfile names are in lexicographical order. Make this error be a die() instead. In order to test this condition, we need multiple packfiles. Earlier in t5319-multi-pack-index.sh, we tested the interaction

[PATCH 05/11] multi-pack-index: verify missing pack

2018-09-05 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Signed-off-by: Derrick Stolee --- midx.c | 16 t/t5319-multi-pack-index.sh | 5 + 2 files changed, 21 insertions(+) diff --git a/midx.c b/midx.c index e655a15aed..a02b19efc1 100644 --- a/midx.c +++ b/midx.c @@ -926,13 +926,29 @@

[PATCH 03/11] multi-pack-index: verify corrupt chunk lookup table

2018-09-05 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee Signed-off-by: Derrick Stolee --- midx.c | 3 +++ t/t5319-multi-pack-index.sh | 13 + 2 files changed, 16 insertions(+) diff --git a/midx.c b/midx.c index ec78254bb6..8b054b39ab 100644 --- a/midx.c +++ b/midx.c @@ -100,6 +100,9 @@ struct

[PATCH 10/11] multi-pack-index: report progress during 'verify'

2018-09-05 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When verifying a multi-pack-index, the only action that takes significant time is checking the object offsets. For example, to verify a multi-pack-index containing 6.2 million objects in the Linux kernel repository takes 1.3 seconds on my machine. 99% of that time is spent

[PATCH 11/11] fsck: verify multi-pack-index

2018-09-05 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When core.multiPackIndex is true, we may have a multi-pack-index in our object directory. Add calls to 'git multi-pack-index verify' at the end of 'git fsck' if so. Signed-off-by: Derrick Stolee --- builtin/fsck.c | 18 ++

[PATCH 00/11] Add 'git multi-pack-index verify' command

2018-09-05 Thread Derrick Stolee via GitGitGadget
The multi-pack-index file provides faster lookups in repos with many packfiles by duplicating the information from multiple pack-indexes into a single file. This series allows us to verify a multi-pack-index using 'git multi-pack-index verify' and 'git fsck' (when core.multiPackIndex is true).

[PATCH 02/11] multi-pack-index: verify bad header

2018-09-05 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When verifying if a multi-pack-index file is valid, we want the command to fail to signal an invalid file. Previously, we wrote an error to stderr and continued as if we had no multi-pack-index. Now, die() instead of error(). Add tests that check corrupted headers in a few

[PATCH 1/3] midx: fix broken free() in close_midx()

2018-10-08 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When closing a multi-pack-index, we intend to close each pack-file and free the struct packed_git that represents it. However, this line was previously freeing the array of pointers, not the pointer itself. This leads to a double-free issue. Signed-off-by: Derrick Stolee

[PATCH 0/3] Add GIT_TEST_MULTI_PACK_INDEX environment variable

2018-10-08 Thread Derrick Stolee via GitGitGadget
To increase coverage of the multi-pack-index feature, add a GIT_TEST_MULTI_PACK_INDEX environment variable similar to other GIT_TEST_* variables. After creating the environment variable and running the test suite with it enabled, I found a few bugs in the multi-pack-index implementation. These

[PATCH 3/3] multi-pack-index: define GIT_TEST_MULTI_PACK_INDEX

2018-10-08 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The multi-pack-index feature is tested in isolation by t5319-multi-pack-index.sh, but there are many more interesting scenarios in the test suite surrounding pack-file data shapes and interactions. Since the multi-pack-index is an optional data structure, it does not make

[PATCH v2 0/3] Add GIT_TEST_MULTI_PACK_INDEX environment variable

2018-10-12 Thread Derrick Stolee via GitGitGadget
To increase coverage of the multi-pack-index feature, add a GIT_TEST_MULTI_PACK_INDEX environment variable similar to other GIT_TEST_* variables. After creating the environment variable and running the test suite with it enabled, I found a few bugs in the multi-pack-index implementation. These

[PATCH v2 2/3] midx: close multi-pack-index on repack

2018-10-12 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When repacking, we may remove pack-files. This invalidates the multi-pack-index (if it exists). Previously, we removed the multi-pack-index file before removing any pack-file. In some cases, the repack command may load the multi-pack-index into memory. This may lead to later

[PATCH v2 3/3] multi-pack-index: define GIT_TEST_MULTI_PACK_INDEX

2018-10-12 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The multi-pack-index feature is tested in isolation by t5319-multi-pack-index.sh, but there are many more interesting scenarios in the test suite surrounding pack-file data shapes and interactions. Since the multi-pack-index is an optional data structure, it does not make

[PATCH v2 1/3] midx: fix broken free() in close_midx()

2018-10-12 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When closing a multi-pack-index, we intend to close each pack-file and free the struct packed_git that represents it. However, this line was previously freeing the array of pointers, not the pointer itself. This leads to a double-free issue. Signed-off-by: Derrick Stolee

[PATCH 0/1] commit-reach: fix first-parent heuristic

2018-10-18 Thread Derrick Stolee via GitGitGadget
I originally reported this fix [1] after playing around with the trace2 series for measuring performance. Since trace2 isn't merging quickly, I pulled the performance fix patch out and am sending it on its own. The only difference here is that we don't have the tracing to verify the performance

[PATCH 1/1] commit-reach: fix first-parent heuristic

2018-10-18 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The algorithm in can_all_from_reach_with_flags() performs a depth- first-search, terminated by generation number, intending to use a hueristic that "important" commits are found in the first-parent history. This heuristic is valuable in scenarios like fetch negotiation.

[PATCH v4 4/7] revision.c: begin refactoring --topo-order logic

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When running 'git rev-list --topo-order' and its kin, the topo_order setting in struct rev_info implies the limited setting. This means that the following things happen during prepare_revision_walk(): * revs->limited implies we run limit_list() to walk the entire

[PATCH v4 6/7] revision.c: generation-based topo-order algorithm

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The current --topo-order algorithm requires walking all reachable commits up front, topo-sorting them, all before outputting the first value. This patch introduces a new algorithm which uses stored generation numbers to incrementally walk in topo-order, outputting commits as

[PATCH v4 3/7] test-reach: add rev-list tests

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The rev-list command is critical to Git's functionality. Ensure it works in the three commit-graph environments constructed in t6600-test-reach.sh. Here are a few important types of rev-list operations: * Basic: git rev-list --topo-order HEAD * Range: git rev-list

[PATCH v4 1/7] prio-queue: add 'peek' operation

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When consuming a priority queue, it can be convenient to inspect the next object that will be dequeued without actually dequeueing it. Our existing library did not have such a 'peek' operation, so add it as prio_queue_peek(). Add a reference-level comparison in

[PATCH v4 0/7] Use generation numbers for --topo-order

2018-10-16 Thread Derrick Stolee via GitGitGadget
This patch series performs a decently-sized refactoring of the revision-walk machinery. Well, "refactoring" is probably the wrong word, as I don't actually remove the old code. Instead, when we see certain options in the 'rev_info' struct, we redirect the commit-walk logic to a new set of methods

[PATCH v4 7/7] t6012: make rev-list tests more interesting

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee As we are working to rewrite some of the revision-walk machinery, there could easily be some interesting interactions between the options that force topological constraints (--topo-order, --date-order, and --author-date-order) along with specifying a path. Add extra tests

[PATCH v4 5/7] commit/revisions: bookkeeping before refactoring

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee There are a few things that need to move around a little before making a big refactoring in the topo-order logic: 1. We need access to record_author_date() and compare_commits_by_author_date() in revision.c. These are used currently by sort_in_topological_order() in

[PATCH v4 2/7] test-reach: add run_three_modes method

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The 'test_three_modes' method assumes we are using the 'test-tool reach' command for our test. However, we may want to use the data shape of our commit graph and the three modes (no commit-graph, full commit-graph, partial commit-graph) for other git commands. Split

[PATCH 2/3] t: explicitly turn off core.commitGraph as needed

2018-10-17 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee There are a few tests that already require GIT_TEST_COMMIT_GRAPH=0 as they rely on an interaction with the commits in the object store that is circumvented by parsing commit information from the commit-graph instead. Before enabling core.commitGraph as true by default,

[PATCH 3/3] commit-graph: Use commit-graph by default

2018-10-17 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The config setting "core.commitGraph" enables using the commit-graph file to accelerate commit walks through parsing speed and generation numbers. The setting "gc.writeCommitGraph" enables writing the commit-graph file on every non-trivial 'git gc' operation. Together, they

[PATCH 1/3] t6501: use --quiet when testing gc stderr

2018-10-17 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The test script t6501-freshen-objects.sh has some tests that care if 'git gc' has any output to stderr. This is intended to say that no warnings occurred related to broken links. However, when we have operations that output progress (like writing the commit-graph) this

[PATCH 0/3] Use commit-graph by default

2018-10-17 Thread Derrick Stolee via GitGitGadget
The commit-graph feature is starting to stabilize. Based on what is in master right now, we have: Git 2.18: * Ability to write commit-graph (requires user interaction). * Commit parsing is faster when commit-graph exists. * Must have core.commitGraph true to use. Git

[PATCH 0/1] Run GIT_TEST_COMMIT_GRAPH and GIT_TEST_MULTI_PACK_INDEX during CI

2018-10-17 Thread Derrick Stolee via GitGitGadget
Our CI scripts include a step to run the test suite with certain optional variables enabled. Now that all branches should build and have tests succeed with GIT_TEST_COMMIT_GRAPH and GIT_TEST_MULTI_PACK_INDEX enabled, add those variables to that stage. Note: the GIT_TEST_MULTI_PACK_INDEX variable

[PATCH 1/1] ci: add optional test variables

2018-10-17 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The commit-graph and multi-pack-index features introduce optional data structures that are not required for normal Git operations. It is important to run the normal test suite without them enabled, but it is helpful to also run the test suite using them. Our continuous

[PATCH 0/3] Make add_missing_tags() linear

2018-10-30 Thread Derrick Stolee via GitGitGadget
As reported earlier [1], the add_missing_tags() method in remote.c has quadratic performance. Some of that performance is curbed due to the generation-number cutoff in in_merge_bases_many(). However, that fix doesn't help users without a commit-graph, and it can still be painful if that cutoff is

[PATCH 1/3] commit-reach: implement get_reachable_subset

2018-10-30 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The existing reachability algorithms in commit-reach.c focus on finding merge-bases or determining if all commits in a set X can reach at least one commit in a set Y. However, for two commits sets X and Y, we may also care about which commits in Y are reachable from at least

[PATCH 2/3] test-reach: test get_reachable_subset

2018-10-30 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The get_reachable_subset() method returns the list of commits in the 'to' array that are reachable from at least one commit in the 'from' array. Add tests that check this method works in a few cases: 1. All commits in the 'to' list are reachable. This exercises the

[PATCH 3/3] remote: make add_missing_tags() linear

2018-10-30 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The add_missing_tags() method currently has quadratic behavior. This is due to a linear number (based on number of tags T) of calls to in_merge_bases_many, which has linear performance (based on number of commits C in the repository). Replace this O(T * C) algorithm with an

[PATCH v2 3/3] remote: make add_missing_tags() linear

2018-11-02 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The add_missing_tags() method currently has quadratic behavior. This is due to a linear number (based on number of tags T) of calls to in_merge_bases_many, which has linear performance (based on number of commits C in the repository). Replace this O(T * C) algorithm with an

[PATCH v2 0/3] Make add_missing_tags() linear

2018-11-02 Thread Derrick Stolee via GitGitGadget
As reported earlier [1], the add_missing_tags() method in remote.c has quadratic performance. Some of that performance is curbed due to the generation-number cutoff in in_merge_bases_many(). However, that fix doesn't help users without a commit-graph, and it can still be painful if that cutoff is

[PATCH v2 1/3] commit-reach: implement get_reachable_subset

2018-11-02 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The existing reachability algorithms in commit-reach.c focus on finding merge-bases or determining if all commits in a set X can reach at least one commit in a set Y. However, for two commits sets X and Y, we may also care about which commits in Y are reachable from at least

[PATCH v2 2/3] test-reach: test get_reachable_subset

2018-11-02 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The get_reachable_subset() method returns the list of commits in the 'to' array that are reachable from at least one commit in the 'from' array. Add tests that check this method works in a few cases: 1. All commits in the 'to' list are reachable. This exercises the

[PATCH 0/1] send-pack: set core.warnAmbiguousRefs=false

2018-11-06 Thread Derrick Stolee via GitGitGadget
I've been looking into the performance of git push for very large repos. Our users are reporting that 60-80% of git push time is spent during the "Enumerating objects" phase of git pack-objects. A git push process runs several processes during its run, but one includes git send-pack which calls

[PATCH 1/1] send-pack: set core.warnAmbiguousRefs=false

2018-11-06 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee During a 'git push' command, we run 'git send-pack' inside of our transport helper. This creates a 'git pack-objects' process and passes a list of object ids. If this list is large, then the pack-objects process can spend a lot of time checking the possible refs these

[PATCH v2 1/1] pack-objects: ignore ambiguous object warnings

2018-11-06 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee A git push process runs several processes during its run, but one includes git send-pack which calls git pack-objects and passes the known have/wants into stdin using object ids. However, the default setting for core.warnAmbiguousRefs requires git pack-objects to check for

[PATCH v2 0/1] send-pack: set core.warnAmbiguousRefs=false

2018-11-06 Thread Derrick Stolee via GitGitGadget
I've been looking into the performance of git push for very large repos. Our users are reporting that 60-80% of git push time is spent during the "Enumerating objects" phase of git pack-objects. A git push process runs several processes during its run, but one includes git send-pack which calls

[PATCH v2 1/3] commit-graph: clean up leaked memory during write

2018-10-03 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The write_commit_graph() method in commit-graph.c leaks some lits and strings during execution. In addition, a list of strings is leaked in write_commit_graph_reachable(). Clean these up so our memory checking is cleaner. Further, if we use a list of pack-files to find the

[PATCH v2 0/3] Clean up leaks in commit-graph.c

2018-10-03 Thread Derrick Stolee via GitGitGadget
While looking at the commit-graph code, I noticed some memory leaks. These can be found by running valgrind --leak-check=full ./git commit-graph write --reachable The impact of these leaks are small, as we never call write_commit_graph _reachable in a loop, but it is best to be diligent here.

[PATCH v2 3/3] commit-graph: reduce initial oid allocation

2018-10-03 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee While writing a commit-graph file, we store the full list of commits in a flat list. We use this list for sorting and ensuring we are closed under reachability. The initial allocation assumed that (at most) one in four objects is a commit. This is a dramatic over-count for

[PATCH 1/2] commit-graph: clean up leaked memory during write

2018-10-02 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The write_commit_graph() method in commit-graph.c leaks some lits and strings during execution. In addition, a list of strings is leaked in write_commit_graph_reachable(). Clean these up so our memory checking is cleaner. Running 'valgrind --leak-check=full git commit-graph

[PATCH 0/2] Clean up leaks in commit-graph.c

2018-10-02 Thread Derrick Stolee via GitGitGadget
While looking at the commit-graph code, I noticed some memory leaks. These can be found by running valgrind --leak-check=full ./git commit-graph write --reachable The impact of these leaks are small, as we never call write_commit_graph _reachable in a loop, but it is best to be diligent here.

[PATCH 2/2] commit-graph: reduce initial oid allocation

2018-10-02 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee While writing a commit-graph file, we store the full list of commits in a flat list. We use this list for sorting and ensuring we are closed under reachability. The initial allocation assumed that (at most) one in four objects is a commit. This is a dramatic over-count for

[PATCH v4 0/1] contrib: Add script to show uncovered "new" lines

2018-10-08 Thread Derrick Stolee via GitGitGadget
We have coverage targets in our Makefile for using gcov to display line coverage based on our test suite. The way I like to do it is to run: make coverage-test make coverage-report This leaves the repo in a state where every X.c file that was covered has an X.c.gcov file containing the coverage

[PATCH v4 1/1] contrib: add coverage-diff script

2018-10-08 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee We have coverage targets in our Makefile for using gcov to display line coverage based on our test suite. The way I like to do it is to run: make coverage-test make coverage-report This leaves the repo in a state where every X.c file that was covered has an

[PATCH 2/3] midx: close multi-pack-index on repack

2018-10-08 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee When repacking, we may remove pack-files. This invalidates the multi-pack-index (if it exists). Previously, we removed the multi-pack-index file before removing any pack-file. In some cases, the repack command may load the multi-pack-index into memory. This may lead to later

[PATCH 0/1] v2.19.0-rc1 Performance Regression in 'git merge-base'

2018-08-30 Thread Derrick Stolee via GitGitGadget
As I was testing the release candidate, I stumbled across a regression in 'git merge-base' as a result of the switch to generation numbers. The commit message in [PATCH 1/1] describes the topology involved, but you can test it yourself by comparing 'git merge-base v4.8 v4.9' in the Linux kernel.

[PATCH 1/1] commit: don't use generation numbers if not needed

2018-08-30 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee In 3afc679b "commit: use generations in paint_down_to_common()", the queue in paint_down_to_common() was changed to use a priority order based on generation number before commit date. This served two purposes: 1. When generation numbers are present, the walk guarantees

[PATCH v3 0/2] Properly peel tags in can_all_from_reach_with_flags()

2018-09-21 Thread Derrick Stolee via GitGitGadget
As Peff reported [1], the refactored can_all_from_reach_with_flags() method does not properly peel tags. Since the helper method can_all_from_reach() and code in t/helper/test-reach.c all peel tags before getting to this method, it is not super-simple to create a test that demonstrates this. I

[PATCH v3 1/2] commit-reach: properly peel tags

2018-09-21 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The can_all_from_reach_with_flag() algorithm was refactored in 4fbcca4e "commit-reach: make can_all_from_reach... linear" but incorrectly assumed that all objects provided were commits. During a fetch negotiation, ok_to_give_up() in upload-pack.c may provide unpeeled tags to

[PATCH v3 2/2] commit-reach: fix memory and flag leaks

2018-09-21 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee The can_all_from_reach_with_flag() method uses 'assign_flag' as a value we can use to mark objects temporarily during our commit walk. The intent is that these flags are removed from all objects before returning. However, this is not the case. The 'from' array could also

  1   2   >