Re: [PATCH 6/8] commit: provide a fast multi-tip contains function
Junio C Hamano gits...@pobox.com writes: If you are trying to do branch --with $commit, what you would want is not exactly paint-down-to-common(all branches, $commit), but something that paints down to $commit from all branches, with an optimization that cuts off the traversal at a commit that is reachable from $commit. If a traversal from branch B reached such a commit that is reachable from $commit, you can stop the traversal because propagating the bit for B further down to its parents will not reach the $commit you are interested in. I forgot about the other direction, i.e. branch --merged $commit. Since I do git branch --no-merged pu fairly often, I care about its performance ;-) We paint $commit as Left, and tips of all the branches as Right. We try to paint down from $commit but the traversal cannot terminate if it reaches a commit that is reachable from one Right ref---we may find that we can reach more Right refs by following its ancestor. We can stop when we paint Right bits fully, of course, but I wonder if we can do better than that. Suppose we just painted a commit with L and Rn bit (i.e. the commit is a common ancestor of the $commit and one branch). We know that traversing down from there will never reach the tip of the branch whose color is Rn (otherwise we will have a cycle from that commit back to the tip of the branch), so if the reason we are continuing the traversal is to prove that the tip of the branch Rn is reachable (or unreachable) from $commit, there is no reason to continue digging from there. Can we exploit that to terminate the traversal earlier? When we encounter a new commit that is painted with L bit and some but not necessarily all R bits, we propagate the bits and check the R bits. Some of the commits in Right set that correspond to R bits may have been painted in L (i.e. we found branches that are merged to $commit), and digging further from this commit will not give us any new information. Other commits are not painted in L (i.e. we do not yet know if $commit merges these branches), so we may need to keep digging. So perhaps we can stop if a commit is painted in L and also has all the R bits that correspond to Right commits that are not yet proven reachable from $commit (i.e. not painted in L)? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/8] commit: provide a fast multi-tip contains function
Junio C Hamano gits...@pobox.com writes: I forgot about the other direction, i.e. branch --merged $commit. Since I do git branch --no-merged pu fairly often, I care about its performance ;-) We paint $commit as Left, and tips of all the branches as Right. We try to paint down from $commit but the traversal cannot terminate if it reaches a commit that is reachable from one Right ref---we may find that we can reach more Right refs by following its ancestor. We can stop when we paint Right bits fully, of course, but I wonder if we can do better than that. Suppose we just painted a commit with L and Rn bit (i.e. the commit is a common ancestor of the $commit and one branch). We know that traversing down from there will never reach the tip of the branch whose color is Rn (otherwise we will have a cycle from that commit back to the tip of the branch), so if the reason we are continuing the traversal is to prove that the tip of the branch Rn is reachable (or unreachable) from $commit, there is no reason to continue digging from there. Can we exploit that to terminate the traversal earlier? I forgot to mention this case because with branch --no-merged pu it never happens, but another clue we can terminate traversal earier with is when $commit is found to be an ancestor of some Right commits. Then we can start ignoring Rn bits for these Right commits that can reach the Left one, as we know they can never be reached from the Left. That is, the last sentence in the paragraph ... When we encounter a new commit that is painted with L bit and some but not necessarily all R bits, we propagate the bits and check the R bits. Some of the commits in Right set that correspond to R bits may have been painted in L (i.e. we found branches that are merged to $commit), and digging further from this commit will not give us any new information. Other commits are not painted in L (i.e. we do not yet know if $commit merges these branches), so we may need to keep digging. So perhaps we can stop if a commit is painted in L and also has all the R bits that correspond to Right commits that are not yet proven reachable from $commit (i.e. not painted in L)? ... will be more like ignore Rn bits for Right commits that are painted in L (i.e. they are reachable from L) or the Left commit is painted with (i.e. they reach L). -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/8] commit: provide a fast multi-tip contains function
Jeff King p...@peff.net writes: I haven't quite convinced myself that the stale logic in the middle is right. The origin paint_down function checks PARENT1 | PARENT2 to see if we found a merge base (even though PARENT2 may represent many tips). Here I check whether we have _any_ left parent flag and _any_ right parent flag. I'm not sure if I actually need to be finding the merge base of _all_ of the commits. In the one-each-on-two-side case (i.e. the original algorithm), it is any commit we will encounter by digging further down from a STALE one will all be reachable from a newer merge base (e.g. the commit that caused it to be marked as STALE originally). It will never be a useful merge base, so let's mark it as STALE. Even if a future traversal that comes from sideways (i.e. not passing the commit that caused it to be marked as STALE) reach this STALE one, digging further from there won't give us anything new. If you see a commit can be reached from L1 and R2, the only thing you know is that its parents can also be reached from L1 and R2, but it does not tell you if it is reachable from other tips, e.g. L2 or R1. When a traversal reaches such a node from sideways, trying to paint it with L2 for example, you do need to continue digging. I think the traversal optimization based on the STALE bit is merely a halfway optimization to cull obvious duplicates. See how get_merge_bases_many() needs to clean up redundant ones in the result of merge_bases_many(), the latter of which does use the STALE bit to remove obvious duplicates, in order to make sure it won't include a commit in the result that can be reached from another commit in the result, without having to resort to the SLOP trick. Because your end-goal is to tell if Ln is reachable from Rm (for number of n's and m's), coming up with the independent/non-redundant set of merge-bases is not necessary, I think. I suspect that you do not need the STALE half-optimization in the first place. The only time you can say Ah, we've seen this one and no need to dig further is when you are propagating a colour C and the parent in question is already painted in C (it is OK to be painted as reachable from more tips), I would think, so shouldn't the loop be more like, after painting each tip and placing it in the queue: * Dequeue one, check the L/R bits in it and call that C * Iterate over its parents P: * check the L/R bits in P and call that Cp. * If Cp is already a superset of C, there is no point putting P to the queue to be processed. * If Cp is not a superset of C, then update L/R bits in P to mark it reachable from tips represented by C and put P to the queue. until you ran out of queue? +void commit_contains(const struct commit_list *left, + const struct commit_list *right, + unsigned char *left_contains, + unsigned char *right_contains) +{ + struct prio_queue queue = { compare_commits_by_commit_date }; + struct bit_slab left_bits, right_bits, stale_bits; + int left_nr, right_nr; + + left_nr = init_contains_bits(left, left_bits, queue); + right_nr = init_contains_bits(right, right_bits, queue); + init_bit_slab(stale_bits); + + while (queue_has_nonstale_bits(queue, stale_bits)) { + struct commit *commit = prio_queue_get(queue); + struct commit_list *parents; + unsigned char *c_left, *c_right, *c_stale; + + c_left = bit_slab_at(left_bits, commit); + c_right = bit_slab_at(right_bits, commit); + c_stale = bit_slab_at(stale_bits, commit); + + if (!bitset_empty(c_left, left_nr) + !bitset_empty(c_right, right_nr)) + *c_stale = 1; Hmph, is this one-char-a bit? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/8] commit: provide a fast multi-tip contains function
Junio C Hamano gits...@pobox.com writes: The only time you can say Ah, we've seen this one and no need to dig further is when you are propagating a colour C and the parent in question is already painted in C (it is OK to be painted as reachable from more tips), I would think, so shouldn't the loop be more like, after painting each tip and placing it in the queue: * Dequeue one, check the L/R bits in it and call that C * Iterate over its parents P: * check the L/R bits in P and call that Cp. * If Cp is already a superset of C, there is no point putting P to the queue to be processed. * If Cp is not a superset of C, then update L/R bits in P to mark it reachable from tips represented by C and put P to the queue. until you ran out of queue? Actually that will cause you to dig down to the root, so it won't be nice. If you are trying to do branch --with $commit, what you would want is not exactly paint-down-to-common(all branches, $commit), but something that paints down to $commit from all branches, with an optimization that cuts off the traversal at a commit that is reachable from $commit. If a traversal from branch B reached such a commit that is reachable from $commit, you can stop the traversal because propagating the bit for B further down to its parents will not reach the $commit you are interested in. So the termination condition for this a single Left (I'll use Left for $commit and Right for all branches) case is if a commit is already painted as reachable from $commit, do not propagate bits further down. What does it mean to look for branch --with $commit1 $commit2 (i.e. more than one in the Left set)? If we are trying to see which branches reach _both_ of these commits, then replace the ablve with if a commit is already painted as reachable from both $commit{1,2}, then painting it with any branch bits is futile---its parents will never reach either $commit1 nor $commit2, so the additional termination condition will be If left bits are full, then stop. I do not know how you can optimize the traversal if you are trying to see which branches reach at least one of these commits, though. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/8] commit: provide a fast multi-tip contains function
Junio C Hamano gits...@pobox.com writes: What does it mean to look for branch --with $commit1 $commit2 (i.e. more than one in the Left set)? If we are trying to see which branches reach _both_ of these commits, then replace the ablve with if a commit is already painted as reachable from both $commit{1,2}, then painting it with any branch bits is futile---its parents will never reach either $commit1 nor $commit2, so the additional termination condition will be If left bits are full, then stop. I do not know how you can optimize the traversal if you are trying to see which branches reach at least one of these commits, though. By the way, don't we do 80% of this already in show-branch? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/8] commit: provide a fast multi-tip contains function
When commands like git branch --contains want to know which branches contain a particular commit, they run a series of merge-base calculations, one per branch. This can be very slow if you have a large number of branches. We made tag --contains faster in ffc4b80 (tag: speed up --contains calculation, 2011-06-11) by switching to a different algorithm that caches intermediate contains information from each tag we check. The downside of the new algorithm is that it moves depth-first through the graph. So it tends to go all the way to the roots, even if the contained commit is near the top of history. That works OK for tags, because repositories tend to have tags near the roots anyway (e.g., a v0.1 or similar). The number of commits we look at increased a little bit, but since we avoid traversing over the same parts of history repeatedly, it was a huge net win. For branch --contains, it is less clear that this is a win. Most branches stay up to date, so we can bound a search for a recent commit when we hit the merge base between the commit and the branches. The ideal would be to use the merge-base-style breadth-first traversal, but to perform a single traversal for all tips. The problem is that we need one bit of storage per tip in each commit, and struct commit has only a fixed number of bits. We can solve that by using a process similar to paint_down_to_common, but instead of storing PARENT1 and PARENT2 flags, using a commit slab to store one bit per tip. Signed-off-by: Jeff King p...@peff.net --- This is the interesting commit, and I'd really love some eyes on the logic. It's basically paint_down_to_common but with the PARENT1 and PARENT2 flags replaced with larger bitfields. I haven't quite convinced myself that the stale logic in the middle is right. The origin paint_down function checks PARENT1 | PARENT2 to see if we found a merge base (even though PARENT2 may represent many tips). Here I check whether we have _any_ left parent flag and _any_ right parent flag. I'm not sure if I actually need to be finding the merge base of _all_ of the commits. I don't think so, and I can't find a case where this doesn't work, but perhaps I am not being imaginative enough. commit.c | 102 +++ commit.h | 17 +++ 2 files changed, 119 insertions(+) diff --git a/commit.c b/commit.c index 445b679..66e0def 100644 --- a/commit.c +++ b/commit.c @@ -11,6 +11,7 @@ #include commit-slab.h #include prio-queue.h #include sha1-lookup.h +#include bitset.h static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **); @@ -1040,6 +1041,107 @@ struct commit_list *reduce_heads(struct commit_list *heads) return result; } +define_commit_slab(bit_slab, unsigned char); + +static int init_contains_bits(const struct commit_list *commits, + struct bit_slab *bits, + struct prio_queue *queue) +{ + int i, nr = commit_list_count(commits); + + init_bit_slab_with_stride(bits, bitset_sizeof(nr)); + for (i = 0; i nr; i++, commits = commits-next) { + struct commit *c = commits-item; + + prio_queue_put(queue, c); + bitset_set(bit_slab_at(bits, c), i); + } + + return nr; +} + +static int queue_has_nonstale_bits(struct prio_queue *queue, struct bit_slab *stale) +{ + int i; + for (i = 0; i queue-nr; i++) { + struct commit *commit = queue-array[i]; + if (!*bit_slab_at(stale, commit)) + return 1; + } + return 0; +} + +static void fill_contains_result(unsigned char *result, int nr, +struct bit_slab *bits, +const struct commit_list *other_side) +{ + const struct commit_list *c; + + memset(result, 0, nr); + for (c = other_side; c; c = c-next) { + unsigned char *c_bits = bit_slab_at(bits, c-item); + int i; + + for (i = 0; i nr; i++) + result[i] |= bitset_get(c_bits, i); + } +} + +void commit_contains(const struct commit_list *left, +const struct commit_list *right, +unsigned char *left_contains, +unsigned char *right_contains) +{ + struct prio_queue queue = { compare_commits_by_commit_date }; + struct bit_slab left_bits, right_bits, stale_bits; + int left_nr, right_nr; + + left_nr = init_contains_bits(left, left_bits, queue); + right_nr = init_contains_bits(right, right_bits, queue); + init_bit_slab(stale_bits); + + while (queue_has_nonstale_bits(queue, stale_bits)) { + struct commit *commit = prio_queue_get(queue); + struct commit_list *parents; + unsigned char *c_left, *c_right, *c_stale; + + c_left =