Re: [PATCH 6/8] commit: provide a fast multi-tip contains function

2014-07-01 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 If you are trying to do branch --with $commit, what you would want
 is not exactly paint-down-to-common(all branches, $commit), but
 something that paints down to $commit from all branches, with an
 optimization that cuts off the traversal at a commit that is
 reachable from $commit.  If a traversal from branch B reached such a
 commit that is reachable from $commit, you can stop the traversal
 because propagating the bit for B further down to its parents will
 not reach the $commit you are interested in.

I forgot about the other direction, i.e. branch --merged $commit.
Since I do git branch --no-merged pu fairly often, I care about
its performance ;-)

We paint $commit as Left, and tips of all the branches as Right.  We
try to paint down from $commit but the traversal cannot terminate if
it reaches a commit that is reachable from one Right ref---we may
find that we can reach more Right refs by following its ancestor.
We can stop when we paint Right bits fully, of course, but I wonder
if we can do better than that.

Suppose we just painted a commit with L and Rn bit (i.e. the commit
is a common ancestor of the $commit and one branch).  We know that
traversing down from there will never reach the tip of the branch
whose color is Rn (otherwise we will have a cycle from that commit
back to the tip of the branch), so if the reason we are continuing
the traversal is to prove that the tip of the branch Rn is reachable
(or unreachable) from $commit, there is no reason to continue
digging from there.  Can we exploit that to terminate the traversal
earlier?

When we encounter a new commit that is painted with L bit and some
but not necessarily all R bits, we propagate the bits and check the
R bits.  Some of the commits in Right set that correspond to R bits
may have been painted in L (i.e. we found branches that are merged
to $commit), and digging further from this commit will not give us
any new information.  Other commits are not painted in L (i.e. we do
not yet know if $commit merges these branches), so we may need to
keep digging.  So perhaps we can stop if a commit is painted in L
and also has all the R bits that correspond to Right commits that
are not yet proven reachable from $commit (i.e. not painted in L)?






--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/8] commit: provide a fast multi-tip contains function

2014-07-01 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 I forgot about the other direction, i.e. branch --merged $commit.
 Since I do git branch --no-merged pu fairly often, I care about
 its performance ;-)

 We paint $commit as Left, and tips of all the branches as Right.  We
 try to paint down from $commit but the traversal cannot terminate if
 it reaches a commit that is reachable from one Right ref---we may
 find that we can reach more Right refs by following its ancestor.
 We can stop when we paint Right bits fully, of course, but I wonder
 if we can do better than that.

 Suppose we just painted a commit with L and Rn bit (i.e. the commit
 is a common ancestor of the $commit and one branch).  We know that
 traversing down from there will never reach the tip of the branch
 whose color is Rn (otherwise we will have a cycle from that commit
 back to the tip of the branch), so if the reason we are continuing
 the traversal is to prove that the tip of the branch Rn is reachable
 (or unreachable) from $commit, there is no reason to continue
 digging from there.  Can we exploit that to terminate the traversal
 earlier?

I forgot to mention this case because with branch --no-merged pu
it never happens, but another clue we can terminate traversal earier
with is when $commit is found to be an ancestor of some Right
commits.  Then we can start ignoring Rn bits for these Right commits
that can reach the Left one, as we know they can never be reached
from the Left.  That is, the last sentence in the paragraph ...

 When we encounter a new commit that is painted with L bit and some
 but not necessarily all R bits, we propagate the bits and check the
 R bits.  Some of the commits in Right set that correspond to R bits
 may have been painted in L (i.e. we found branches that are merged
 to $commit), and digging further from this commit will not give us
 any new information.  Other commits are not painted in L (i.e. we do
 not yet know if $commit merges these branches), so we may need to
 keep digging.  So perhaps we can stop if a commit is painted in L
 and also has all the R bits that correspond to Right commits that
 are not yet proven reachable from $commit (i.e. not painted in L)?

... will be more like ignore Rn bits for Right commits that are
painted in L (i.e. they are reachable from L) or the Left commit is
painted with (i.e. they reach L).
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/8] commit: provide a fast multi-tip contains function

2014-06-26 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 I haven't quite convinced myself that the stale logic in the middle is
 right. The origin paint_down function checks PARENT1 | PARENT2 to see
 if we found a merge base (even though PARENT2 may represent many tips).
 Here I check whether we have _any_ left parent flag and _any_ right
 parent flag. I'm not sure if I actually need to be finding the merge
 base of _all_ of the commits.

In the one-each-on-two-side case (i.e. the original algorithm), it
is any commit we will encounter by digging further down from a
STALE one will all be reachable from a newer merge base (e.g. the
commit that caused it to be marked as STALE originally).  It will
never be a useful merge base, so let's mark it as STALE.  Even if a
future traversal that comes from sideways (i.e. not passing the
commit that caused it to be marked as STALE) reach this STALE one,
digging further from there won't give us anything new.

If you see a commit can be reached from L1 and R2, the only thing
you know is that its parents can also be reached from L1 and R2, but
it does not tell you if it is reachable from other tips, e.g. L2 or
R1.  When a traversal reaches such a node from sideways, trying to
paint it with L2 for example, you do need to continue digging.

I think the traversal optimization based on the STALE bit is merely
a halfway optimization to cull obvious duplicates.  See how
get_merge_bases_many() needs to clean up redundant ones in the
result of merge_bases_many(), the latter of which does use the STALE
bit to remove obvious duplicates, in order to make sure it won't
include a commit in the result that can be reached from another
commit in the result, without having to resort to the SLOP trick.

Because your end-goal is to tell if Ln is reachable from Rm (for
number of n's and m's), coming up with the independent/non-redundant
set of merge-bases is not necessary, I think.  I suspect that you do
not need the STALE half-optimization in the first place.

The only time you can say Ah, we've seen this one and no need to
dig further is when you are propagating a colour C and the parent
in question is already painted in C (it is OK to be painted as
reachable from more tips), I would think, so shouldn't the loop be
more like, after painting each tip and placing it in the queue:

 * Dequeue one, check the L/R bits in it and call that C

 * Iterate over its parents P:

   * check the L/R bits in P and call that Cp.

   * If Cp is already a superset of C, there is no point putting P
 to the queue to be processed.

   * If Cp is not a superset of C, then update L/R bits in P to mark
 it reachable from tips represented by C and put P to the queue.

until you ran out of queue?




 +void commit_contains(const struct commit_list *left,
 +  const struct commit_list *right,
 +  unsigned char *left_contains,
 +  unsigned char *right_contains)
 +{
 + struct prio_queue queue = { compare_commits_by_commit_date };
 + struct bit_slab left_bits, right_bits, stale_bits;
 + int left_nr, right_nr;
 +
 + left_nr = init_contains_bits(left, left_bits, queue);
 + right_nr = init_contains_bits(right, right_bits, queue);
 + init_bit_slab(stale_bits);
 +
 + while (queue_has_nonstale_bits(queue, stale_bits)) {
 + struct commit *commit = prio_queue_get(queue);
 + struct commit_list *parents;
 + unsigned char *c_left, *c_right, *c_stale;
 +
 + c_left = bit_slab_at(left_bits, commit);
 + c_right = bit_slab_at(right_bits, commit);
 + c_stale = bit_slab_at(stale_bits, commit);
 +
 + if (!bitset_empty(c_left, left_nr) 
 + !bitset_empty(c_right, right_nr))
 + *c_stale = 1;

Hmph, is this one-char-a bit?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/8] commit: provide a fast multi-tip contains function

2014-06-26 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 The only time you can say Ah, we've seen this one and no need to
 dig further is when you are propagating a colour C and the parent
 in question is already painted in C (it is OK to be painted as
 reachable from more tips), I would think, so shouldn't the loop be
 more like, after painting each tip and placing it in the queue:

  * Dequeue one, check the L/R bits in it and call that C

  * Iterate over its parents P:

* check the L/R bits in P and call that Cp.

* If Cp is already a superset of C, there is no point putting P
  to the queue to be processed.

* If Cp is not a superset of C, then update L/R bits in P to mark
  it reachable from tips represented by C and put P to the queue.

 until you ran out of queue?

Actually that will cause you to dig down to the root, so it won't be
nice.

If you are trying to do branch --with $commit, what you would want
is not exactly paint-down-to-common(all branches, $commit), but
something that paints down to $commit from all branches, with an
optimization that cuts off the traversal at a commit that is
reachable from $commit.  If a traversal from branch B reached such a
commit that is reachable from $commit, you can stop the traversal
because propagating the bit for B further down to its parents will
not reach the $commit you are interested in.

So the termination condition for this a single Left (I'll use Left
for $commit and Right for all branches) case is if a commit is
already painted as reachable from $commit, do not propagate bits
further down.

What does it mean to look for branch --with $commit1 $commit2
(i.e. more than one in the Left set)?  If we are trying to see which
branches reach _both_ of these commits, then replace the ablve with
if a commit is already painted as reachable from both $commit{1,2},
then painting it with any branch bits is futile---its parents will
never reach either $commit1 nor $commit2, so the additional
termination condition will be If left bits are full, then stop.

I do not know how you can optimize the traversal if you are trying
to see which branches reach at least one of these commits, though.





--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/8] commit: provide a fast multi-tip contains function

2014-06-26 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 What does it mean to look for branch --with $commit1 $commit2
 (i.e. more than one in the Left set)?  If we are trying to see which
 branches reach _both_ of these commits, then replace the ablve with
 if a commit is already painted as reachable from both $commit{1,2},
 then painting it with any branch bits is futile---its parents will
 never reach either $commit1 nor $commit2, so the additional
 termination condition will be If left bits are full, then stop.

 I do not know how you can optimize the traversal if you are trying
 to see which branches reach at least one of these commits, though.

By the way, don't we do 80% of this already in show-branch?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/8] commit: provide a fast multi-tip contains function

2014-06-25 Thread Jeff King
When commands like git branch --contains want to know
which branches contain a particular commit, they run a
series of merge-base calculations, one per branch. This can
be very slow if you have a large number of branches.

We made tag --contains faster in ffc4b80 (tag: speed up
--contains calculation, 2011-06-11) by switching to a
different algorithm that caches intermediate contains
information from each tag we check. The downside of the new
algorithm is that it moves depth-first through the graph. So
it tends to go all the way to the roots, even if the
contained commit is near the top of history. That works OK
for tags, because repositories tend to have tags near the
roots anyway (e.g., a v0.1 or similar). The number of
commits we look at increased a little bit, but since we
avoid traversing over the same parts of history repeatedly,
it was a huge net win.

For branch --contains, it is less clear that this is a
win. Most branches stay up to date, so we can bound a search
for a recent commit when we hit the merge base between the
commit and the branches.

The ideal would be to use the merge-base-style breadth-first
traversal, but to perform a single traversal for all tips.
The problem is that we need one bit of storage per tip in
each commit, and struct commit has only a fixed number of
bits. We can solve that by using a process similar to
paint_down_to_common, but instead of storing PARENT1 and
PARENT2 flags, using a commit slab to store one bit per tip.

Signed-off-by: Jeff King p...@peff.net
---
This is the interesting commit, and I'd really love some eyes on the
logic. It's basically paint_down_to_common but with the PARENT1 and
PARENT2 flags replaced with larger bitfields.

I haven't quite convinced myself that the stale logic in the middle is
right. The origin paint_down function checks PARENT1 | PARENT2 to see
if we found a merge base (even though PARENT2 may represent many tips).
Here I check whether we have _any_ left parent flag and _any_ right
parent flag. I'm not sure if I actually need to be finding the merge
base of _all_ of the commits. I don't think so, and I can't find a case
where this doesn't work, but perhaps I am not being imaginative enough.

 commit.c | 102 +++
 commit.h |  17 +++
 2 files changed, 119 insertions(+)

diff --git a/commit.c b/commit.c
index 445b679..66e0def 100644
--- a/commit.c
+++ b/commit.c
@@ -11,6 +11,7 @@
 #include commit-slab.h
 #include prio-queue.h
 #include sha1-lookup.h
+#include bitset.h
 
 static struct commit_extra_header *read_commit_extra_header_lines(const char 
*buf, size_t len, const char **);
 
@@ -1040,6 +1041,107 @@ struct commit_list *reduce_heads(struct commit_list 
*heads)
return result;
 }
 
+define_commit_slab(bit_slab, unsigned char);
+
+static int init_contains_bits(const struct commit_list *commits,
+ struct bit_slab *bits,
+ struct prio_queue *queue)
+{
+   int i, nr = commit_list_count(commits);
+
+   init_bit_slab_with_stride(bits, bitset_sizeof(nr));
+   for (i = 0; i  nr; i++, commits = commits-next) {
+   struct commit *c = commits-item;
+
+   prio_queue_put(queue, c);
+   bitset_set(bit_slab_at(bits, c), i);
+   }
+
+   return nr;
+}
+
+static int queue_has_nonstale_bits(struct prio_queue *queue, struct bit_slab 
*stale)
+{
+   int i;
+   for (i = 0; i  queue-nr; i++) {
+   struct commit *commit = queue-array[i];
+   if (!*bit_slab_at(stale, commit))
+   return 1;
+   }
+   return 0;
+}
+
+static void fill_contains_result(unsigned char *result, int nr,
+struct bit_slab *bits,
+const struct commit_list *other_side)
+{
+   const struct commit_list *c;
+
+   memset(result, 0, nr);
+   for (c = other_side; c; c = c-next) {
+   unsigned char *c_bits = bit_slab_at(bits, c-item);
+   int i;
+
+   for (i = 0; i  nr; i++)
+   result[i] |= bitset_get(c_bits, i);
+   }
+}
+
+void commit_contains(const struct commit_list *left,
+const struct commit_list *right,
+unsigned char *left_contains,
+unsigned char *right_contains)
+{
+   struct prio_queue queue = { compare_commits_by_commit_date };
+   struct bit_slab left_bits, right_bits, stale_bits;
+   int left_nr, right_nr;
+
+   left_nr = init_contains_bits(left, left_bits, queue);
+   right_nr = init_contains_bits(right, right_bits, queue);
+   init_bit_slab(stale_bits);
+
+   while (queue_has_nonstale_bits(queue, stale_bits)) {
+   struct commit *commit = prio_queue_get(queue);
+   struct commit_list *parents;
+   unsigned char *c_left, *c_right, *c_stale;
+
+   c_left =