Re: [PATCH 3/8] paint_down_to_common: use prio_queue

2014-07-01 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 The downside is that our priority queue is not stable, which
 means that commits with the same timestamp may not come out
 in the order we put them in. You can see this in the test
 update in t6024. That test does a recursive merge across a
 set of commits that all have the same timestamp. For the
 virtual ancestor, the test currently ends up with blob like
 this:

  Temporary merge branch 1
  Temporary merge branch 1
 C
 ===
 B
  Temporary merge branch 2
 ===
 A
  Temporary merge branch 2

 but with this patch, the positions of B and A are swapped.
 This is probably fine, as the order is an internal
 implementation detail anyway (it would _not_ be fine if we
 were using a priority queue for git log traversal, which
 should show commits in parent order).

Interesting that the queue is not stable, but the test can still
rely on a fixed output.  While I tend to agree that for the purpose
of this code path, the order is an internal implementation detail,
but I wonder if it would benefit us a lot if we taught prio-queue to
be optionally more stable, which would allow us to use it in other
code paths that care.  If we really wanted to, I would imagine that
we could keep the insertion counter in the elements of the queue
to make the result stable (i.e. the void **array would become
something like struct { int insertion_ctr; void *thing; } *array).

 I'm slightly hesitant because of the stability thing mentioned above. I
 _think_ it's probably fine. But we could also implement a
 stable_prio_queue on top of the existing prio_queue if we're concerned
 (and that may be something we want to do anyway, because git log would
 want that if it switched to a priority queue).

Heh, I should have read the below-three-dashs commentary before
commenting (I often start working from the commit messages in git
log and then go back to the original thread).
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/8] paint_down_to_common: use prio_queue

2014-07-01 Thread Jeff King
On Tue, Jul 01, 2014 at 09:23:21AM -0700, Junio C Hamano wrote:

  but with this patch, the positions of B and A are swapped.
  This is probably fine, as the order is an internal
  implementation detail anyway (it would _not_ be fine if we
  were using a priority queue for git log traversal, which
  should show commits in parent order).
 
 Interesting that the queue is not stable, but the test can still
 rely on a fixed output.

I think it is deterministic for a particular sequence of inserts/pops,
but not stable with respect to insertion order.

 While I tend to agree that for the purpose
 of this code path, the order is an internal implementation detail,
 but I wonder if it would benefit us a lot if we taught prio-queue to
 be optionally more stable, which would allow us to use it in other
 code paths that care.  If we really wanted to, I would imagine that
 we could keep the insertion counter in the elements of the queue
 to make the result stable (i.e. the void **array would become
 something like struct { int insertion_ctr; void *thing; } *array).

Yeah, I think the reasons to be stable are:

  1. To be on the safe side for operations like this where it
_shouldn't_ matter, but perhaps there are hidden dependencies we
don't know of.

  2. To make it easier for later callers to use prio-queue for cases
 where it does matter (and I think git log is one of these).

If we can do it without a big performance loss (and I don't see any
reason it should be any worse than a slight bump to the constant-factor
of the logarithmic operations), it probably makes sense to.

I'll take a look at it (in fact, I already implemented something like it
once long ago in the thread I linked to earlier). My sense of taste says
it should be a stable_prio_queue implemented on top of prio_queue (i.e.,
storing pointers to the struct you mention above). That means you can
still use the unstable one if you want the (presumably minor)
performance benefit, and it keeps the logic nice and tidy.

But given that we have implemented prio_queue using void pointers, I
think it would introduce an extra pointer per item and an extra layer of
indirection on each access.  So maybe it is better to just build it in.

The low-cost alternative is to implement prio_queue to hold items of
arbitrary size. I'm not sure if that is the worth the complexity and
maintenance cost.

 Heh, I should have read the below-three-dashs commentary before
 commenting (I often start working from the commit messages in git
 log and then go back to the original thread).

I always wonder how people read those. I tend to write them as if people
have (just) read the commit message, but not yet read the patch.

-Peff

PS Thanks for your earlier comments on the actual commit-slab painting
   algorithm. Responding to those is taking more thinking, and I haven't
   gotten to it yet, but it's on my agenda.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/8] paint_down_to_common: use prio_queue

2014-06-25 Thread Jeff King
When we are traversing to find merge bases, we keep our
usual commit_list of commits to process, sorted by their
commit timestamp. As we add each parent to the list, we have
to spend O(width of history) to do the insertion, where
the width of history is the number of simultaneous lines of
development.

If we instead use a heap-based priority queue, we can do
these insertions in O(log width) time. This provides minor
speedups to merge-base calculations (timings in linux.git,
warm cache, best-of-five):

  [before]
  $ git merge-base HEAD v2.6.12
  real0m3.251s
  user0m3.148s
  sys 0m0.104s

  [after]
  $ git merge-base HEAD v2.6.12
  real0m3.234s
  user0m3.108s
  sys 0m0.128s

That's only an 0.5% speedup, but it does help protect us
against pathological cases.

The downside is that our priority queue is not stable, which
means that commits with the same timestamp may not come out
in the order we put them in. You can see this in the test
update in t6024. That test does a recursive merge across a
set of commits that all have the same timestamp. For the
virtual ancestor, the test currently ends up with blob like
this:

 Temporary merge branch 1
 Temporary merge branch 1
C
===
B
 Temporary merge branch 2
===
A
 Temporary merge branch 2

but with this patch, the positions of B and A are swapped.
This is probably fine, as the order is an internal
implementation detail anyway (it would _not_ be fine if we
were using a priority queue for git log traversal, which
should show commits in parent order).

While we are munging the interesting function, we also
take the opportunity to give it a more descriptive name, and
convert the return value to an int (we returned the first
interesting commit, but nobody ever looked at it).

Signed-off-by: Jeff King p...@peff.net
---
This one is not strictly required for the series; it's just that I'm
adding what is essentially a clone of paint_down_to_common later in the
series. I wanted to use the priority queue there, too, so I looked into
using it here.

I'm slightly hesitant because of the stability thing mentioned above. I
_think_ it's probably fine. But we could also implement a
stable_prio_queue on top of the existing prio_queue if we're concerned
(and that may be something we want to do anyway, because git log would
want that if it switched to a priority queue).

I had no recollection while writing this patch, but after searching the
list for stable priority queue, I realized that Junio and I discussed
it quite extensively a few years ago:

  http://thread.gmane.org/gmane.comp.version-control.git/204386/focus=204534

I think the conclusion there is that what this patch does is acceptable.

 commit.c   | 42 +++---
 t/t6024-recursive-merge.sh |  2 +-
 2 files changed, 20 insertions(+), 24 deletions(-)

diff --git a/commit.c b/commit.c
index 881be3b..445b679 100644
--- a/commit.c
+++ b/commit.c
@@ -729,45 +729,41 @@ void sort_in_topological_order(struct commit_list **list, 
enum rev_sort_order so
 
 static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT);
 
-static struct commit *interesting(struct commit_list *list)
+static int queue_has_nonstale(struct prio_queue *queue)
 {
-   while (list) {
-   struct commit *commit = list-item;
-   list = list-next;
-   if (commit-object.flags  STALE)
-   continue;
-   return commit;
+   int i;
+   for (i = 0; i  queue-nr; i++) {
+   struct commit *commit = queue-array[i];
+   if (!(commit-object.flags  STALE))
+   return 1;
}
-   return NULL;
+   return 0;
 }
 
 /* all input commits in one and twos[] must have been parsed! */
 static struct commit_list *paint_down_to_common(struct commit *one, int n, 
struct commit **twos)
 {
-   struct commit_list *list = NULL;
+   struct prio_queue queue = { compare_commits_by_commit_date };
struct commit_list *result = NULL;
int i;
 
one-object.flags |= PARENT1;
-   commit_list_insert_by_date(one, list);
-   if (!n)
-   return list;
+   if (!n) {
+   commit_list_append(one, result);
+   return result;
+   }
+   prio_queue_put(queue, one);
+
for (i = 0; i  n; i++) {
twos[i]-object.flags |= PARENT2;
-   commit_list_insert_by_date(twos[i], list);
+   prio_queue_put(queue, twos[i]);
}
 
-   while (interesting(list)) {
-   struct commit *commit;
+   while (queue_has_nonstale(queue)) {
+   struct commit *commit = prio_queue_get(queue);
struct commit_list *parents;
-   struct commit_list *next;
int flags;
 
-   commit = list-item;
-   next = list-next;
-   free(list);
-   list