Re: [PATCH 3/8] paint_down_to_common: use prio_queue
Jeff King p...@peff.net writes: The downside is that our priority queue is not stable, which means that commits with the same timestamp may not come out in the order we put them in. You can see this in the test update in t6024. That test does a recursive merge across a set of commits that all have the same timestamp. For the virtual ancestor, the test currently ends up with blob like this: Temporary merge branch 1 Temporary merge branch 1 C === B Temporary merge branch 2 === A Temporary merge branch 2 but with this patch, the positions of B and A are swapped. This is probably fine, as the order is an internal implementation detail anyway (it would _not_ be fine if we were using a priority queue for git log traversal, which should show commits in parent order). Interesting that the queue is not stable, but the test can still rely on a fixed output. While I tend to agree that for the purpose of this code path, the order is an internal implementation detail, but I wonder if it would benefit us a lot if we taught prio-queue to be optionally more stable, which would allow us to use it in other code paths that care. If we really wanted to, I would imagine that we could keep the insertion counter in the elements of the queue to make the result stable (i.e. the void **array would become something like struct { int insertion_ctr; void *thing; } *array). I'm slightly hesitant because of the stability thing mentioned above. I _think_ it's probably fine. But we could also implement a stable_prio_queue on top of the existing prio_queue if we're concerned (and that may be something we want to do anyway, because git log would want that if it switched to a priority queue). Heh, I should have read the below-three-dashs commentary before commenting (I often start working from the commit messages in git log and then go back to the original thread). -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/8] paint_down_to_common: use prio_queue
On Tue, Jul 01, 2014 at 09:23:21AM -0700, Junio C Hamano wrote: but with this patch, the positions of B and A are swapped. This is probably fine, as the order is an internal implementation detail anyway (it would _not_ be fine if we were using a priority queue for git log traversal, which should show commits in parent order). Interesting that the queue is not stable, but the test can still rely on a fixed output. I think it is deterministic for a particular sequence of inserts/pops, but not stable with respect to insertion order. While I tend to agree that for the purpose of this code path, the order is an internal implementation detail, but I wonder if it would benefit us a lot if we taught prio-queue to be optionally more stable, which would allow us to use it in other code paths that care. If we really wanted to, I would imagine that we could keep the insertion counter in the elements of the queue to make the result stable (i.e. the void **array would become something like struct { int insertion_ctr; void *thing; } *array). Yeah, I think the reasons to be stable are: 1. To be on the safe side for operations like this where it _shouldn't_ matter, but perhaps there are hidden dependencies we don't know of. 2. To make it easier for later callers to use prio-queue for cases where it does matter (and I think git log is one of these). If we can do it without a big performance loss (and I don't see any reason it should be any worse than a slight bump to the constant-factor of the logarithmic operations), it probably makes sense to. I'll take a look at it (in fact, I already implemented something like it once long ago in the thread I linked to earlier). My sense of taste says it should be a stable_prio_queue implemented on top of prio_queue (i.e., storing pointers to the struct you mention above). That means you can still use the unstable one if you want the (presumably minor) performance benefit, and it keeps the logic nice and tidy. But given that we have implemented prio_queue using void pointers, I think it would introduce an extra pointer per item and an extra layer of indirection on each access. So maybe it is better to just build it in. The low-cost alternative is to implement prio_queue to hold items of arbitrary size. I'm not sure if that is the worth the complexity and maintenance cost. Heh, I should have read the below-three-dashs commentary before commenting (I often start working from the commit messages in git log and then go back to the original thread). I always wonder how people read those. I tend to write them as if people have (just) read the commit message, but not yet read the patch. -Peff PS Thanks for your earlier comments on the actual commit-slab painting algorithm. Responding to those is taking more thinking, and I haven't gotten to it yet, but it's on my agenda. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/8] paint_down_to_common: use prio_queue
When we are traversing to find merge bases, we keep our usual commit_list of commits to process, sorted by their commit timestamp. As we add each parent to the list, we have to spend O(width of history) to do the insertion, where the width of history is the number of simultaneous lines of development. If we instead use a heap-based priority queue, we can do these insertions in O(log width) time. This provides minor speedups to merge-base calculations (timings in linux.git, warm cache, best-of-five): [before] $ git merge-base HEAD v2.6.12 real0m3.251s user0m3.148s sys 0m0.104s [after] $ git merge-base HEAD v2.6.12 real0m3.234s user0m3.108s sys 0m0.128s That's only an 0.5% speedup, but it does help protect us against pathological cases. The downside is that our priority queue is not stable, which means that commits with the same timestamp may not come out in the order we put them in. You can see this in the test update in t6024. That test does a recursive merge across a set of commits that all have the same timestamp. For the virtual ancestor, the test currently ends up with blob like this: Temporary merge branch 1 Temporary merge branch 1 C === B Temporary merge branch 2 === A Temporary merge branch 2 but with this patch, the positions of B and A are swapped. This is probably fine, as the order is an internal implementation detail anyway (it would _not_ be fine if we were using a priority queue for git log traversal, which should show commits in parent order). While we are munging the interesting function, we also take the opportunity to give it a more descriptive name, and convert the return value to an int (we returned the first interesting commit, but nobody ever looked at it). Signed-off-by: Jeff King p...@peff.net --- This one is not strictly required for the series; it's just that I'm adding what is essentially a clone of paint_down_to_common later in the series. I wanted to use the priority queue there, too, so I looked into using it here. I'm slightly hesitant because of the stability thing mentioned above. I _think_ it's probably fine. But we could also implement a stable_prio_queue on top of the existing prio_queue if we're concerned (and that may be something we want to do anyway, because git log would want that if it switched to a priority queue). I had no recollection while writing this patch, but after searching the list for stable priority queue, I realized that Junio and I discussed it quite extensively a few years ago: http://thread.gmane.org/gmane.comp.version-control.git/204386/focus=204534 I think the conclusion there is that what this patch does is acceptable. commit.c | 42 +++--- t/t6024-recursive-merge.sh | 2 +- 2 files changed, 20 insertions(+), 24 deletions(-) diff --git a/commit.c b/commit.c index 881be3b..445b679 100644 --- a/commit.c +++ b/commit.c @@ -729,45 +729,41 @@ void sort_in_topological_order(struct commit_list **list, enum rev_sort_order so static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT); -static struct commit *interesting(struct commit_list *list) +static int queue_has_nonstale(struct prio_queue *queue) { - while (list) { - struct commit *commit = list-item; - list = list-next; - if (commit-object.flags STALE) - continue; - return commit; + int i; + for (i = 0; i queue-nr; i++) { + struct commit *commit = queue-array[i]; + if (!(commit-object.flags STALE)) + return 1; } - return NULL; + return 0; } /* all input commits in one and twos[] must have been parsed! */ static struct commit_list *paint_down_to_common(struct commit *one, int n, struct commit **twos) { - struct commit_list *list = NULL; + struct prio_queue queue = { compare_commits_by_commit_date }; struct commit_list *result = NULL; int i; one-object.flags |= PARENT1; - commit_list_insert_by_date(one, list); - if (!n) - return list; + if (!n) { + commit_list_append(one, result); + return result; + } + prio_queue_put(queue, one); + for (i = 0; i n; i++) { twos[i]-object.flags |= PARENT2; - commit_list_insert_by_date(twos[i], list); + prio_queue_put(queue, twos[i]); } - while (interesting(list)) { - struct commit *commit; + while (queue_has_nonstale(queue)) { + struct commit *commit = prio_queue_get(queue); struct commit_list *parents; - struct commit_list *next; int flags; - commit = list-item; - next = list-next; - free(list); - list