Re: [PATCH v2 19/21] bisect: use a bottom-up traversal to find relevant weights

2016-04-26 Thread Junio C Hamano
Stephan Beyer  writes:

>  struct commit_list *find_bisection(struct commit_list *list,
> @@ -470,8 +541,11 @@ struct commit_list *find_bisection(struct commit_list 
> *list,
>   compute_all_weights(list, weights);
>   best = best_bisection_sorted(list);
>   } else {
> + int i;
>   compute_relevant_weights(list, weights);
>   best = best_bisection(list);
> + for (i = 0; i < on_list; i++) /* cleanup */
> + free_commit_list(weights[i].children);
>   }
>   assert(best);
>   *reaches = node_data(best->item)->weight;

One thing I forgot to mention is that we now may want to reconsider
what the first loop in this function does.  It used to be that the
purpose of the loop is to "count the number of total and
tree-changing items on the list while reversing the list" as the
comment says.  While it is still necessary to count the items (by
the way, with 16/21 you made these two numbers identical, i.e. there
no longer is a separate 'total' but your 'total' now actually means
the number of tree-changing items), I do not know if the "reverse"
would still be a good fit for the performance characteristic of the
new algorithm.

The list-reversal there was done as an optimization to make sure
that older ones are processed early to avoid looping too much just
to follow the list to find a single-parent commit whose parent's
weight is already known, as the only meaningful optimization in the
original algorithm was the "we can increment one without doing the
costly graph re-traversal for single-strand-of-pearls".  That
optimization may no longer relevant (or it could even be harmful)
as you traverse the graph in reverse.





--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 19/21] bisect: use a bottom-up traversal to find relevant weights

2016-04-15 Thread Junio C Hamano
Stephan Beyer  writes:

> +static struct commit *extract_merge_to_queue(struct commit_list **merges)
> +{
> + assert(merges);
> +
> + struct commit_list *p, *q;
> + struct commit *found;
> +

"gcc -Werror -Wdecl-after-statement" will barf at this.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 19/21] bisect: use a bottom-up traversal to find relevant weights

2016-04-15 Thread Junio C Hamano
Stephan Beyer  writes:

> The idea is to reverse the DAG and perform a traversal
> starting on all sources of the reversed DAG.

Please clarify what you mean by "sources" here.  Those who read log
message in Git context would know that you mean the commit graph by
"DAG", and "reversed DAG" means "having reverse linkage that lets
you find children given a parent", so "DAG" does not need such a
clarification.

> We walk from the bottom commits, incrementing the weight while
> walking on a part of the graph that is single strand of pearls,
> or doing the "count the reachable ones the hard way" using
> compute_weight() when we hit a merge commit.

Makes sense.  So instead of "all sources", you can say "perform a
traversal starting from the bottom commits, going from parent to its
children".

> A traversal ends when the computed weight is falling or halfway.
> This way, commits with too high weight to be relevant are never
> visited (and their weights are never computed).

Yup, beautiful.

> diff --git a/bisect.c b/bisect.c
> index c6bad43..9487ba9 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -30,6 +30,9 @@ static unsigned marker;
>  struct node_data {
>   int weight;
>   unsigned marked;
> + unsigned parents;
> + unsigned visited : 1;
> + struct commit_list *children;
>  };
>  
>  #define DEBUG_BISECT 0

> +static inline void commit_list_insert_unique(struct commit *item,
> +   struct commit_list **list)
> +{
> + if (!*list || item < (*list)->item) /* empty list or item will be first 
> */
> + commit_list_insert(item, list);
> + else if (item != (*list)->item) { /* item will not be first or not 
> inserted */
> + struct commit_list *p = *list;
> + for (; p->next && p->next->item < item; p = p->next);
> + if (!p->next || item != p->next->item) /* not already inserted 
> */
> + commit_list_insert(item, >next);
> + }
> +}

H.

When you have two commits, struct commit *one, and struct commit
*two, is it safe to do a pointer comparison for ordering?

I know it would work in practice, but I am worried about language
lawyers (and possibly static analysis tools) barking at this code.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 19/21] bisect: use a bottom-up traversal to find relevant weights

2016-04-13 Thread Christian Couder
On Sun, Apr 10, 2016 at 3:19 PM, Stephan Beyer  wrote:
> The idea is to reverse the DAG and perform a traversal
> starting on all sources of the reversed DAG.
>
> We walk from the bottom commits, incrementing the weight while
> walking on a part of the graph that is single strand of pearls,
> or doing the "count the reachable ones the hard way" using
> compute_weight() when we hit a merge commit.
>
> A traversal ends when the computed weight is falling or halfway.

Yeah, it looks like it could be a good optimization to end a traversal
looking for "relevant" commits when the weight is falling.

> This way, commits with too high weight to be relevant are never
> visited (and their weights are never computed).
>
> Signed-off-by: Stephan Beyer 
> ---
>
> Notes:
> I rephrased the commit message.
>
> I renamed the functions such that they don't talk about "BFS"
> because that is irrelevant. Also use a DFS now because it is
> less code (and a little more efficient).
>
> I plugged some leaks.

That's a lot of things in just one commit.

>  bisect.c | 250 
> +--
>  1 file changed, 162 insertions(+), 88 deletions(-)

Also from the diff stats it looks like you add a lot of code in this
commit and the previous one.
I wonder why you are saying that a DFS is less code above then.

The previous patch (18/21) has the following diff stat:

> bisect.c | 116 ---
> 1 file changed, 97 insertions(+), 19 deletions(-)

And the subsequent patches don't reduce code size overall.
Diff stat for 20/21 is:

> bisect.c | 44 +++-
> 1 file changed, 19 insertions(+), 25 deletions(-)

And diff stat for 21/21 is:

> bisect.c | 18 +-
> 1 file changed, 13 insertions(+), 5 deletions(-)

So after your patches from 18/21 to 21/21 there are around 150 more
lines of code.
Maybe this is worth it, but I wonder if at least some optimizations,
like for example ending a traversal looking for "relevant" commits
when the weight is falling, could be implemented without changing the
code so much and adding so many lines.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 19/21] bisect: use a bottom-up traversal to find relevant weights

2016-04-10 Thread Stephan Beyer
The idea is to reverse the DAG and perform a traversal
starting on all sources of the reversed DAG.

We walk from the bottom commits, incrementing the weight while
walking on a part of the graph that is single strand of pearls,
or doing the "count the reachable ones the hard way" using
compute_weight() when we hit a merge commit.

A traversal ends when the computed weight is falling or halfway.
This way, commits with too high weight to be relevant are never
visited (and their weights are never computed).

Signed-off-by: Stephan Beyer 
---

Notes:
I rephrased the commit message.

I renamed the functions such that they don't talk about "BFS"
because that is irrelevant. Also use a DFS now because it is
less code (and a little more efficient).

I plugged some leaks.

 bisect.c | 250 +--
 1 file changed, 162 insertions(+), 88 deletions(-)

diff --git a/bisect.c b/bisect.c
index c6bad43..9487ba9 100644
--- a/bisect.c
+++ b/bisect.c
@@ -30,6 +30,9 @@ static unsigned marker;
 struct node_data {
int weight;
unsigned marked;
+   unsigned parents;
+   unsigned visited : 1;
+   struct commit_list *children;
 };
 
 #define DEBUG_BISECT 0
@@ -110,16 +113,6 @@ static int count_interesting_parents(struct commit *commit)
return count;
 }
 
-static inline int halfway(struct commit *commit)
-{
-   /*
-* Don't short-cut something we are not going to return!
-*/
-   if (commit->object.flags & TREESAME)
-   return 0;
-   return !distance_direction(commit);
-}
-
 #if !DEBUG_BISECT
 #define show_list(a,b,c) do { ; } while (0)
 #else
@@ -340,90 +333,168 @@ static void compute_all_weights(struct commit_list *list,
show_list("bisection 2 counted all", counted, list);
 }
 
-/* At the moment this is basically the same as compute_all_weights()
- * but with a halfway shortcut */
+static struct commit_list *build_reversed_dag(struct commit_list *list,
+ struct node_data *nodes)
+{
+   struct commit_list *sources = NULL;
+   struct commit_list *p;
+   int n = 0;
+   for (p = list; p; p = p->next)
+   p->item->util = [n++];
+   for (p = list; p; p = p->next) {
+   struct commit_list *parent;
+   struct commit *commit = p->item;
+   for (parent = commit->parents; parent; parent = parent->next) {
+   if (!(parent->item->object.flags & UNINTERESTING)) {
+   struct commit_list **next = 
_data(parent->item)->children;
+   commit_list_insert(commit, next);
+   node_data(commit)->parents++;
+   }
+   }
+   }
+
+   /* find all sources */
+   for (p = list; p; p = p->next) {
+   if (node_data(p->item)->parents == 0)
+   commit_list_insert(p->item, );
+   }
+
+   return sources;
+}
+
+static inline void commit_list_insert_unique(struct commit *item,
+ struct commit_list **list)
+{
+   if (!*list || item < (*list)->item) /* empty list or item will be first 
*/
+   commit_list_insert(item, list);
+   else if (item != (*list)->item) { /* item will not be first or not 
inserted */
+   struct commit_list *p = *list;
+   for (; p->next && p->next->item < item; p = p->next);
+   if (!p->next || item != p->next->item) /* not already inserted 
*/
+   commit_list_insert(item, >next);
+   }
+}
+
+/* do a traversal on the reversed DAG (starting from commits in queue),
+ * but stop at merge commits */
+static void traversal_up_to_merges(struct commit_list *queue,
+  struct commit_list **merges)
+{
+   assert(queue);
+   while (queue) {
+   struct commit *top = queue->item;
+   struct commit_list *p;
+
+   pop_commit();
+
+   if (distance_direction(top) > 0) {
+   node_data(top)->visited = 1;
+
+   /* queue children */
+   for (p = node_data(top)->children; p; p = p->next) {
+   if (node_data(p->item)->parents > 1) /* child 
is a merge */
+   commit_list_insert_unique(p->item, 
merges);
+   else {
+   node_data(p->item)->weight = 
node_data(top)->weight;
+   if (!(p->item->object.flags & TREESAME))
+   node_data(p->item)->weight++;
+   commit_list_insert(p->item, );
+   }
+   }
+   }
+   }
+}
+
+static inline int