Re: [PATCH 2/2] combine-diff: speed it up, by using multiparent diff tree-walker directly

2014-02-14 Thread Kirill Smelkov
On Thu, Feb 13, 2014 at 11:55:08AM -0800, Junio C Hamano wrote:
> Kirill Smelkov  writes:
> 
> > +   if (need_generic_pathscan) {
> > +   /* NOTE generic case also handles --stat, as it computes
> > +* diff(sha1,parent_i) for all i to do the job, specifically
> > +* for parent0.
> > +*/
> > +   paths = find_paths_generic(sha1, parents, &diffopts);
> > +   }
> > +   else {
> > +   paths = find_paths_multitree(sha1, parents, &diffopts);
> > +
> > +   /* show stat against the first parent even
> > +* when doing combined diff.
> > +*/
> > +   int stat_opt = (opt->output_format &
> > +   (DIFF_FORMAT_NUMSTAT|DIFF_FORMAT_DIFFSTAT));
> 
> /*
>  * We see decl-after-stmt here.
>  * Also please have slash-asterisk and asterisk-slash
>  * at the beginning and the end of a multi-line comment
>  * block on their own line.
>  */

Sorry, and thanks for noticing. I usually compile with -Wall, but it
seems it is not enough without explicitly specifying -std=c89.

Comments corrected and the decl-after-stmt fixed, and this time I've
compiled with `-std=c89 -pedantic -Wall -Wextra` to assure no new
warnings are introduced.

Please apply and thanks beforehand,
Kirill

 8< 
From: Kirill Smelkov 
Subject: [PATCH v2 2/2] combine-diff: speed it up, by using multiparent diff
 tree-walker directly

As was recently shown (c839f1bd "combine-diff: optimize
combine_diff_path sets intersection"), combine-diff runs very slowly. In
that commit we optimized paths sets intersection, but that accounted
only for ~ 25% of the slowness, and as my tracing showed, for linux.git
v3.10..v3.11, for merges a lot of time is spent computing
diff(commit,commit^2) just to only then intersect that huge diff to
almost small set of files from diff(commit,commit^1).

In previous commit, we described the problem in more details, and
reworked the diff tree-walker to be general one - i.e. to work in
multiple parent case too. Now is the time to take advantage of it for
finding paths for combine diff.

The implementation is straightforward - if we know, we can get generated
diff paths directly, and at present that means no diff filtering or
rename/copy detection was requested(*), we can call multiparent tree-walker
directly and get ready paths.

(*) because e.g. at present, all diffcore transformations work on
diff_filepair queues, but in the future, that limitation can be
lifted, if filters would operate directly on combine_diff_paths.

Timings for `git log --raw --no-abbrev --no-renames` without `-c` ("git log")
and with `-c` ("git log -c") and with `-c --merges` ("git log -c --merges")
before and after the patch are as follows:

linux.git v3.10..v3.11

log log -c log -c --merges

before  1.9s16.4s  15.2s
after   1.9s 2.4s   1.1s

The result stayed the same.

Signed-off-by: Kirill Smelkov 
---
 combine-diff.c | 88 ++
 diff.c |  1 +
 2 files changed, 84 insertions(+), 5 deletions(-)

Chages since v1:

- fixed declaration-after-statement, and reworked multiline comments to
  start and end with /* and */ on separate lines.

diff --git a/combine-diff.c b/combine-diff.c
index 1732dfd..12764fb 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1303,7 +1303,7 @@ static const char *path_path(void *obj)
 
 
 /* find set of paths that every parent touches */
-static struct combine_diff_path *find_paths(const unsigned char *sha1,
+static struct combine_diff_path *find_paths_generic(const unsigned char *sha1,
const struct sha1_array *parents, struct diff_options *opt)
 {
struct combine_diff_path *paths = NULL;
@@ -1316,6 +1316,7 @@ static struct combine_diff_path *find_paths(const 
unsigned char *sha1,
/* tell diff_tree to emit paths in sorted (=tree) order */
opt->orderfile = NULL;
 
+   /* D(A,P1...Pn) = D(A,P1) ^ ... ^ D(A,Pn)  (wrt paths) */
for (i = 0; i < num_parent; i++) {
/*
 * show stat against the first parent even when doing
@@ -1346,6 +1347,35 @@ static struct combine_diff_path *find_paths(const 
unsigned char *sha1,
 }
 
 
+/*
+ * find set of paths that everybody touches, assuming diff is run without
+ * rename/copy detection, etc, comparing all trees simultaneously (= faster).
+ */
+static struct combine_diff_path *find_paths_multitree(
+   const unsigned char *sha1, const struct sha1_array *parents,
+   struct diff_options *opt)
+{
+   int i, nparent = parents->nr;
+   const unsigned char **parents_sha1;
+   struct combine_diff_path paths_head;
+   struct strbuf base;
+
+   parents_sha1 = xmalloc(nparent * sizeof(parents_sha1[0]));
+   for (i = 0; i < nparent; i++)
+   parents_sha1[i] = parents->sha1[i];
+
+   /* fake list head, so worker can assume it is non-NULL */
+

Re: [PATCH 2/2] combine-diff: speed it up, by using multiparent diff tree-walker directly

2014-02-13 Thread Junio C Hamano
Kirill Smelkov  writes:

> + if (need_generic_pathscan) {
> + /* NOTE generic case also handles --stat, as it computes
> +  * diff(sha1,parent_i) for all i to do the job, specifically
> +  * for parent0.
> +  */
> + paths = find_paths_generic(sha1, parents, &diffopts);
> + }
> + else {
> + paths = find_paths_multitree(sha1, parents, &diffopts);
> +
> + /* show stat against the first parent even
> +  * when doing combined diff.
> +  */
> + int stat_opt = (opt->output_format &
> + (DIFF_FORMAT_NUMSTAT|DIFF_FORMAT_DIFFSTAT));

/*
 * We see decl-after-stmt here.
 * Also please have slash-asterisk and asterisk-slash
 * at the beginning and the end of a multi-line comment
 * block on their own line.
 */
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] combine-diff: speed it up, by using multiparent diff tree-walker directly

2014-02-13 Thread Kirill Smelkov
As was recently shown (c839f1bd "combine-diff: optimize
combine_diff_path sets intersection"), combine-diff runs very slowly. In
that commit we optimized paths sets intersection, but that accounted
only for ~ 25% of the slowness, and as my tracing showed, for linux.git
v3.10..v3.11, for merges a lot of time is spent computing
diff(commit,commit^2) just to only then intersect that huge diff to
almost small set of files from diff(commit,commit^1).

In previous commit, we described the problem in more details, and
reworked the diff tree-walker to be general one - i.e. to work in
multiple parent case too. Now is the time to take advantage of it for
finding paths for combine diff.

The implementation is straightforward - if we know, we can get generated
diff paths directly, and at present that means no diff filtering or
rename/copy detection was requested(*), we can call multiparent tree-walker
directly and get ready paths.

(*) because e.g. at present, all diffcore transformations work on
diff_filepair queues, but in the future, that limitation can be
lifted, if filters would operate directly on combine_diff_paths.

Timings for `git log --raw --no-abbrev --no-renames` without `-c` ("git log")
and with `-c` ("git log -c") and with `-c --merges` ("git log -c --merges")
before and after the patch are as follows:

linux.git v3.10..v3.11

log log -c log -c --merges

before  1.9s16.4s  15.2s
after   1.9s 2.4s   1.1s

The result stayed the same.

Signed-off-by: Kirill Smelkov 
---
 combine-diff.c | 85 ++
 diff.c |  1 +
 2 files changed, 81 insertions(+), 5 deletions(-)

diff --git a/combine-diff.c b/combine-diff.c
index 1732dfd..ddf7495 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1303,7 +1303,7 @@ static const char *path_path(void *obj)
 
 
 /* find set of paths that every parent touches */
-static struct combine_diff_path *find_paths(const unsigned char *sha1,
+static struct combine_diff_path *find_paths_generic(const unsigned char *sha1,
const struct sha1_array *parents, struct diff_options *opt)
 {
struct combine_diff_path *paths = NULL;
@@ -1316,6 +1316,7 @@ static struct combine_diff_path *find_paths(const 
unsigned char *sha1,
/* tell diff_tree to emit paths in sorted (=tree) order */
opt->orderfile = NULL;
 
+   /* D(A,P1...Pn) = D(A,P1) ^ ... ^ D(A,Pn)  (wrt paths) */
for (i = 0; i < num_parent; i++) {
/*
 * show stat against the first parent even when doing
@@ -1346,6 +1347,35 @@ static struct combine_diff_path *find_paths(const 
unsigned char *sha1,
 }
 
 
+/*
+ * find set of paths that everybody touches, assuming diff is run without
+ * rename/copy detection, etc, comparing all trees simultaneously (= faster).
+ */
+static struct combine_diff_path *find_paths_multitree(
+   const unsigned char *sha1, const struct sha1_array *parents,
+   struct diff_options *opt)
+{
+   int i, nparent = parents->nr;
+   const unsigned char **parents_sha1;
+   struct combine_diff_path paths_head;
+   struct strbuf base;
+
+   parents_sha1 = xmalloc(nparent * sizeof(parents_sha1[0]));
+   for (i = 0; i < nparent; i++)
+   parents_sha1[i] = parents->sha1[i];
+
+   /* fake list head, so worker can assume it is non-NULL */
+   paths_head.next = NULL;
+
+   strbuf_init(&base, PATH_MAX);
+   diff_tree_paths(&paths_head, sha1, parents_sha1, nparent, &base, opt);
+
+   strbuf_release(&base);
+   free(parents_sha1);
+   return paths_head.next;
+}
+
+
 void diff_tree_combined(const unsigned char *sha1,
const struct sha1_array *parents,
int dense,
@@ -1355,6 +1385,7 @@ void diff_tree_combined(const unsigned char *sha1,
struct diff_options diffopts;
struct combine_diff_path *p, *paths;
int i, num_paths, needsep, show_log_first, num_parent = parents->nr;
+   int need_generic_pathscan;
 
/* nothing to do, if no parents */
if (!num_parent)
@@ -1377,11 +1408,55 @@ void diff_tree_combined(const unsigned char *sha1,
 
/* find set of paths that everybody touches
 *
-* NOTE find_paths() also handles --stat, as it computes
-* diff(sha1,parent_i) for all i to do the job, specifically
-* for parent0.
+* NOTE
+*
+* Diffcore transformations are bound to diff_filespec and logic
+* comparing two entries - i.e. they do not apply directly to combine
+* diff.
+*
+* If some of such transformations is requested - we launch generic
+* path scanning, which works significantly slower compared to
+* simultaneous all-trees-in-one-go scan in find_paths_multitree().
+*
+* TODO some of the filters could be ported to work on
+* combine_diff_paths - i.e. all