Re: [PATCH v8 4/5] Implement line-history search (git log -L)
Junio C Hamano gits...@pobox.com writes: Junio C Hamano gits...@pobox.com writes: Overall, I like this better than the log --follow hack; as the revision traversal is done without any pathspec when being careful and slow (aka -M), you do not suffer from the just use a singleton pathspec globally regardless of what other history paths are being traversed limitation of log --follow. The patch series certainly is interesting. Having said that, I notice that careful and slow is just too slow to be usable even on a small tree like ours. Try running $ git log -M -L:get_name:builtin/describe.c and see how long you have to wait until you hit the first line of output. I'll dig some more. It *should* be essentially the following times taken together: $ time git log --raw -M --topo-order /dev/null real0m5.448s user0m4.599s sys 0m0.794s $ time git log -L:get_name:builtin/describe.c /dev/null real0m0.832s user0m0.796s sys 0m0.032s $ time git log -L:get_name:builtin-describe.c 81b50f3ce40^ /dev/null real0m0.489s user0m0.465s sys 0m0.022s So I'm losing a factor of about 4 somewhere, which I can't explain right now. It could be improved further if --follow was fixed, because then the first step should be much faster than diffing *all* the trees. -- Thomas Rast trast@{inf,student}.ethz.ch -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 4/5] Implement line-history search (git log -L)
Thomas Rast tr...@student.ethz.ch writes: Junio C Hamano gits...@pobox.com writes: I notice that careful and slow is just too slow to be usable even on a small tree like ours. Try running $ git log -M -L:get_name:builtin/describe.c and see how long you have to wait until you hit the first line of output. I'll dig some more. It *should* be essentially the following times taken together: $ time git log --raw -M --topo-order /dev/null real0m5.448s [...] $ time git log -L:get_name:builtin/describe.c /dev/null real0m0.832s [...] $ time git log -L:get_name:builtin-describe.c 81b50f3ce40^ /dev/null real0m0.489s [...] So I'm losing a factor of about 4 somewhere, which I can't explain right now. It seems I still don't understand half of this code. log -M --raw in the above somehow appears to use and optimize for -M100%, whereas the log -L code is currently written for general args. However, I couldn't pin down where this happens; I only know from call graph profiling[1] that log -M --raw never goes through diffcore_std(). And indeed according to the same sort of profiling, log -M -L spends most of its time within diffcore_std() unpacking blobs to find renames. Even more confusingly, try_to_follow_renames() _does_ call into diffcore_std, so there seems to be some merit to calling it after all. With the hacky patch below, I get something more reasonable: $ time ./git-log -M -L:get_name:builtin/describe.c /dev/null real0m3.794s user0m3.734s sys 0m0.045s That's compared to about 35s on my machine without the patch. It still calls diffcore_std(), but before that it discards all diff pairs except those affecting the path(s) we're interested in and any deletions (so that they can be used as rename sources). After diffcore_std() it discards all pairs that we're not interested in. [1] valgrind --tool=callgrind --trace-children=yes -- 8 -- Subject: [PATCH] WIP: speed up log -L... -M --- line-log.c | 58 ++ 1 file changed, 54 insertions(+), 4 deletions(-) diff --git a/line-log.c b/line-log.c index a74bbaf..b03cc0b 100644 --- a/line-log.c +++ b/line-log.c @@ -997,7 +997,52 @@ static void move_diff_queue(struct diff_queue_struct *dst, DIFF_QUEUE_CLEAR(src); } -static void queue_diffs(struct diff_options *opt, +static void filter_diffs_for_paths(struct line_log_data *range, int keep_deletions) +{ + int i; + struct diff_queue_struct outq; + DIFF_QUEUE_CLEAR(outq); + + /* fprintf(stderr, -- filtering:\n); */ + + for (i = 0; i diff_queued_diff.nr; i++) { + struct diff_filepair *p = diff_queued_diff.queue[i]; + struct line_log_data *rg = NULL; + /* fprintf(stderr, %-38s\t%-38s\n, (p-one ? p-one-path : none), (p-two ? p-two-path : none)); */ + if (!DIFF_FILE_VALID(p-two)) { + if (keep_deletions) + diff_q(outq, p); + else + diff_free_filepair(p); + continue; + } + for (rg = range; rg; rg = rg-next) { + if (!strcmp(rg-spec-path, p-two-path)) + break; + } + if (rg) + diff_q(outq, p); + else + diff_free_filepair(p); + } + free(diff_queued_diff.queue); + diff_queued_diff = outq; +} + +static inline int diff_might_be_rename(void) +{ + int i; + for (i = 0; i diff_queued_diff.nr; i++) + if (!DIFF_FILE_VALID(diff_queued_diff.queue[i]-one)) { + /* fprintf(stderr, diff_might_be_rename found creation of: %s\n, */ + /* diff_queued_diff.queue[i]-two-path); */ + return 1; + } + return 0; +} + +static void queue_diffs(struct line_log_data *range, + struct diff_options *opt, struct diff_queue_struct *queue, struct commit *commit, struct commit *parent) { @@ -1013,7 +1058,12 @@ static void queue_diffs(struct diff_options *opt, DIFF_QUEUE_CLEAR(diff_queued_diff); diff_tree(desc1, desc2, , opt); - diffcore_std(opt); + if (opt-detect_rename) { + filter_diffs_for_paths(range, 1); + if (diff_might_be_rename()) + diffcore_std(opt); + filter_diffs_for_paths(range, 0); + } move_diff_queue(queue, diff_queued_diff); if (tree1) @@ -1297,7 +1347,7 @@ static int process_ranges_ordinary_commit(struct rev_info *rev, struct commit *c if (commit-parents) parent = commit-parents-item; - queue_diffs(rev-diffopt, queue, commit, parent); + queue_diffs(range, rev-diffopt, queue, commit,
Re: [PATCH v8 4/5] Implement line-history search (git log -L)
Thomas Rast tr...@student.ethz.ch writes: @@ -138,6 +155,11 @@ Examples This makes sense only when following a strict policy of merging all topic branches when staying on a single integration branch. +git log -L '/int main/',/^}/:main.c:: + + Shows how the function `main()` in the file 'main.c' evolved + over time. Perhaps have ^ before that int? diff --git a/line-log.c b/line-log.c index b167b00..789b7c4 100644 --- a/line-log.c +++ b/line-log.c @@ -1,6 +1,442 @@ #include git-compat-util.h +#include cache.h +#include tag.h +#include blob.h +#include tree.h +#include diff.h +#include commit.h +#include decorate.h +#include revision.h +#include xdiff-interface.h +#include strbuf.h +#include log-tree.h +#include graph.h #include line-log.h +void range_set_grow (struct range_set *rs, size_t extra) Lose the extra SP (everywhere in this file). +{ + ALLOC_GROW(rs-ranges, rs-nr + extra, rs-alloc); +} + +/* Either initialization would be fine */ +#define RANGE_SET_INIT {0} + +void range_set_init (struct range_set *rs, size_t prealloc) +{ + rs-alloc = rs-nr = 0; + rs-ranges = NULL; + if (prealloc) + range_set_grow(rs, prealloc); +} + +void range_set_release (struct range_set *rs) +{ + free(rs-ranges); + rs-alloc = rs-nr = 0; + rs-ranges = NULL; +} + +/* dst must be uninitialized! */ +void range_set_copy (struct range_set *dst, struct range_set *src) +{ + range_set_init(dst, src-nr); + memcpy(dst-ranges, src-ranges, src-nr*sizeof(struct range_set)); + dst-nr = src-nr; +} +void range_set_move (struct range_set *dst, struct range_set *src) +{ + range_set_release(dst); + dst-ranges = src-ranges; + dst-nr = src-nr; + dst-alloc = src-alloc; + src-ranges = NULL; + src-alloc = src-nr = 0; +} + +/* tack on a _new_ range _at the end_ */ +void range_set_append (struct range_set *rs, long a, long b) +{ + assert(a = b); + assert(rs-nr == 0 || rs-ranges[rs-nr-1].end = a); + range_set_grow(rs, 1); + rs-ranges[rs-nr].start = a; + rs-ranges[rs-nr].end = b; + rs-nr++; +} + +static int range_cmp (const void *_r, const void *_s) +{ + const struct range *r = _r; + const struct range *s = _s; + + /* this could be simply 'return r.start-s.start', but for the types */ + if (r-start == s-start) + return 0; + if (r-start s-start) + return -1; + return 1; +} + +/* + * Helper: In-place pass of sorting and merging the ranges in the + * range set, to re-establish the invariants after another operation + * + * NEEDSWORK currently not needed + */ +static void sort_and_merge_range_set (struct range_set *rs) +{ + int i; + int o = 1; /* output cursor */ + + qsort(rs-ranges, rs-nr, sizeof(struct range), range_cmp); + + for (i = 1; i rs-nr; i++) { + if (rs-ranges[i].start = rs-ranges[o-1].end) { + rs-ranges[o-1].end = rs-ranges[i].end; + } else { + rs-ranges[o].start = rs-ranges[i].start; + rs-ranges[o].end = rs-ranges[i].end; + o++; + } + } + assert(o = rs-nr); + rs-nr = o; +} + +/* + * Union of range sets (i.e., sets of line numbers). Used to merge + * them when searches meet at a common ancestor. + * + * This is also where the ranges are consolidated into canonical form: + * overlapping and adjacent ranges are merged, and empty ranges are + * removed. + */ +static void range_set_union (struct range_set *out, + struct range_set *a, struct range_set *b) +{ + int i = 0, j = 0, o = 0; + struct range *ra = a-ranges; + struct range *rb = b-ranges; + /* cannot make an alias of out-ranges: it may change during grow */ + + assert(out-nr == 0); + while (i a-nr || j b-nr) { + struct range *new; + if (i a-nr j b-nr) { + if (ra[i].start rb[j].start) + new = ra[i++]; + else if (ra[i].start rb[j].start) + new = rb[j++]; + else if (ra[i].end rb[j].end) + new = ra[i++]; + else + new = rb[j++]; + } else if (i a-nr) /* b exhausted */ + new = ra[i++]; + else /* a exhausted */ + new = rb[j++]; + if (new-start == new-end) + ; /* empty range */ + else if (!o || out-ranges[o-1].end new-start) { + range_set_grow(out, 1); + out-ranges[o].start = new-start; + out-ranges[o].end = new-end; + o++; + } else if
Re: [PATCH v8 4/5] Implement line-history search (git log -L)
Junio C Hamano gits...@pobox.com writes: +/* + * NEEDSWORK: manually building a diff here is not the Right + * Thing(tm). log -L should be built into the diff pipeline. I am not sure about this design, and do not necessarily agree that wedging this to the diff pipeline is the right future direction. I have a feeling that log -L should actually be built around blame. You let blame to hit the first parent to take the blame, and then turn around to show a single diff-tree between the child and the parent with whatever other diff pipeline gizmo the user can give you from the command line. The blame also tells you what the interesting line range were at the first parent commit you found, so you can re-run the same thing with an updated range. Hrm, now that you mention it, this is actually a brilliant idea. -- Thomas Rast trast@{inf,student}.ethz.ch -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 4/5] Implement line-history search (git log -L)
Thomas Rast tr...@student.ethz.ch writes: Junio C Hamano gits...@pobox.com writes: +/* + * NEEDSWORK: manually building a diff here is not the Right + * Thing(tm). log -L should be built into the diff pipeline. I am not sure about this design, and do not necessarily agree that wedging this to the diff pipeline is the right future direction. I have a feeling that log -L should actually be built around blame. You let blame to hit the first parent to take the blame, and then turn around to show a single diff-tree between the child and the parent with whatever other diff pipeline gizmo the user can give you from the command line. The blame also tells you what the interesting line range were at the first parent commit you found, so you can re-run the same thing with an updated range. Hrm, now that you mention it, this is actually a brilliant idea. I don't know. That is just me handwaving without giving a serious end-to-end thought. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 4/5] Implement line-history search (git log -L)
Junio C Hamano gits...@pobox.com writes: Thomas Rast tr...@student.ethz.ch writes: Junio C Hamano gits...@pobox.com writes: +/* + * NEEDSWORK: manually building a diff here is not the Right + * Thing(tm). log -L should be built into the diff pipeline. I am not sure about this design, and do not necessarily agree that wedging this to the diff pipeline is the right future direction. I have a feeling that log -L should actually be built around blame. You let blame to hit the first parent to take the blame, and then turn around to show a single diff-tree between the child and the parent with whatever other diff pipeline gizmo the user can give you from the command line. The blame also tells you what the interesting line range were at the first parent commit you found, so you can re-run the same thing with an updated range. Hrm, now that you mention it, this is actually a brilliant idea. I don't know. That is just me handwaving without giving a serious end-to-end thought. Having thought about it for some time, I think we need to figure out if/what can be shared. I can't shake off the feeling that *something* should be common between blame and log -L, but I can't exactly say what so far. Your suggestion of looking at the first blame hit is almost there. (And in fact if it did work, it should be rather easy to prototype from blame --incremental.) However, it works only for additions, not removals. Lines that were removed do not show up in blame at all, and since a patch can also _only_ remove lines, blame would not find it even if we adjust the blamed range at every found commit. It may be possible to fix that by doing a reverse blame, but I suspect that runs into yet more trouble when trying to reverse-blame from two different sides of history. Then there's a different issue if the order of code flips. Suppose you have A1 A2 B1 B2 and later change to B1 B2 A1 A2 Diffs can fundamentally not express this; they'll only see one side as unchanged, and the other as completely new. Ideally we'd be able to track this case correctly -- as blame does? -- no matter which part is within our tracked range. Anyway, I believe this should be booked under future improvements. I've had it in my own tree for ages and it already does the right thing most of the time :-) -- Thomas Rast trast@{inf,student}.ethz.ch -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html