Re: segfault for git log --graph --no-walk --grep a
On Fri, Feb 08, 2013 at 04:47:01PM -0800, Junio C Hamano wrote: Yeah, that actually is a good point. We should be using logmsg_reencode so that we look for strings in the user's encoding. Perhaps like this. Just like the previous one (which should be discarded), this makes the function always use the temporary strbuf, so doing this upfront actually loses more code than it adds ;-) I didn't see this in What's Cooking or pu. We should probably pick an approach and go with it. I think using logmsg_reencode makes sense. I'd be in favor of avoiding the extra copy in the common case, so something like the patch below. If you feel strongly about the code cleanup at the minor run-time expense, I won't argue too much, though. --- diff --git a/revision.c b/revision.c index d7562ee..a08d0a5 100644 --- a/revision.c +++ b/revision.c @@ -2268,7 +2268,10 @@ static int commit_match(struct commit *commit, struct rev_info *opt) static int commit_match(struct commit *commit, struct rev_info *opt) { int retval; + const char *encoding; + const char *message; struct strbuf buf = STRBUF_INIT; + if (!opt-grep_filter.pattern_list !opt-grep_filter.header_list) return 1; @@ -2279,13 +2282,23 @@ static int commit_match(struct commit *commit, struct rev_info *opt) strbuf_addch(buf, '\n'); } + /* +* We grep in the user's output encoding, under the assumption that it +* is the encoding they are most likely to write their grep pattern +* for. In addition, it means we will match the notes encoding below, +* so we will not end up with a buffer that has two different encodings +* in it. +*/ + encoding = get_log_output_encoding(); + message = logmsg_reencode(commit, encoding); + /* Copy the commit to temporary if we are using fake headers */ if (buf.len) - strbuf_addstr(buf, commit-buffer); + strbuf_addstr(buf, message); if (opt-grep_filter.header_list opt-mailmap) { if (!buf.len) - strbuf_addstr(buf, commit-buffer); + strbuf_addstr(buf, message); commit_rewrite_person(buf, \nauthor , opt-mailmap); commit_rewrite_person(buf, \ncommitter , opt-mailmap); @@ -2294,18 +2307,18 @@ static int commit_match(struct commit *commit, struct rev_info *opt) /* Append fake message parts as needed */ if (opt-show_notes) { if (!buf.len) - strbuf_addstr(buf, commit-buffer); - format_display_notes(commit-object.sha1, buf, -get_log_output_encoding(), 1); + strbuf_addstr(buf, message); + format_display_notes(commit-object.sha1, buf, encoding, 1); } - /* Find either in the commit object, or in the temporary */ + /* Find either in the original commit message, or in the temporary */ if (buf.len) retval = grep_buffer(opt-grep_filter, buf.buf, buf.len); else retval = grep_buffer(opt-grep_filter, -commit-buffer, strlen(commit-buffer)); +message, strlen(message)); strbuf_release(buf); + logmsg_free(message, commit); return retval; } -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: segfault for git log --graph --no-walk --grep a
Jeff King p...@peff.net writes: On Fri, Feb 08, 2013 at 04:47:01PM -0800, Junio C Hamano wrote: Yeah, that actually is a good point. We should be using logmsg_reencode so that we look for strings in the user's encoding. Perhaps like this. Just like the previous one (which should be discarded), this makes the function always use the temporary strbuf, so doing this upfront actually loses more code than it adds ;-) I didn't see this in What's Cooking or pu. We should probably pick an approach and go with it. I think using logmsg_reencode makes sense. I'd be in favor of avoiding the extra copy in the common case, so something like the patch below. If you feel strongly about the code cleanup at the minor run-time expense, I won't argue too much, though. Sounds good to me. Care to do the log message while at it? --- diff --git a/revision.c b/revision.c index d7562ee..a08d0a5 100644 --- a/revision.c +++ b/revision.c @@ -2268,7 +2268,10 @@ static int commit_match(struct commit *commit, struct rev_info *opt) static int commit_match(struct commit *commit, struct rev_info *opt) { int retval; + const char *encoding; + const char *message; struct strbuf buf = STRBUF_INIT; + if (!opt-grep_filter.pattern_list !opt-grep_filter.header_list) return 1; @@ -2279,13 +2282,23 @@ static int commit_match(struct commit *commit, struct rev_info *opt) strbuf_addch(buf, '\n'); } + /* + * We grep in the user's output encoding, under the assumption that it + * is the encoding they are most likely to write their grep pattern + * for. In addition, it means we will match the notes encoding below, + * so we will not end up with a buffer that has two different encodings + * in it. + */ + encoding = get_log_output_encoding(); + message = logmsg_reencode(commit, encoding); + /* Copy the commit to temporary if we are using fake headers */ if (buf.len) - strbuf_addstr(buf, commit-buffer); + strbuf_addstr(buf, message); if (opt-grep_filter.header_list opt-mailmap) { if (!buf.len) - strbuf_addstr(buf, commit-buffer); + strbuf_addstr(buf, message); commit_rewrite_person(buf, \nauthor , opt-mailmap); commit_rewrite_person(buf, \ncommitter , opt-mailmap); @@ -2294,18 +2307,18 @@ static int commit_match(struct commit *commit, struct rev_info *opt) /* Append fake message parts as needed */ if (opt-show_notes) { if (!buf.len) - strbuf_addstr(buf, commit-buffer); - format_display_notes(commit-object.sha1, buf, - get_log_output_encoding(), 1); + strbuf_addstr(buf, message); + format_display_notes(commit-object.sha1, buf, encoding, 1); } - /* Find either in the commit object, or in the temporary */ + /* Find either in the original commit message, or in the temporary */ if (buf.len) retval = grep_buffer(opt-grep_filter, buf.buf, buf.len); else retval = grep_buffer(opt-grep_filter, - commit-buffer, strlen(commit-buffer)); + message, strlen(message)); strbuf_release(buf); + logmsg_free(message, commit); return retval; } -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: segfault for git log --graph --no-walk --grep a
Jeff King p...@peff.net writes: On Mon, Feb 11, 2013 at 12:36:52PM -0800, Junio C Hamano wrote: Junio C Hamano gits...@pobox.com writes: Jeff King p...@peff.net writes: On Fri, Feb 08, 2013 at 04:47:01PM -0800, Junio C Hamano wrote: Yeah, that actually is a good point. We should be using logmsg_reencode so that we look for strings in the user's encoding. Perhaps like this. Just like the previous one (which should be discarded), this makes the function always use the temporary strbuf, so doing this upfront actually loses more code than it adds ;-) I didn't see this in What's Cooking or pu. We should probably pick an approach and go with it. I think using logmsg_reencode makes sense. I'd be in favor of avoiding the extra copy in the common case, so something like the patch below. If you feel strongly about the code cleanup at the minor run-time expense, I won't argue too much, though. Sounds good to me. Care to do the log message while at it? Heh, how about this? I still need a sign-off from you. I'm working on the log message and tests right now. There's also a minor code fixup needed to compile with -Wall. Thanks; I noticed the constness issue around the message variable as well. log --grep: look for the given string in log output encoding We used to grep in the raw commit buffer contents, possibly pieces of notes encoded in log output encoding appended to it, which was insane. Convert the contents of the commit message also to log output encoding before looking for the string. This incidentally fixes a possible NULL dereference that can happen when commit-buffer has already been freed, which can happen with git commit -m 'text1' --allow-empty git commit -m 'text2' --allow-empty git log --graph --no-walk --grep 'text2' which arguably does not make any sense (--graph inherently wants a connected history, and by --no-walk the command line is telling us to show discrete points in history without connectivity), and we probably should forbid the combination, but that is a separate issue. I'll use bits of that. I had sort of punted on the how to reproduce the segfault issue entirely because you had noted that it was not a sane thing to do. Still, I think it makes sense to mention it with the caveat you give here. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
segfault for git log --graph --no-walk --grep a
Hallo, I just found that git crashes with a segmentation fault when calling $ git log --graph --no-walk --grep pattern It happens both for version 1.7.10.4 from Debian (wheezy,amd64) and a fresh compiled git from github (git.git repository, master). For the error to occure, the pattern must match the commit message in HEAD. How to reproduce: git init . git commit -m 'text1' --allow-empty git commit -m 'text2' --allow-empty git log --graph --no-walk --grep 'text2' And here is coredump I got: Core was generated by `git log --graph --no-walk --grep text2'. Program terminated with signal 11, Segmentation fault. #0 __strlen_sse42 () at ../sysdeps/x86_64/multiarch/strlen-sse4.S:32 #1 0x004cc13e in commit_match (opt=0x7fffbd0ee500, commit=0x216d1a8) at revision.c:2306 #2 get_commit_action (revs=0x7fffbd0ee500, commit=0x216d1a8) at revision.c:2338 #3 0x004984b4 in graph_is_interesting (commit=optimized out, graph=error reading variable: Unhandled dwarf expression opcode 0xfa) at graph.c:330 #4 0x00498569 in first_interesting_parent (graph=graph@entry=0x21629c0) at graph.c:369 #5 0x0049965e in graph_update (graph=0x21629c0, commit=optimized out) at graph.c:593 #6 0x004cc7a9 in get_revision (revs=revs@entry=0x7fffbd0ee500) at revision.c:2580 #7 0x0043988a in cmd_log_walk (rev=rev@entry=0x7fffbd0ee500) at builtin/log.c:309 #8 0x0043a398 in cmd_log (argc=9, argv=0x2162930, prefix=0x0) at builtin/log.c:582 #9 0x00405988 in run_builtin (argv=0x2162930, argc=9, p=0x751438) at git.c:281 #10 handle_internal_command (argc=9, argv=0x2162930) at git.c:443 #11 0x00404df2 in run_argv (argv=0x7fffbd0eec00, argcp=0x7fffbd0eec0c) at git.c:489 #12 main (argc=9, argv=0x2162930) at git.c:564 it happens in file revision.c:2306 because commit-buffer is zero: retval = grep_buffer(opt-grep_filter, commit-buffer, strlen(commit-buffer)); thank you all, for this awesome software. Thomas signature.asc Description: This is a digitally signed message part.
Re: segfault for git log --graph --no-walk --grep a
Thomas Haller thom...@gmail.com writes: it happens in file revision.c:2306 because commit-buffer is zero: retval = grep_buffer(opt-grep_filter, commit-buffer, strlen(commit-buffer)); I think this has been fixed at be5c9fb9049e (logmsg_reencode: lazily load missing commit buffers, 2013-01-26); I haven't merged it to any of the maintenance releases, but the tip of 'master' should have it already. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: segfault for git log --graph --no-walk --grep a
Junio C Hamano gits...@pobox.com writes: Thomas Haller thom...@gmail.com writes: it happens in file revision.c:2306 because commit-buffer is zero: retval = grep_buffer(opt-grep_filter, commit-buffer, strlen(commit-buffer)); I think this has been fixed at be5c9fb9049e (logmsg_reencode: lazily load missing commit buffers, 2013-01-26); I haven't merged it to any of the maintenance releases, but the tip of 'master' should have it already. Ah, no, this shares the same roots as the breakage the said commit fixed, and the solution should use the same idea, but not fixed. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: segfault for git log --graph --no-walk --grep a
On Fri, Feb 08, 2013 at 04:22:15PM -0800, Junio C Hamano wrote: Junio C Hamano gits...@pobox.com writes: Thomas Haller thom...@gmail.com writes: it happens in file revision.c:2306 because commit-buffer is zero: retval = grep_buffer(opt-grep_filter, commit-buffer, strlen(commit-buffer)); I think this has been fixed at be5c9fb9049e (logmsg_reencode: lazily load missing commit buffers, 2013-01-26); I haven't merged it to any of the maintenance releases, but the tip of 'master' should have it already. Ah, no, this shares the same roots as the breakage the said commit fixed, and the solution should use the same idea, but not fixed. Yeah, I think this needs separate treatment. However, this is a perfect example of the case-by-case I mention in the final two paragraphs there. What's the right encoding to be grepping in? The original, what we will output in, or even something else? IOW, I wonder if this should be using logmsg_reencode in the first place (the obvious reason not to want to do so is speed, but logmsg_reencode is written to only have an impact in the case that we do indeed need to reencode). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: segfault for git log --graph --no-walk --grep a
Junio C Hamano gits...@pobox.com writes: Junio C Hamano gits...@pobox.com writes: Thomas Haller thom...@gmail.com writes: it happens in file revision.c:2306 because commit-buffer is zero: retval = grep_buffer(opt-grep_filter, commit-buffer, strlen(commit-buffer)); I think this has been fixed at be5c9fb9049e (logmsg_reencode: lazily load missing commit buffers, 2013-01-26); I haven't merged it to any of the maintenance releases, but the tip of 'master' should have it already. Ah, no, this shares the same roots as the breakage the said commit fixed, and the solution should use the same idea, but not fixed. Perhaps something along this line... -- 8 -- Subject: log --grep: commit's buffer may already have been discarded Following up on be5c9fb9049e (logmsg_reencode: lazily load missing commit buffers, 2013-01-26), extract the part that reads the commit buffer data into a separate helper function, and use it when we apply the grep filter on the commit during the log walk. Signed-off-by: Junio C Hamano gits...@pobox.com --- The reproduction recipe in original bug report looked like this: git commit -m 'text1' --allow-empty git commit -m 'text2' --allow-empty git log --graph --no-walk --grep 'text2' which does not make any sense to me. We should simply forbid combination of --graph (which inherently wants a connected history) and --no-walk (which is a way to tell This is not about history, this is about a single point). But that is a separate issue. commit.c | 19 +++ commit.h | 1 + pretty.c | 14 ++ revision.c | 14 +++--- 4 files changed, 33 insertions(+), 15 deletions(-) diff --git a/commit.c b/commit.c index e8eb0ae..c0acf0f 100644 --- a/commit.c +++ b/commit.c @@ -335,6 +335,25 @@ int parse_commit(struct commit *item) return ret; } +char *read_commit_object_data(const struct commit *commit, unsigned long *sizep) +{ + char *msg; + enum object_type type; + unsigned long size; + + if (!sizep) + sizep = size; + + msg = read_sha1_file(commit-object.sha1, type, sizep); + if (!msg) + die(Cannot read commit object %s, + sha1_to_hex(commit-object.sha1)); + if (type != OBJ_COMMIT) + die(Expected commit for '%s', got %s, + sha1_to_hex(commit-object.sha1), typename(type)); + return msg; +} + int find_commit_subject(const char *commit_buffer, const char **subject) { const char *eol; diff --git a/commit.h b/commit.h index 4138bb4..e314149 100644 --- a/commit.h +++ b/commit.h @@ -102,6 +102,7 @@ struct rev_info; /* in revision.h, it circularly uses enum cmit_fmt */ extern char *logmsg_reencode(const struct commit *commit, const char *output_encoding); extern void logmsg_free(char *msg, const struct commit *commit); +extern char *read_commit_object_data(const struct commit *commit, unsigned long *size); extern void get_commit_format(const char *arg, struct rev_info *); extern const char *format_subject(struct strbuf *sb, const char *msg, const char *line_separator); diff --git a/pretty.c b/pretty.c index eae57ad..004d16d 100644 --- a/pretty.c +++ b/pretty.c @@ -592,18 +592,8 @@ char *logmsg_reencode(const struct commit *commit, char *msg = commit-buffer; char *out; - if (!msg) { - enum object_type type; - unsigned long size; - - msg = read_sha1_file(commit-object.sha1, type, size); - if (!msg) - die(Cannot read commit object %s, - sha1_to_hex(commit-object.sha1)); - if (type != OBJ_COMMIT) - die(Expected commit for '%s', got %s, - sha1_to_hex(commit-object.sha1), typename(type)); - } + if (!msg) + msg = read_commit_object_data(commit, NULL); if (!output_encoding || !*output_encoding) return msg; diff --git a/revision.c b/revision.c index d7562ee..caf8ef3 100644 --- a/revision.c +++ b/revision.c @@ -2279,9 +2279,16 @@ static int commit_match(struct commit *commit, struct rev_info *opt) strbuf_addch(buf, '\n'); } - /* Copy the commit to temporary if we are using fake headers */ - if (buf.len) + if (!commit-buffer) { + /* we may not have commit-buffer */ + unsigned long size; + char *msg = read_commit_object_data(commit, size); + strbuf_add(buf, msg, size); + free(msg); + } else if (buf.len) { + /* Copy the commit to temporary if we are using fake headers */ strbuf_addstr(buf, commit-buffer); + } if (opt-grep_filter.header_list opt-mailmap) { if (!buf.len)
Re: segfault for git log --graph --no-walk --grep a
On Fri, Feb 08, 2013 at 04:47:01PM -0800, Junio C Hamano wrote: Yeah, that actually is a good point. We should be using logmsg_reencode so that we look for strings in the user's encoding. Perhaps like this. Just like the previous one (which should be discarded), this makes the function always use the temporary strbuf, so doing this upfront actually loses more code than it adds ;-) I like code simplification, but I worry a little about paying for the extra copy in the common case. I did a best-of-five git rev-list --count --grep=foo HEAD before and after your patch, though, and the difference was well within the noise. So maybe it's not worth caring about. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: segfault for git log --graph --no-walk --grep a
On Fri, Feb 08, 2013 at 08:05:24PM -0500, Jeff King wrote: On Fri, Feb 08, 2013 at 04:47:01PM -0800, Junio C Hamano wrote: Yeah, that actually is a good point. We should be using logmsg_reencode so that we look for strings in the user's encoding. Perhaps like this. Just like the previous one (which should be discarded), this makes the function always use the temporary strbuf, so doing this upfront actually loses more code than it adds ;-) I like code simplification, but I worry a little about paying for the extra copy in the common case. I did a best-of-five git rev-list --count --grep=foo HEAD before and after your patch, though, and the difference was well within the noise. So maybe it's not worth caring about. Oh, hold on, I'm incompetent. I measured the wrong build of git. Here are the timings for git.git: [before] $ best-of-five git rev-list --count --grep=foo HEAD Attempt 1: 0.503 Attempt 2: 0.5 Attempt 3: 0.501 Attempt 4: 0.502 Attempt 5: 0.5 real0m0.500s user0m0.488s sys 0m0.008s [after] $ best-of-five git rev-list --count --grep=foo HEAD Attempt 1: 0.514 Attempt 2: 0.525 Attempt 3: 0.517 Attempt 4: 0.523 Attempt 5: 0.518 real0m0.514s user0m0.480s sys 0m0.028s So not huge, but measurable. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html