Re: segfault for git log --graph --no-walk --grep a

2013-02-11 Thread Jeff King
On Fri, Feb 08, 2013 at 04:47:01PM -0800, Junio C Hamano wrote:

  Yeah, that actually is a good point.  We should be using logmsg_reencode
  so that we look for strings in the user's encoding.
 
 Perhaps like this.  Just like the previous one (which should be
 discarded), this makes the function always use the temporary strbuf,
 so doing this upfront actually loses more code than it adds ;-)

I didn't see this in What's Cooking or pu. We should probably pick an
approach and go with it.

I think using logmsg_reencode makes sense. I'd be in favor of avoiding
the extra copy in the common case, so something like the patch below. If
you feel strongly about the code cleanup at the minor run-time expense,
I won't argue too much, though.

---
diff --git a/revision.c b/revision.c
index d7562ee..a08d0a5 100644
--- a/revision.c
+++ b/revision.c
@@ -2268,7 +2268,10 @@ static int commit_match(struct commit *commit, struct 
rev_info *opt)
 static int commit_match(struct commit *commit, struct rev_info *opt)
 {
int retval;
+   const char *encoding;
+   const char *message;
struct strbuf buf = STRBUF_INIT;
+
if (!opt-grep_filter.pattern_list  !opt-grep_filter.header_list)
return 1;
 
@@ -2279,13 +2282,23 @@ static int commit_match(struct commit *commit, struct 
rev_info *opt)
strbuf_addch(buf, '\n');
}
 
+   /*
+* We grep in the user's output encoding, under the assumption that it
+* is the encoding they are most likely to write their grep pattern
+* for. In addition, it means we will match the notes encoding below,
+* so we will not end up with a buffer that has two different encodings
+* in it.
+*/
+   encoding = get_log_output_encoding();
+   message = logmsg_reencode(commit, encoding);
+
/* Copy the commit to temporary if we are using fake headers */
if (buf.len)
-   strbuf_addstr(buf, commit-buffer);
+   strbuf_addstr(buf, message);
 
if (opt-grep_filter.header_list  opt-mailmap) {
if (!buf.len)
-   strbuf_addstr(buf, commit-buffer);
+   strbuf_addstr(buf, message);
 
commit_rewrite_person(buf, \nauthor , opt-mailmap);
commit_rewrite_person(buf, \ncommitter , opt-mailmap);
@@ -2294,18 +2307,18 @@ static int commit_match(struct commit *commit, struct 
rev_info *opt)
/* Append fake message parts as needed */
if (opt-show_notes) {
if (!buf.len)
-   strbuf_addstr(buf, commit-buffer);
-   format_display_notes(commit-object.sha1, buf,
-get_log_output_encoding(), 1);
+   strbuf_addstr(buf, message);
+   format_display_notes(commit-object.sha1, buf, encoding, 1);
}
 
-   /* Find either in the commit object, or in the temporary */
+   /* Find either in the original commit message, or in the temporary */
if (buf.len)
retval = grep_buffer(opt-grep_filter, buf.buf, buf.len);
else
retval = grep_buffer(opt-grep_filter,
-commit-buffer, strlen(commit-buffer));
+message, strlen(message));
strbuf_release(buf);
+   logmsg_free(message, commit);
return retval;
 }
 
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: segfault for git log --graph --no-walk --grep a

2013-02-11 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Fri, Feb 08, 2013 at 04:47:01PM -0800, Junio C Hamano wrote:

  Yeah, that actually is a good point.  We should be using logmsg_reencode
  so that we look for strings in the user's encoding.
 
 Perhaps like this.  Just like the previous one (which should be
 discarded), this makes the function always use the temporary strbuf,
 so doing this upfront actually loses more code than it adds ;-)

 I didn't see this in What's Cooking or pu. We should probably pick an
 approach and go with it.

 I think using logmsg_reencode makes sense. I'd be in favor of avoiding
 the extra copy in the common case, so something like the patch below. If
 you feel strongly about the code cleanup at the minor run-time expense,
 I won't argue too much, though.

Sounds good to me.  Care to do the log message while at it?

 ---
 diff --git a/revision.c b/revision.c
 index d7562ee..a08d0a5 100644
 --- a/revision.c
 +++ b/revision.c
 @@ -2268,7 +2268,10 @@ static int commit_match(struct commit *commit, struct 
 rev_info *opt)
  static int commit_match(struct commit *commit, struct rev_info *opt)
  {
   int retval;
 + const char *encoding;
 + const char *message;
   struct strbuf buf = STRBUF_INIT;
 +
   if (!opt-grep_filter.pattern_list  !opt-grep_filter.header_list)
   return 1;
  
 @@ -2279,13 +2282,23 @@ static int commit_match(struct commit *commit, struct 
 rev_info *opt)
   strbuf_addch(buf, '\n');
   }
  
 + /*
 +  * We grep in the user's output encoding, under the assumption that it
 +  * is the encoding they are most likely to write their grep pattern
 +  * for. In addition, it means we will match the notes encoding below,
 +  * so we will not end up with a buffer that has two different encodings
 +  * in it.
 +  */
 + encoding = get_log_output_encoding();
 + message = logmsg_reencode(commit, encoding);
 +
   /* Copy the commit to temporary if we are using fake headers */
   if (buf.len)
 - strbuf_addstr(buf, commit-buffer);
 + strbuf_addstr(buf, message);
  
   if (opt-grep_filter.header_list  opt-mailmap) {
   if (!buf.len)
 - strbuf_addstr(buf, commit-buffer);
 + strbuf_addstr(buf, message);
  
   commit_rewrite_person(buf, \nauthor , opt-mailmap);
   commit_rewrite_person(buf, \ncommitter , opt-mailmap);
 @@ -2294,18 +2307,18 @@ static int commit_match(struct commit *commit, struct 
 rev_info *opt)
   /* Append fake message parts as needed */
   if (opt-show_notes) {
   if (!buf.len)
 - strbuf_addstr(buf, commit-buffer);
 - format_display_notes(commit-object.sha1, buf,
 -  get_log_output_encoding(), 1);
 + strbuf_addstr(buf, message);
 + format_display_notes(commit-object.sha1, buf, encoding, 1);
   }
  
 - /* Find either in the commit object, or in the temporary */
 + /* Find either in the original commit message, or in the temporary */
   if (buf.len)
   retval = grep_buffer(opt-grep_filter, buf.buf, buf.len);
   else
   retval = grep_buffer(opt-grep_filter,
 -  commit-buffer, strlen(commit-buffer));
 +  message, strlen(message));
   strbuf_release(buf);
 + logmsg_free(message, commit);
   return retval;
  }
  
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: segfault for git log --graph --no-walk --grep a

2013-02-11 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Mon, Feb 11, 2013 at 12:36:52PM -0800, Junio C Hamano wrote:

 Junio C Hamano gits...@pobox.com writes:
 
  Jeff King p...@peff.net writes:
 
  On Fri, Feb 08, 2013 at 04:47:01PM -0800, Junio C Hamano wrote:
 
   Yeah, that actually is a good point.  We should be using 
   logmsg_reencode
   so that we look for strings in the user's encoding.
  
  Perhaps like this.  Just like the previous one (which should be
  discarded), this makes the function always use the temporary strbuf,
  so doing this upfront actually loses more code than it adds ;-)
 
  I didn't see this in What's Cooking or pu. We should probably pick an
  approach and go with it.
 
  I think using logmsg_reencode makes sense. I'd be in favor of avoiding
  the extra copy in the common case, so something like the patch below. If
  you feel strongly about the code cleanup at the minor run-time expense,
  I won't argue too much, though.
 
  Sounds good to me.  Care to do the log message while at it?
 
 Heh, how about this?  I still need a sign-off from you.

 I'm working on the log message and tests right now. There's also a minor
 code fixup needed to compile with -Wall.

Thanks; I noticed the constness issue around the message variable as well.


 log --grep: look for the given string in log output encoding
 
 We used to grep in the raw commit buffer contents, possibly pieces
 of notes encoded in log output encoding appended to it, which was
 insane.
 
 Convert the contents of the commit message also to log output
 encoding before looking for the string.  This incidentally fixes a
 possible NULL dereference that can happen when commit-buffer has
 already been freed, which can happen with
 
  git commit -m 'text1' --allow-empty
  git commit -m 'text2' --allow-empty
  git log --graph --no-walk --grep 'text2'
 
 which arguably does not make any sense (--graph inherently wants a
 connected history, and by --no-walk the command line is telling us
 to show discrete points in history without connectivity), and we
 probably should forbid the combination, but that is a separate
 issue.

 I'll use bits of that. I had sort of punted on the how to reproduce the
 segfault issue entirely because you had noted that it was not a sane
 thing to do. Still, I think it makes sense to mention it with the caveat
 you give here.

 -Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


segfault for git log --graph --no-walk --grep a

2013-02-08 Thread Thomas Haller
Hallo,


I just found that git crashes with a segmentation fault when calling
$ git log --graph --no-walk --grep pattern

It happens both for version 1.7.10.4 from Debian (wheezy,amd64) and
a fresh compiled git from github (git.git repository, master).


For the error to occure, the pattern must match the commit message in HEAD.




How to reproduce:
  git init .
  git commit -m 'text1' --allow-empty
  git commit -m 'text2' --allow-empty
  git log --graph --no-walk --grep 'text2'



And here is coredump I got:

Core was generated by `git log --graph --no-walk --grep text2'.
Program terminated with signal 11, Segmentation fault.
#0  __strlen_sse42 () at ../sysdeps/x86_64/multiarch/strlen-sse4.S:32
#1  0x004cc13e in commit_match (opt=0x7fffbd0ee500, commit=0x216d1a8) 
at revision.c:2306
#2  get_commit_action (revs=0x7fffbd0ee500, commit=0x216d1a8) at revision.c:2338
#3  0x004984b4 in graph_is_interesting (commit=optimized out, 
graph=error reading variable: Unhandled dwarf expression opcode 0xfa) at 
graph.c:330
#4  0x00498569 in first_interesting_parent 
(graph=graph@entry=0x21629c0) at graph.c:369
#5  0x0049965e in graph_update (graph=0x21629c0, commit=optimized 
out) at graph.c:593
#6  0x004cc7a9 in get_revision (revs=revs@entry=0x7fffbd0ee500) at 
revision.c:2580
#7  0x0043988a in cmd_log_walk (rev=rev@entry=0x7fffbd0ee500) at 
builtin/log.c:309
#8  0x0043a398 in cmd_log (argc=9, argv=0x2162930, prefix=0x0) at 
builtin/log.c:582
#9  0x00405988 in run_builtin (argv=0x2162930, argc=9, p=0x751438) at 
git.c:281
#10 handle_internal_command (argc=9, argv=0x2162930) at git.c:443
#11 0x00404df2 in run_argv (argv=0x7fffbd0eec00, argcp=0x7fffbd0eec0c) 
at git.c:489
#12 main (argc=9, argv=0x2162930) at git.c:564


it happens in file revision.c:2306 because commit-buffer is zero:

retval = grep_buffer(opt-grep_filter,
 commit-buffer, strlen(commit-buffer));





thank you all, for this awesome software.
Thomas


signature.asc
Description: This is a digitally signed message part.


Re: segfault for git log --graph --no-walk --grep a

2013-02-08 Thread Junio C Hamano
Thomas Haller thom...@gmail.com writes:

 it happens in file revision.c:2306 because commit-buffer is zero:

 retval = grep_buffer(opt-grep_filter,
  commit-buffer, strlen(commit-buffer));

I think this has been fixed at be5c9fb9049e (logmsg_reencode: lazily
load missing commit buffers, 2013-01-26); I haven't merged it to any
of the maintenance releases, but the tip of 'master' should have it
already.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: segfault for git log --graph --no-walk --grep a

2013-02-08 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Thomas Haller thom...@gmail.com writes:

 it happens in file revision.c:2306 because commit-buffer is zero:

 retval = grep_buffer(opt-grep_filter,
  commit-buffer, strlen(commit-buffer));

 I think this has been fixed at be5c9fb9049e (logmsg_reencode: lazily
 load missing commit buffers, 2013-01-26); I haven't merged it to any
 of the maintenance releases, but the tip of 'master' should have it
 already.

Ah, no, this shares the same roots as the breakage the said commit
fixed, and the solution should use the same idea, but not fixed.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: segfault for git log --graph --no-walk --grep a

2013-02-08 Thread Jeff King
On Fri, Feb 08, 2013 at 04:22:15PM -0800, Junio C Hamano wrote:

 Junio C Hamano gits...@pobox.com writes:
 
  Thomas Haller thom...@gmail.com writes:
 
  it happens in file revision.c:2306 because commit-buffer is zero:
 
  retval = grep_buffer(opt-grep_filter,
   commit-buffer, 
  strlen(commit-buffer));
 
  I think this has been fixed at be5c9fb9049e (logmsg_reencode: lazily
  load missing commit buffers, 2013-01-26); I haven't merged it to any
  of the maintenance releases, but the tip of 'master' should have it
  already.
 
 Ah, no, this shares the same roots as the breakage the said commit
 fixed, and the solution should use the same idea, but not fixed.

Yeah, I think this needs separate treatment. However, this is a perfect
example of the case-by-case I mention in the final two paragraphs
there.

What's the right encoding to be grepping in? The original, what we will
output in, or even something else? IOW, I wonder if this should be using
logmsg_reencode in the first place (the obvious reason not to want to do
so is speed, but logmsg_reencode is written to only have an impact in
the case that we do indeed need to reencode).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: segfault for git log --graph --no-walk --grep a

2013-02-08 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Junio C Hamano gits...@pobox.com writes:

 Thomas Haller thom...@gmail.com writes:

 it happens in file revision.c:2306 because commit-buffer is zero:

 retval = grep_buffer(opt-grep_filter,
  commit-buffer, 
 strlen(commit-buffer));

 I think this has been fixed at be5c9fb9049e (logmsg_reencode: lazily
 load missing commit buffers, 2013-01-26); I haven't merged it to any
 of the maintenance releases, but the tip of 'master' should have it
 already.

 Ah, no, this shares the same roots as the breakage the said commit
 fixed, and the solution should use the same idea, but not fixed.

Perhaps something along this line...

-- 8 --
Subject: log --grep: commit's buffer may already have been discarded

Following up on be5c9fb9049e (logmsg_reencode: lazily load missing
commit buffers, 2013-01-26), extract the part that reads the commit
buffer data into a separate helper function, and use it when we
apply the grep filter on the commit during the log walk.

Signed-off-by: Junio C Hamano gits...@pobox.com
---

 The reproduction recipe in original bug report looked like this:

  git commit -m 'text1' --allow-empty
  git commit -m 'text2' --allow-empty
  git log --graph --no-walk --grep 'text2'

 which does not make any sense to me.  We should simply forbid
 combination of --graph (which inherently wants a connected history)
 and --no-walk (which is a way to tell This is not about history,
 this is about a single point).

 But that is a separate issue.

 commit.c   | 19 +++
 commit.h   |  1 +
 pretty.c   | 14 ++
 revision.c | 14 +++---
 4 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/commit.c b/commit.c
index e8eb0ae..c0acf0f 100644
--- a/commit.c
+++ b/commit.c
@@ -335,6 +335,25 @@ int parse_commit(struct commit *item)
return ret;
 }
 
+char *read_commit_object_data(const struct commit *commit, unsigned long 
*sizep)
+{
+   char *msg;
+   enum object_type type;
+   unsigned long size;
+
+   if (!sizep)
+   sizep = size;
+
+   msg = read_sha1_file(commit-object.sha1, type, sizep);
+   if (!msg)
+   die(Cannot read commit object %s,
+   sha1_to_hex(commit-object.sha1));
+   if (type != OBJ_COMMIT)
+   die(Expected commit for '%s', got %s,
+   sha1_to_hex(commit-object.sha1), typename(type));
+   return msg;
+}
+
 int find_commit_subject(const char *commit_buffer, const char **subject)
 {
const char *eol;
diff --git a/commit.h b/commit.h
index 4138bb4..e314149 100644
--- a/commit.h
+++ b/commit.h
@@ -102,6 +102,7 @@ struct rev_info; /* in revision.h, it circularly uses enum 
cmit_fmt */
 extern char *logmsg_reencode(const struct commit *commit,
 const char *output_encoding);
 extern void logmsg_free(char *msg, const struct commit *commit);
+extern char *read_commit_object_data(const struct commit *commit, unsigned 
long *size);
 extern void get_commit_format(const char *arg, struct rev_info *);
 extern const char *format_subject(struct strbuf *sb, const char *msg,
  const char *line_separator);
diff --git a/pretty.c b/pretty.c
index eae57ad..004d16d 100644
--- a/pretty.c
+++ b/pretty.c
@@ -592,18 +592,8 @@ char *logmsg_reencode(const struct commit *commit,
char *msg = commit-buffer;
char *out;
 
-   if (!msg) {
-   enum object_type type;
-   unsigned long size;
-
-   msg = read_sha1_file(commit-object.sha1, type, size);
-   if (!msg)
-   die(Cannot read commit object %s,
-   sha1_to_hex(commit-object.sha1));
-   if (type != OBJ_COMMIT)
-   die(Expected commit for '%s', got %s,
-   sha1_to_hex(commit-object.sha1), typename(type));
-   }
+   if (!msg)
+   msg = read_commit_object_data(commit, NULL);
 
if (!output_encoding || !*output_encoding)
return msg;
diff --git a/revision.c b/revision.c
index d7562ee..caf8ef3 100644
--- a/revision.c
+++ b/revision.c
@@ -2279,9 +2279,16 @@ static int commit_match(struct commit *commit, struct 
rev_info *opt)
strbuf_addch(buf, '\n');
}
 
-   /* Copy the commit to temporary if we are using fake headers */
-   if (buf.len)
+   if (!commit-buffer) {
+   /* we may not have commit-buffer */
+   unsigned long size;
+   char *msg = read_commit_object_data(commit, size);
+   strbuf_add(buf, msg, size);
+   free(msg);
+   } else if (buf.len) {
+   /* Copy the commit to temporary if we are using fake headers 
*/
strbuf_addstr(buf, commit-buffer);
+   }
 
if (opt-grep_filter.header_list  opt-mailmap) {
if (!buf.len)

Re: segfault for git log --graph --no-walk --grep a

2013-02-08 Thread Jeff King
On Fri, Feb 08, 2013 at 04:47:01PM -0800, Junio C Hamano wrote:

  Yeah, that actually is a good point.  We should be using logmsg_reencode
  so that we look for strings in the user's encoding.
 
 Perhaps like this.  Just like the previous one (which should be
 discarded), this makes the function always use the temporary strbuf,
 so doing this upfront actually loses more code than it adds ;-)

I like code simplification, but I worry a little about paying for the
extra copy in the common case. I did a best-of-five git rev-list
--count --grep=foo HEAD before and after your patch, though, and the
difference was well within the noise. So maybe it's not worth caring
about.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: segfault for git log --graph --no-walk --grep a

2013-02-08 Thread Jeff King
On Fri, Feb 08, 2013 at 08:05:24PM -0500, Jeff King wrote:

 On Fri, Feb 08, 2013 at 04:47:01PM -0800, Junio C Hamano wrote:
 
   Yeah, that actually is a good point.  We should be using logmsg_reencode
   so that we look for strings in the user's encoding.
  
  Perhaps like this.  Just like the previous one (which should be
  discarded), this makes the function always use the temporary strbuf,
  so doing this upfront actually loses more code than it adds ;-)
 
 I like code simplification, but I worry a little about paying for the
 extra copy in the common case. I did a best-of-five git rev-list
 --count --grep=foo HEAD before and after your patch, though, and the
 difference was well within the noise. So maybe it's not worth caring
 about.

Oh, hold on, I'm incompetent. I measured the wrong build of git. Here
are the timings for git.git:

  [before]
  $ best-of-five git rev-list --count --grep=foo HEAD
  Attempt 1: 0.503
  Attempt 2: 0.5
  Attempt 3: 0.501
  Attempt 4: 0.502
  Attempt 5: 0.5

  real0m0.500s
  user0m0.488s
  sys 0m0.008s

  [after]
  $ best-of-five git rev-list --count --grep=foo HEAD
  Attempt 1: 0.514
  Attempt 2: 0.525
  Attempt 3: 0.517
  Attempt 4: 0.523
  Attempt 5: 0.518

  real0m0.514s
  user0m0.480s
  sys 0m0.028s

So not huge, but measurable.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html