Re: crash on git diff-tree -Ganything tree for new files with textconv filter

2013-06-03 Thread Peter Oberndorfer
On 2012-11-07 22:13, Jeff King wrote:
 On Wed, Nov 07, 2012 at 10:10:59PM +0100, Peter Oberndorfer wrote:

 For me the key to reproduce the problem was to have 2 commits.
 Adding the file in the root commit it did not work. [1]
 You probably would need to pass --root for it to do the diff of the
 initial commit.

 The patch below fixes it, but it's terribly inefficient (it just detects
 the situation and reallocates). It would be much better to disable the
 reuse_worktree_file mmap when we populate the filespec, but it is too
 late to pass an option; we may have already populated from an earlier
 diffcore stage.
 Hi,
 I tested your patch, and i can confirm it fixes the problem for me.
 (also on my real world test in msysgit)
 Thanks for the report. I'd still like to pursue using a regex library
 that does not require NUL-termination, but I've been distracted by other
 things. I'm going to hold back my copy-to-a-NUL-buffer patch for now and
 see if I can get to the regex thing this week.

Hi,

are there any news regarding this problem?
The crash seems to still exist in the current version 1.8.3 and master.

Thanks,
Greetings Peter
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: crash on git diff-tree -Ganything tree for new files with textconv filter

2013-06-03 Thread Jeff King
On Mon, Jun 03, 2013 at 07:25:06PM +0200, Peter Oberndorfer wrote:

  Thanks for the report. I'd still like to pursue using a regex library
  that does not require NUL-termination, but I've been distracted by other
  things. I'm going to hold back my copy-to-a-NUL-buffer patch for now and
  see if I can get to the regex thing this week.
 
 are there any news regarding this problem?
 The crash seems to still exist in the current version 1.8.3 and master.

Sorry, no, this got dropped due to lack of time. I _think_ it is as
simple as just tweaking the Makefile to unconditionally build against
the compat/ regex library, and then tweaking callsites as appropriate to
use the GNU-specific interface that takes buf/len instead of a
NUL-terminated string.

But there may be some hidden complexities to it.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: crash on git diff-tree -Ganything tree for new files with textconv filter

2012-11-07 Thread Jeff King
On Wed, Nov 07, 2012 at 10:10:59PM +0100, Peter Oberndorfer wrote:

  For me the key to reproduce the problem was to have 2 commits.
  Adding the file in the root commit it did not work. [1]
  You probably would need to pass --root for it to do the diff of the
  initial commit.
 
  The patch below fixes it, but it's terribly inefficient (it just detects
  the situation and reallocates). It would be much better to disable the
  reuse_worktree_file mmap when we populate the filespec, but it is too
  late to pass an option; we may have already populated from an earlier
  diffcore stage.
 Hi,
 I tested your patch, and i can confirm it fixes the problem for me.
 (also on my real world test in msysgit)

Thanks for the report. I'd still like to pursue using a regex library
that does not require NUL-termination, but I've been distracted by other
things. I'm going to hold back my copy-to-a-NUL-buffer patch for now and
see if I can get to the regex thing this week.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: crash on git diff-tree -Ganything tree for new files with textconv filter

2012-11-01 Thread Ramsay Jones
Jeff King wrote:
 Thinking on it more, my patch, hacky thought it seems, may not be the
 worst solution. Here are the options that I see:
 
   1. Use a regex library that does not require NUL termination. If we
  are bound by the regular regexec interface, this is not feasible.
  But the GNU implementation works on arbitrary-length buffers (you
  just have to use a slightly different interface), and we already
  carry it in compat. It would mean platforms which provide a working
  but non-GNU regexec would have to start defining NO_REGEX.

I have thought about the possibility of doing this for unrelated reasons
in the past.

On cygwin, there have been two unexpected test passes since about v1.6.0
(I reported it to the list in passing), like so:

[ ... ]
All tests successful.

Test Summary Report
---
t0050-filesystem.sh  (Wstat: 0 Tests: 9 Failed: 
0)
  TODO passed:   5
t7008-grep-binary.sh (Wstat: 0 Tests: 20 
Failed: 0)
  TODO passed:   12
Files=604, Tests=8439, 11190 wallclock secs ( 2.59 usr  1.59 sys + 7294.86 
cusr
3416.65 csys = 10715.70 CPU)
Result: PASS

In particular, t7008.12 passes on cygwin because the regex library apparently
matches '.' to NUL. Indeed if you add a test_pause to the script and execute
grep .fi a (note grep *not* git-grep) then Binary file a matches on Linux,
cygwin and MinGW. (So I assume the test was added to document a difference in
behaviour to GNU grep).

So, if we use the GNU interface to the regex routines in compat, then we may
specify the grep syntax for use in git-grep. (Well that's the theory, I've
not actually tried to code it up, so take this with a pinch of salt! :-P ).

ATB,
Ramsay Jones


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: crash on git diff-tree -Ganything tree for new files with textconv filter

2012-10-30 Thread Jeff King
On Mon, Oct 29, 2012 at 06:47:05PM -0400, Jeff King wrote:

 On Mon, Oct 29, 2012 at 06:35:21PM -0400, Jeff King wrote:
 
  The patch below fixes it, but it's terribly inefficient (it just detects
  the situation and reallocates). It would be much better to disable the
  reuse_worktree_file mmap when we populate the filespec, but it is too
  late to pass an option; we may have already populated from an earlier
  diffcore stage.
  
  I guess if we teach the whole diff code that -G (and --pickaxe-regex)
  is brittle, we can disable the optimization from the beginning based on
  the diff options. I'll take a look.
 
 Hmm. That is problematic for two reasons.
 
   1. The whole diff call chain will have to be modified to pass the
  options around, so they can make it down to the
  diff_populate_filespec level. Alternatively, we could do some kind
  of global hack, which is ugly but would work OK in practice.
 
   2. Reusing a working tree file is only half of the reason a filespec
  might be mmap'd. It might also be because we are literally diffing
  the working tree. -G was meant to be used to limit log traversal,
  but it also works to reduce the diff output for something like git
  diff HEAD^.
 
 I really wish there were an alternate regexec interface we could use
 that took a pointer/size pair. Bleh.

Thinking on it more, my patch, hacky thought it seems, may not be the
worst solution. Here are the options that I see:

  1. Use a regex library that does not require NUL termination. If we
 are bound by the regular regexec interface, this is not feasible.
 But the GNU implementation works on arbitrary-length buffers (you
 just have to use a slightly different interface), and we already
 carry it in compat. It would mean platforms which provide a working
 but non-GNU regexec would have to start defining NO_REGEX.

  2. Figure out a way to get one extra zero byte via mmap. If the
 requested size does not fall on a page boundary, you get extra
 zero-ed bytes. Unfortunately, requesting an extra byte does not
 do what we want; you get SIGBUS accessing it.

  3. Copy mmap'd data at point-of-use into a NUL-terminated buffer. That
 way we only incur the cost when we need it.

  4. Avoid mmap-ing in the first place when we are using -G or
 --pickaxe-regex (e.g., by doing a big read()). At first glance,
 this sounds more efficient than loading the data one way and then
 making another copy. But mmap+memcpy, aside from the momentary
 doubled memory requirement, is probably just as fast or faster than
 calling read() repeatedly.

I am really tempted by (1).

Given that (2) does not work, unless somebody comes up with something
clever there, that would make (3) the next best choice.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: crash on git diff-tree -Ganything tree for new files with textconv filter

2012-10-30 Thread Junio C Hamano
(1) sounds attractive for more than one reason. In addition to avoidance of 
this issue, it would bring bug-to-bug compatibility across platforms.

(4), if we can run grep on streaming data (tweak interface we have for checking 
out a large blob to the working tree), would let us work on dataset larger than 
fit in core. Even though it would be much more work, it might turn out to be a 
better option in the longer run.

Jeff King p...@peff.net wrote:

On Mon, Oct 29, 2012 at 06:47:05PM -0400, Jeff King wrote:

 On Mon, Oct 29, 2012 at 06:35:21PM -0400, Jeff King wrote:
 
  The patch below fixes it, but it's terribly inefficient (it just
detects
  the situation and reallocates). It would be much better to disable
the
  reuse_worktree_file mmap when we populate the filespec, but it is
too
  late to pass an option; we may have already populated from an
earlier
  diffcore stage.
  
  I guess if we teach the whole diff code that -G (and
--pickaxe-regex)
  is brittle, we can disable the optimization from the beginning
based on
  the diff options. I'll take a look.
 
 Hmm. That is problematic for two reasons.
 
   1. The whole diff call chain will have to be modified to pass the
  options around, so they can make it down to the
  diff_populate_filespec level. Alternatively, we could do some
kind
  of global hack, which is ugly but would work OK in practice.
 
   2. Reusing a working tree file is only half of the reason a
filespec
  might be mmap'd. It might also be because we are literally
diffing
  the working tree. -G was meant to be used to limit log
traversal,
  but it also works to reduce the diff output for something like
git
  diff HEAD^.
 
 I really wish there were an alternate regexec interface we could use
 that took a pointer/size pair. Bleh.

Thinking on it more, my patch, hacky thought it seems, may not be the
worst solution. Here are the options that I see:

  1. Use a regex library that does not require NUL termination. If we
 are bound by the regular regexec interface, this is not feasible.
 But the GNU implementation works on arbitrary-length buffers (you
 just have to use a slightly different interface), and we already
carry it in compat. It would mean platforms which provide a working
 but non-GNU regexec would have to start defining NO_REGEX.

  2. Figure out a way to get one extra zero byte via mmap. If the
 requested size does not fall on a page boundary, you get extra
 zero-ed bytes. Unfortunately, requesting an extra byte does not
 do what we want; you get SIGBUS accessing it.

 3. Copy mmap'd data at point-of-use into a NUL-terminated buffer. That
 way we only incur the cost when we need it.

  4. Avoid mmap-ing in the first place when we are using -G or
 --pickaxe-regex (e.g., by doing a big read()). At first glance,
 this sounds more efficient than loading the data one way and then
 making another copy. But mmap+memcpy, aside from the momentary
doubled memory requirement, is probably just as fast or faster than
 calling read() repeatedly.

I am really tempted by (1).

Given that (2) does not work, unless somebody comes up with something
clever there, that would make (3) the next best choice.

-Peff

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: crash on git diff-tree -Ganything tree for new files with textconv filter

2012-10-30 Thread Jeff King
On Tue, Oct 30, 2012 at 09:46:01PM +0900, Junio C Hamano wrote:

 (1) sounds attractive for more than one reason. In addition to
 avoidance of this issue, it would bring bug-to-bug compatibility
 across platforms.

Yeah. I mentioned breaking the build for people who would now need to
turn on NO_REGEX. But the only reason to do that is to let people on
glibc systems use the system version of the tools. A much saner approach
would be to just always build with our compat regex, and turn NO_REGEX
into a no-op. We already do the same thing for kwset.

 (4), if we can run grep on streaming data (tweak interface we have for
 checking out a large blob to the working tree), would let us work on
 dataset larger than fit in core. Even though it would be much more
 work, it might turn out to be a better option in the longer run.

Agreed, that would be nice. It's potentially a lot of work, but we could
probably get by with a special streaming version of diff_populate_filespec.

The tricky thing is that we have to run the regex matcher progressively
as we stream data in (since your match might fall in the middle of a
read boundary). Which is certainly going to require switching off of
regular regexec. I don't think glibc regex will handle it either,
though. It looks like pcre can report a partial match at the end of the
string, and you can either continue with the next chunk (if using
pcre_dfa) or append and re-start the pattern match (for regular
pcre_exec).

Which means we'd probably have to make streaming matches an optional
feature, and still do (1) first to fix the correctness issue.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: crash on git diff-tree -Ganything tree for new files with textconv filter

2012-10-29 Thread Peter Oberndorfer
On 2012-10-29 07:05, Jeff King wrote:
 On Sun, Oct 28, 2012 at 08:56:39PM +0100, Peter Oberndorfer wrote:

 The patch below should fix it. I added tests, but please try your
 real-world test case on it to double-check.
 I tested your patch, but now it crashes for another reason :-)
 Well, that's progress, right? :)
Sure :-)

 i have a file with exactly 12288(0x3000) bytes in the repository.
 When the file is loaded, the data is placed luckily so the data end
 falls at a page boundary.
 Later diff_grep() calls regexec() which calls strlen() on the loaded buffer
 and ends up reading beyond the actual data into the next page
 which is not allocated and causes a pagefault.
 Or it could possibly (randomly) match the regex on data that is not
 actually part of a file...
 Yuck. For the most part, we treat blob content (and generally most
 object content) as a sized buffer. However, there are some spots which,
 either through laziness or because a code interface expects a string, we
 pass the value as a string. This works because the object-reading code
 puts an extra NUL at the end of our buffer to handle just such an
 instance. So we might prematurely end if the object contains embedded
 NULs, but we would never read past the end.

 The code to read the output of a textconv filter does not do this
 explicitly. I would think it would get it for free by virtue of reading
 into a strbuf, though. I'll try to investigate.
I could reproduce with my 0x3000 bytes file on linux. The buffer is not
read with a trailing null byte it is mapped by mmap in
diff_populate_filespec...
So i think we will not get away with expecting a trailing null :-/

For me the key to reproduce the problem was to have 2 commits.
Adding the file in the root commit it did not work. [1]

Greetings Peter
 -Peff
 --
 To unsubscribe from this list: send the line unsubscribe git in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


[1]
kumbayo@home:~/src$ mkdir git_mmap_crash2
kumbayo@home:~/src$ cd git_mmap_crash2
kumbayo@home:~/src/git_mmap_crash2$ git init
kumbayo@home:~/src/git_mmap_crash2$ echo blahblah
kumbayo@home:~/src/git_mmap_crash2$ git add blah
kumbayo@home:~/src/git_mmap_crash2$ git commit -m blah
[master (Basis-Version) 3458422] blah
diff_populate_filespec - xmmap for blah size:0x5 returned: 0xb7206000
 1 file changed, 1 insertion(+)
 create mode 100644 blah
kumbayo@home:~/src/git_mmap_crash2$ perl -e 'print - x 0x3000 ' asdf.txt
kumbayo@home:~/src/git_mmap_crash2$ git add asdf.txt
kumbayo@home:~/src/git_mmap_crash2$ git commit -m crashy
[master 5cf2c5f] crashy
diff_populate_filespec - xmmap for asdf.txt size:0x3000 returned:
0xb771e000
 1 file changed, 1 insertion(+)
 create mode 100644 asdf.txt

kumbayo@soybean:~/src/git_mmap_crash2$ valgrind git diff-tree -Ganything HEAD
==8388== Memcheck, a memory error detector
==8388== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==8388== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==8388== Command: git diff-tree -Ganything HEAD
==8388==
==8388== Conditional jump or move depends on uninitialised value(s)
==8388==at 0x405ADD8: inflateReset2 (in /lib/i386-linux-gnu/libz.so.1.2.3.4)
==8388==by 0xA0: ???
==8388==
==8388== Conditional jump or move depends on uninitialised value(s)
==8388==at 0x405ADD8: inflateReset2 (in /lib/i386-linux-gnu/libz.so.1.2.3.4)
==8388==by 0x7F: ???
==8388==


==8388== Conditional jump or move depends on uninitialised value(s)


==8388==at 0x405ADD8: inflateReset2 (in /lib/i386-linux-gnu/libz.so.1.2.3.4)


==8388==by 0x30: ???


==8388==


==8388== Conditional jump or move depends on uninitialised value(s)


==8388==at 0x405ADD8: inflateReset2 (in /lib/i386-linux-gnu/libz.so.1.2.3.4)


==8388==by 0x50: ???


==8388==


diffcore_pickaxe_grep


diff_populate_filespec - xmmap for asdf.txt size:0x3000 returned: 0x4035000


==8388== Invalid read of size 1


==8388==at 0x402C683: __GI_strlen (in
/usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)


==8388==by 0x430581F: regexec@@GLIBC_2.3.4 (regexec.c:245)


==8388==by 0x814489D: diff_grep (diffcore-pickaxe.c:110)
==8388==by 0x8144B89: pickaxe.constprop.6 (diffcore-pickaxe.c:40)
==8388==by 0x8144DCD: diffcore_pickaxe_grep (diffcore-pickaxe.c:155)
==8388==by 0x80DCE64: diffcore_std (diff.c:4638)
==8388==by 0x80F0B20: log_tree_diff_flush (log-tree.c:696)
==8388==  Address 0x4038000 is not stack'd, malloc'd or (recently) free'd
==8388==
==8388==
==8388== Process terminating with default action of signal 11 (SIGSEGV)
==8388==  Access not within mapped region at address 0x4038000
==8388==at 0x402C683: __GI_strlen (in 
/usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==8388==by 0x430581F: regexec@@GLIBC_2.3.4 (regexec.c:245)
==8388==by 0x814489D: diff_grep (diffcore-pickaxe.c:110)
==8388==by 0x8144B89: pickaxe.constprop.6 (diffcore-pickaxe.c:40)

Re: crash on git diff-tree -Ganything tree for new files with textconv filter

2012-10-29 Thread Jeff King
On Mon, Oct 29, 2012 at 09:19:48PM +0100, Peter Oberndorfer wrote:

 I could reproduce with my 0x3000 bytes file on linux. The buffer is not
 read with a trailing null byte it is mapped by mmap in
 diff_populate_filespec...
 So i think we will not get away with expecting a trailing null :-/

Thanks for the reproduction recipe. I was testing with git log, which
does not use the mmap optimization.

 For me the key to reproduce the problem was to have 2 commits.
 Adding the file in the root commit it did not work. [1]

You probably would need to pass --root for it to do the diff of the
initial commit.

The patch below fixes it, but it's terribly inefficient (it just detects
the situation and reallocates). It would be much better to disable the
reuse_worktree_file mmap when we populate the filespec, but it is too
late to pass an option; we may have already populated from an earlier
diffcore stage.

I guess if we teach the whole diff code that -G (and --pickaxe-regex)
is brittle, we can disable the optimization from the beginning based on
the diff options. I'll take a look.

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index b097fa7..88d1a8f 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -80,6 +80,29 @@ static void fill_one(struct diff_filespec *one,
if (DIFF_FILE_VALID(one)) {
*textconv = get_textconv(one);
mf-size = fill_textconv(*textconv, one, mf-ptr);
+
+   /*
+* Horrible, horrible hack. If we are going to feed the result
+* to regexec, we must make sure it is NUL-terminated, but we
+* will not be if we have mmap'd a file and never munged it.
+*
+* We would do much better to turn off the reuse_worktree_file
+* optimization in the first place, which is the sole source of
+* these mmaps.
+*/
+   if (one-should_munmap  !*textconv) { mf-ptr =
+   xmallocz(one-size); memcpy(mf-ptr, one-data,
+   one-size);
+
+   /*
+* Attach the result to the filespec, which will
+* properly free it eventually.
+*/
+   munmap(one-data, one-size);
+   one-should_munmap = 0;
+   one-data = mf-ptr;
+   one-should_free = 1;
+   }
} else {
memset(mf, 0, sizeof(*mf));
}
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: crash on git diff-tree -Ganything tree for new files with textconv filter

2012-10-29 Thread Jeff King
On Mon, Oct 29, 2012 at 06:35:21PM -0400, Jeff King wrote:

 The patch below fixes it, but it's terribly inefficient (it just detects
 the situation and reallocates). It would be much better to disable the
 reuse_worktree_file mmap when we populate the filespec, but it is too
 late to pass an option; we may have already populated from an earlier
 diffcore stage.
 
 I guess if we teach the whole diff code that -G (and --pickaxe-regex)
 is brittle, we can disable the optimization from the beginning based on
 the diff options. I'll take a look.

Hmm. That is problematic for two reasons.

  1. The whole diff call chain will have to be modified to pass the
 options around, so they can make it down to the
 diff_populate_filespec level. Alternatively, we could do some kind
 of global hack, which is ugly but would work OK in practice.

  2. Reusing a working tree file is only half of the reason a filespec
 might be mmap'd. It might also be because we are literally diffing
 the working tree. -G was meant to be used to limit log traversal,
 but it also works to reduce the diff output for something like git
 diff HEAD^.

I really wish there were an alternate regexec interface we could use
that took a pointer/size pair. Bleh.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: crash on git diff-tree -Ganything tree for new files with textconv filter

2012-10-28 Thread Jeff King
On Sat, Oct 27, 2012 at 08:37:24PM +0200, Peter Oberndorfer wrote:

 It seems git diff-tree -Ganything tree crashes[1] with a null
 pointer dereference
 when run on a commit that adds a file (pdf) with a textconv filter.
 
 It can be reproduced with vanilla git by having a commit on top that
 adds a file with a textconv filter and executing git diff-tree
 -Ganything HEAD
 But running git log -Ganything still works without a crash.
 This problem seems to exist since the feature was first added in f506b8e8b5.

Thanks for a thorough bug report. I didn't reproduce the crash, but I
can see how it happens (it happens with diff-tree because we will reuse
the working tree file in that instance; it could also happen if you
turned on textconv caching).

 While testing I also noticed the -S and -G act on the original file
 instead of the textconv munged data.
 Is this intentional or caused by accessing the wrong data?

Both, perhaps. :)

-G operates on the munged data; you can see it feed the munged data to
xdiff in diff_grep. But the optimization for handling added and removed
files accidentally fed the wrong pointer. Fixing that is a no-brainer,
since the optimization is inconsistent with the regular code path.

-S, however, predates the invention of textconv and has never used it.
It is a little less clear that textconv is the right thing here, because
it is not about grepping the diff, but about counting occurrences of the
string inside the file content. I tend to think that doing so on the
textconv'd data would be what people generally want, but it is a
behavior change.

 Wild guess: should we really access p-one-data and not mf1.ptr?

Precisely. Thanks for your wild guess; it made finding the bug very
easy. :)

 Is there some more information i should provide?

The patch below should fix it. I added tests, but please try your
real-world test case on it to double-check.

-- 8 --
Subject: [PATCH] diff_grep: use textconv buffers for add/deleted files

If you use -G to grep a diff, we will apply a configured
textconv filter to the data before generating the diff.
However, if the diff is an addition or deletion, we do not
bother running the diff at all, and just look for the token
in the added (or removed) content. This works because we
know that the diff must contain every line of content.

However, while we used the textconv-derived buffers in the
regular diff, we accidentally passed the original unmodified
buffers to regexec when checking the added or removed
content. This could lead to an incorrect answer.

Worse, in some cases we might have a textconv buffer but no
original buffer (e.g., if we pulled the textconv data from
cache, or if we reused a working tree file when generating
it). In that case, we could actually feed NULL to regexec
and segfault.

Reported-by: Peter Oberndorfer kumbay...@arcor.de
Signed-off-by: Jeff King p...@peff.net
---
 diffcore-pickaxe.c   |  4 ++--
 t/t4030-diff-textconv.sh | 12 
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index ed23eb4..a209376 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -104,10 +104,10 @@ static int diff_grep(struct diff_filepair *p, struct 
diff_options *o,
if (!mf2.ptr)
return 0; /* ignore unmerged */
/* created two -- does it have what we are looking for? */
-   hit = !regexec(regexp, p-two-data, 1, regmatch, 0);
+   hit = !regexec(regexp, mf2.ptr, 1, regmatch, 0);
} else if (!mf2.ptr) {
/* removed one -- did it have what we are looking for? */
-   hit = !regexec(regexp, p-one-data, 1, regmatch, 0);
+   hit = !regexec(regexp, mf1.ptr, 1, regmatch, 0);
} else {
/*
 * We have both sides; need to run textual diff and see if
diff --git a/t/t4030-diff-textconv.sh b/t/t4030-diff-textconv.sh
index eebb1ee..461d27a 100755
--- a/t/t4030-diff-textconv.sh
+++ b/t/t4030-diff-textconv.sh
@@ -84,6 +84,18 @@ test_expect_success 'status -v produces text' '
git reset --soft HEAD@{1}
 '
 
+test_expect_success 'grep-diff (-G) operates on textconv data (add)' '
+   echo one expect 
+   git log --root --format=%s -G0 actual 
+   test_cmp expect actual
+'
+
+test_expect_success 'grep-diff (-G) operates on textconv data (modification)' '
+   echo two expect 
+   git log --root --format=%s -G1 actual 
+   test_cmp expect actual
+'
+
 cat expect.stat 'EOF'
  file | Bin 2 - 4 bytes
  1 file changed, 0 insertions(+), 0 deletions(-)
-- 
1.8.0.3.g3456896

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: crash on git diff-tree -Ganything tree for new files with textconv filter

2012-10-28 Thread Peter Oberndorfer
On 2012-10-28 13:01, Jeff King wrote:
 On Sat, Oct 27, 2012 at 08:37:24PM +0200, Peter Oberndorfer wrote:

 It seems git diff-tree -Ganything tree crashes[1] with a null
 pointer dereference
 when run on a commit that adds a file (pdf) with a textconv filter.

 It can be reproduced with vanilla git by having a commit on top that
 adds a file with a textconv filter and executing git diff-tree
 -Ganything HEAD
 But running git log -Ganything still works without a crash.
 This problem seems to exist since the feature was first added in f506b8e8b5.
 Thanks for a thorough bug report. I didn't reproduce the crash, but I
 can see how it happens (it happens with diff-tree because we will reuse
 the working tree file in that instance; it could also happen if you
 turned on textconv caching).

 While testing I also noticed the -S and -G act on the original file
 instead of the textconv munged data.
 Is this intentional or caused by accessing the wrong data?
 Both, perhaps. :)
Hi,
thanks for your patch for this!


 -G operates on the munged data; you can see it feed the munged data to
 xdiff in diff_grep. But the optimization for handling added and removed
 files accidentally fed the wrong pointer. Fixing that is a no-brainer,
 since the optimization is inconsistent with the regular code path.

 -S, however, predates the invention of textconv and has never used it.
 It is a little less clear that textconv is the right thing here, because
 it is not about grepping the diff, but about counting occurrences of the
 string inside the file content. I tend to think that doing so on the
 textconv'd data would be what people generally want, but it is a
 behavior change.

 Wild guess: should we really access p-one-data and not mf1.ptr?
 Precisely. Thanks for your wild guess; it made finding the bug very
 easy. :)

 Is there some more information i should provide?
 The patch below should fix it. I added tests, but please try your
 real-world test case on it to double-check.

I tested your patch, but now it crashes for another reason :-)
i have a file with exactly 12288(0x3000) bytes in the repository.
When the file is loaded, the data is placed luckily so the data end
falls at a page boundary.
Later diff_grep() calls regexec() which calls strlen() on the loaded buffer
and ends up reading beyond the actual data into the next page
which is not allocated and causes a pagefault.
Or it could possibly (randomly) match the regex on data that is not
actually part of a file...
Different memory allocation rules on Windows probably also have some
influence here.

My guess is that diff_filespec-data is not supposed to be zero terminated
and we should not invoke strlen() on a not zero terminated data.
But this should be decided by somebody who knows the rules.

Greetings Peter
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


crash on git diff-tree -Ganything tree for new files with textconv filter

2012-10-27 Thread Peter Oberndorfer
Hi,

It seems git diff-tree -Ganything tree crashes[1] with a null
pointer dereference
when run on a commit that adds a file (pdf) with a textconv filter.

It can be reproduced with vanilla git by having a commit on top that
adds a file with a textconv filter and executing git diff-tree
-Ganything HEAD
But running git log -Ganything still works without a crash.
This problem seems to exist since the feature was first added in f506b8e8b5.

While testing I also noticed the -S and -G act on the original file
instead of the textconv munged data.
Is this intentional or caused by accessing the wrong data?
Wild guess: should we really access p-one-data and not mf1.ptr?

Is there some more information i should provide?

Greetings Peter

[1]
I am running msysgit v1.8.0.msysgit.0 (52d3a7583a)
and i tried added -G pickaxe support to gitk.
gitk runs git diff-tree -r -s --stdin -Gpattern

This is how i detected the crash the first time.
(but only because of a crash popup on Windows, gitk does not complain...)

For testing on vanilla git I used .git/config:
[diff upcase2]
textconv = tr a-z A-Z 

.gitatrributes:
newtext diff=upcase2

Program received signal SIGSEGV, Segmentation fault.
[Switching to thread 3864.0x83c]
0x0049d90c in regexec (preg=0x22f900, string=0x0, nmatch=1,
pmatch=0x22f46c, eflags=0) at compat/regex/regexec.c:241
241   length = strlen (string);
(gdb) bt
#0  0x0049d90c in regexec (preg=0x22f900, string=0x0, nmatch=1,
pmatch=0x22f46c, eflags=0) at compat/regex/regexec.c:241
#1  0x004f5763 in diff_grep (p=0x109a530, o=0x550b48, regexp=0x22f900,
kws=0x0) at diffcore-pickaxe.c:110
#2  0x004f59dc in pickaxe (o=value optimized out, regexp=0x22f900,
kws=0x, fn=0x4f5620 diff_grep) at diffcore-pickaxe.c:40
#3  0x004f5bd4 in diffcore_pickaxe_grep (o=0x550b48) at
diffcore-pickaxe.c:154
#4  0x0048049a in diffcore_std (options=0x550b48) at diff.c:4630
#5  0x004dc16a in log_tree_diff_flush (opt=0x5508c0) at log-tree.c:697
#6  0x004dc32e in log_tree_commit (opt=0x5508c0, commit=0xffc620) at
log-tree.c:790
#7  0x004206dd in cmd_diff_tree (argc=value optimized out,
argv=0x3d24bc, prefix=0x0) at builtin/diff-tree.c:43
#8  0x00401a16 in handle_internal_command (argc=value optimized out,
argv=value optimized out) at git.c:306
#9  0x00401c00 in main (argc=6, argv=0x3d24b8) at git.c:513
(gdb) up
#1  0x004f5763 in diff_grep (p=0x109a530, o=0x550b48, regexp=0x22f900,
kws=0x0) at diffcore-pickaxe.c:110
110 hit = !regexec(regexp, p-one-data, 1,
regmatch, 0);
(gdb) info locals
regmatch = {rm_so = 17408640, rm_eo = 2291968}
textconv_one = (struct userdiff_driver *) 0x0
textconv_two = (struct userdiff_driver *) 0xe10468
mf1 = {ptr = 0x0, size = 0}
mf2 = {
  ptr = 0x2bbcaf0 ' ' repeats 52 times, PROJECT
DESCRIPTION\r\n\r\n\r\nProject Number:, ' ' repeats 19 times,
x\r\nProject Description:, ' ' repeats 14 times,
\r\nx:, ' ' repeats 11 times..., size = 64185}
hit = value optimized out
(gdb) print p
$1 = (struct diff_filepair *) 0x109a530
(gdb) print *p
$2 = {one = 0x109a320, two = 0x109a428, score = 0, status = 0 '\0',
broken_pair = 0, renamed_pair = 0, is_unmerged = 0}
(gdb) print p-one
$3 = (struct diff_filespec *) 0x109a320
(gdb) print *p-one
$4 = {sha1 = '\0' repeats 19 times, path = 0x109a360
/xxx/doc/ x 1.4.pdf, data = 0x0, cnt_data = 0x0,
  funcname_pattern_ident = 0x0, size = 0, count = 1, xfrm_flags = 0,
rename_used = 0, mode = 0, sha1_valid = 0, should_free = 0,
should_munmap = 0,
  dirty_submodule = 0, is_stdin = 0, has_more_entries = 0, driver = 0x0,
is_binary = -1}
(gdb) print *p-two
$5 = {sha1 = 0aax\217\231)oBaAa(\021\234^'Q\236\230, path = 0x109a468
/xxx/doc/ x 1.4.pdf, data = 0x0,
  cnt_data = 0x0, funcname_pattern_ident = 0x0, size = 0, count = 1,
xfrm_flags = 0, rename_used = 0, mode = 33188, sha1_valid = 1,
should_free = 0,
  should_munmap = 0, dirty_submodule = 0, is_stdin = 0, has_more_entries
= 0, driver = 0xe10468, is_binary = -1}
(gdb) print textconv_one
$6 = (struct userdiff_driver *) 0x0
(gdb) print textconv_two
$7 = (struct userdiff_driver *) 0xe10468
(gdb) print *textconv_two
$10 = {name = 0xfe4fc0 astextplain, external = 0x0, binary = -1,
funcname = {pattern = 0x0, cflags = 0}, word_regex = 0x0,
  textconv = 0xfe4fd8 astextplain, textconv_cache = 0x0,
textconv_want_cache = 0}
(gdb)

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html