Re: crash on git diff-tree -Ganything tree for new files with textconv filter
On 2012-11-07 22:13, Jeff King wrote: On Wed, Nov 07, 2012 at 10:10:59PM +0100, Peter Oberndorfer wrote: For me the key to reproduce the problem was to have 2 commits. Adding the file in the root commit it did not work. [1] You probably would need to pass --root for it to do the diff of the initial commit. The patch below fixes it, but it's terribly inefficient (it just detects the situation and reallocates). It would be much better to disable the reuse_worktree_file mmap when we populate the filespec, but it is too late to pass an option; we may have already populated from an earlier diffcore stage. Hi, I tested your patch, and i can confirm it fixes the problem for me. (also on my real world test in msysgit) Thanks for the report. I'd still like to pursue using a regex library that does not require NUL-termination, but I've been distracted by other things. I'm going to hold back my copy-to-a-NUL-buffer patch for now and see if I can get to the regex thing this week. Hi, are there any news regarding this problem? The crash seems to still exist in the current version 1.8.3 and master. Thanks, Greetings Peter -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: crash on git diff-tree -Ganything tree for new files with textconv filter
On Mon, Jun 03, 2013 at 07:25:06PM +0200, Peter Oberndorfer wrote: Thanks for the report. I'd still like to pursue using a regex library that does not require NUL-termination, but I've been distracted by other things. I'm going to hold back my copy-to-a-NUL-buffer patch for now and see if I can get to the regex thing this week. are there any news regarding this problem? The crash seems to still exist in the current version 1.8.3 and master. Sorry, no, this got dropped due to lack of time. I _think_ it is as simple as just tweaking the Makefile to unconditionally build against the compat/ regex library, and then tweaking callsites as appropriate to use the GNU-specific interface that takes buf/len instead of a NUL-terminated string. But there may be some hidden complexities to it. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: crash on git diff-tree -Ganything tree for new files with textconv filter
On Wed, Nov 07, 2012 at 10:10:59PM +0100, Peter Oberndorfer wrote: For me the key to reproduce the problem was to have 2 commits. Adding the file in the root commit it did not work. [1] You probably would need to pass --root for it to do the diff of the initial commit. The patch below fixes it, but it's terribly inefficient (it just detects the situation and reallocates). It would be much better to disable the reuse_worktree_file mmap when we populate the filespec, but it is too late to pass an option; we may have already populated from an earlier diffcore stage. Hi, I tested your patch, and i can confirm it fixes the problem for me. (also on my real world test in msysgit) Thanks for the report. I'd still like to pursue using a regex library that does not require NUL-termination, but I've been distracted by other things. I'm going to hold back my copy-to-a-NUL-buffer patch for now and see if I can get to the regex thing this week. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: crash on git diff-tree -Ganything tree for new files with textconv filter
Jeff King wrote: Thinking on it more, my patch, hacky thought it seems, may not be the worst solution. Here are the options that I see: 1. Use a regex library that does not require NUL termination. If we are bound by the regular regexec interface, this is not feasible. But the GNU implementation works on arbitrary-length buffers (you just have to use a slightly different interface), and we already carry it in compat. It would mean platforms which provide a working but non-GNU regexec would have to start defining NO_REGEX. I have thought about the possibility of doing this for unrelated reasons in the past. On cygwin, there have been two unexpected test passes since about v1.6.0 (I reported it to the list in passing), like so: [ ... ] All tests successful. Test Summary Report --- t0050-filesystem.sh (Wstat: 0 Tests: 9 Failed: 0) TODO passed: 5 t7008-grep-binary.sh (Wstat: 0 Tests: 20 Failed: 0) TODO passed: 12 Files=604, Tests=8439, 11190 wallclock secs ( 2.59 usr 1.59 sys + 7294.86 cusr 3416.65 csys = 10715.70 CPU) Result: PASS In particular, t7008.12 passes on cygwin because the regex library apparently matches '.' to NUL. Indeed if you add a test_pause to the script and execute grep .fi a (note grep *not* git-grep) then Binary file a matches on Linux, cygwin and MinGW. (So I assume the test was added to document a difference in behaviour to GNU grep). So, if we use the GNU interface to the regex routines in compat, then we may specify the grep syntax for use in git-grep. (Well that's the theory, I've not actually tried to code it up, so take this with a pinch of salt! :-P ). ATB, Ramsay Jones -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: crash on git diff-tree -Ganything tree for new files with textconv filter
On Mon, Oct 29, 2012 at 06:47:05PM -0400, Jeff King wrote: On Mon, Oct 29, 2012 at 06:35:21PM -0400, Jeff King wrote: The patch below fixes it, but it's terribly inefficient (it just detects the situation and reallocates). It would be much better to disable the reuse_worktree_file mmap when we populate the filespec, but it is too late to pass an option; we may have already populated from an earlier diffcore stage. I guess if we teach the whole diff code that -G (and --pickaxe-regex) is brittle, we can disable the optimization from the beginning based on the diff options. I'll take a look. Hmm. That is problematic for two reasons. 1. The whole diff call chain will have to be modified to pass the options around, so they can make it down to the diff_populate_filespec level. Alternatively, we could do some kind of global hack, which is ugly but would work OK in practice. 2. Reusing a working tree file is only half of the reason a filespec might be mmap'd. It might also be because we are literally diffing the working tree. -G was meant to be used to limit log traversal, but it also works to reduce the diff output for something like git diff HEAD^. I really wish there were an alternate regexec interface we could use that took a pointer/size pair. Bleh. Thinking on it more, my patch, hacky thought it seems, may not be the worst solution. Here are the options that I see: 1. Use a regex library that does not require NUL termination. If we are bound by the regular regexec interface, this is not feasible. But the GNU implementation works on arbitrary-length buffers (you just have to use a slightly different interface), and we already carry it in compat. It would mean platforms which provide a working but non-GNU regexec would have to start defining NO_REGEX. 2. Figure out a way to get one extra zero byte via mmap. If the requested size does not fall on a page boundary, you get extra zero-ed bytes. Unfortunately, requesting an extra byte does not do what we want; you get SIGBUS accessing it. 3. Copy mmap'd data at point-of-use into a NUL-terminated buffer. That way we only incur the cost when we need it. 4. Avoid mmap-ing in the first place when we are using -G or --pickaxe-regex (e.g., by doing a big read()). At first glance, this sounds more efficient than loading the data one way and then making another copy. But mmap+memcpy, aside from the momentary doubled memory requirement, is probably just as fast or faster than calling read() repeatedly. I am really tempted by (1). Given that (2) does not work, unless somebody comes up with something clever there, that would make (3) the next best choice. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: crash on git diff-tree -Ganything tree for new files with textconv filter
(1) sounds attractive for more than one reason. In addition to avoidance of this issue, it would bring bug-to-bug compatibility across platforms. (4), if we can run grep on streaming data (tweak interface we have for checking out a large blob to the working tree), would let us work on dataset larger than fit in core. Even though it would be much more work, it might turn out to be a better option in the longer run. Jeff King p...@peff.net wrote: On Mon, Oct 29, 2012 at 06:47:05PM -0400, Jeff King wrote: On Mon, Oct 29, 2012 at 06:35:21PM -0400, Jeff King wrote: The patch below fixes it, but it's terribly inefficient (it just detects the situation and reallocates). It would be much better to disable the reuse_worktree_file mmap when we populate the filespec, but it is too late to pass an option; we may have already populated from an earlier diffcore stage. I guess if we teach the whole diff code that -G (and --pickaxe-regex) is brittle, we can disable the optimization from the beginning based on the diff options. I'll take a look. Hmm. That is problematic for two reasons. 1. The whole diff call chain will have to be modified to pass the options around, so they can make it down to the diff_populate_filespec level. Alternatively, we could do some kind of global hack, which is ugly but would work OK in practice. 2. Reusing a working tree file is only half of the reason a filespec might be mmap'd. It might also be because we are literally diffing the working tree. -G was meant to be used to limit log traversal, but it also works to reduce the diff output for something like git diff HEAD^. I really wish there were an alternate regexec interface we could use that took a pointer/size pair. Bleh. Thinking on it more, my patch, hacky thought it seems, may not be the worst solution. Here are the options that I see: 1. Use a regex library that does not require NUL termination. If we are bound by the regular regexec interface, this is not feasible. But the GNU implementation works on arbitrary-length buffers (you just have to use a slightly different interface), and we already carry it in compat. It would mean platforms which provide a working but non-GNU regexec would have to start defining NO_REGEX. 2. Figure out a way to get one extra zero byte via mmap. If the requested size does not fall on a page boundary, you get extra zero-ed bytes. Unfortunately, requesting an extra byte does not do what we want; you get SIGBUS accessing it. 3. Copy mmap'd data at point-of-use into a NUL-terminated buffer. That way we only incur the cost when we need it. 4. Avoid mmap-ing in the first place when we are using -G or --pickaxe-regex (e.g., by doing a big read()). At first glance, this sounds more efficient than loading the data one way and then making another copy. But mmap+memcpy, aside from the momentary doubled memory requirement, is probably just as fast or faster than calling read() repeatedly. I am really tempted by (1). Given that (2) does not work, unless somebody comes up with something clever there, that would make (3) the next best choice. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: crash on git diff-tree -Ganything tree for new files with textconv filter
On Tue, Oct 30, 2012 at 09:46:01PM +0900, Junio C Hamano wrote: (1) sounds attractive for more than one reason. In addition to avoidance of this issue, it would bring bug-to-bug compatibility across platforms. Yeah. I mentioned breaking the build for people who would now need to turn on NO_REGEX. But the only reason to do that is to let people on glibc systems use the system version of the tools. A much saner approach would be to just always build with our compat regex, and turn NO_REGEX into a no-op. We already do the same thing for kwset. (4), if we can run grep on streaming data (tweak interface we have for checking out a large blob to the working tree), would let us work on dataset larger than fit in core. Even though it would be much more work, it might turn out to be a better option in the longer run. Agreed, that would be nice. It's potentially a lot of work, but we could probably get by with a special streaming version of diff_populate_filespec. The tricky thing is that we have to run the regex matcher progressively as we stream data in (since your match might fall in the middle of a read boundary). Which is certainly going to require switching off of regular regexec. I don't think glibc regex will handle it either, though. It looks like pcre can report a partial match at the end of the string, and you can either continue with the next chunk (if using pcre_dfa) or append and re-start the pattern match (for regular pcre_exec). Which means we'd probably have to make streaming matches an optional feature, and still do (1) first to fix the correctness issue. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: crash on git diff-tree -Ganything tree for new files with textconv filter
On 2012-10-29 07:05, Jeff King wrote: On Sun, Oct 28, 2012 at 08:56:39PM +0100, Peter Oberndorfer wrote: The patch below should fix it. I added tests, but please try your real-world test case on it to double-check. I tested your patch, but now it crashes for another reason :-) Well, that's progress, right? :) Sure :-) i have a file with exactly 12288(0x3000) bytes in the repository. When the file is loaded, the data is placed luckily so the data end falls at a page boundary. Later diff_grep() calls regexec() which calls strlen() on the loaded buffer and ends up reading beyond the actual data into the next page which is not allocated and causes a pagefault. Or it could possibly (randomly) match the regex on data that is not actually part of a file... Yuck. For the most part, we treat blob content (and generally most object content) as a sized buffer. However, there are some spots which, either through laziness or because a code interface expects a string, we pass the value as a string. This works because the object-reading code puts an extra NUL at the end of our buffer to handle just such an instance. So we might prematurely end if the object contains embedded NULs, but we would never read past the end. The code to read the output of a textconv filter does not do this explicitly. I would think it would get it for free by virtue of reading into a strbuf, though. I'll try to investigate. I could reproduce with my 0x3000 bytes file on linux. The buffer is not read with a trailing null byte it is mapped by mmap in diff_populate_filespec... So i think we will not get away with expecting a trailing null :-/ For me the key to reproduce the problem was to have 2 commits. Adding the file in the root commit it did not work. [1] Greetings Peter -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html [1] kumbayo@home:~/src$ mkdir git_mmap_crash2 kumbayo@home:~/src$ cd git_mmap_crash2 kumbayo@home:~/src/git_mmap_crash2$ git init kumbayo@home:~/src/git_mmap_crash2$ echo blahblah kumbayo@home:~/src/git_mmap_crash2$ git add blah kumbayo@home:~/src/git_mmap_crash2$ git commit -m blah [master (Basis-Version) 3458422] blah diff_populate_filespec - xmmap for blah size:0x5 returned: 0xb7206000 1 file changed, 1 insertion(+) create mode 100644 blah kumbayo@home:~/src/git_mmap_crash2$ perl -e 'print - x 0x3000 ' asdf.txt kumbayo@home:~/src/git_mmap_crash2$ git add asdf.txt kumbayo@home:~/src/git_mmap_crash2$ git commit -m crashy [master 5cf2c5f] crashy diff_populate_filespec - xmmap for asdf.txt size:0x3000 returned: 0xb771e000 1 file changed, 1 insertion(+) create mode 100644 asdf.txt kumbayo@soybean:~/src/git_mmap_crash2$ valgrind git diff-tree -Ganything HEAD ==8388== Memcheck, a memory error detector ==8388== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al. ==8388== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info ==8388== Command: git diff-tree -Ganything HEAD ==8388== ==8388== Conditional jump or move depends on uninitialised value(s) ==8388==at 0x405ADD8: inflateReset2 (in /lib/i386-linux-gnu/libz.so.1.2.3.4) ==8388==by 0xA0: ??? ==8388== ==8388== Conditional jump or move depends on uninitialised value(s) ==8388==at 0x405ADD8: inflateReset2 (in /lib/i386-linux-gnu/libz.so.1.2.3.4) ==8388==by 0x7F: ??? ==8388== ==8388== Conditional jump or move depends on uninitialised value(s) ==8388==at 0x405ADD8: inflateReset2 (in /lib/i386-linux-gnu/libz.so.1.2.3.4) ==8388==by 0x30: ??? ==8388== ==8388== Conditional jump or move depends on uninitialised value(s) ==8388==at 0x405ADD8: inflateReset2 (in /lib/i386-linux-gnu/libz.so.1.2.3.4) ==8388==by 0x50: ??? ==8388== diffcore_pickaxe_grep diff_populate_filespec - xmmap for asdf.txt size:0x3000 returned: 0x4035000 ==8388== Invalid read of size 1 ==8388==at 0x402C683: __GI_strlen (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so) ==8388==by 0x430581F: regexec@@GLIBC_2.3.4 (regexec.c:245) ==8388==by 0x814489D: diff_grep (diffcore-pickaxe.c:110) ==8388==by 0x8144B89: pickaxe.constprop.6 (diffcore-pickaxe.c:40) ==8388==by 0x8144DCD: diffcore_pickaxe_grep (diffcore-pickaxe.c:155) ==8388==by 0x80DCE64: diffcore_std (diff.c:4638) ==8388==by 0x80F0B20: log_tree_diff_flush (log-tree.c:696) ==8388== Address 0x4038000 is not stack'd, malloc'd or (recently) free'd ==8388== ==8388== ==8388== Process terminating with default action of signal 11 (SIGSEGV) ==8388== Access not within mapped region at address 0x4038000 ==8388==at 0x402C683: __GI_strlen (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so) ==8388==by 0x430581F: regexec@@GLIBC_2.3.4 (regexec.c:245) ==8388==by 0x814489D: diff_grep (diffcore-pickaxe.c:110) ==8388==by 0x8144B89: pickaxe.constprop.6 (diffcore-pickaxe.c:40)
Re: crash on git diff-tree -Ganything tree for new files with textconv filter
On Mon, Oct 29, 2012 at 09:19:48PM +0100, Peter Oberndorfer wrote: I could reproduce with my 0x3000 bytes file on linux. The buffer is not read with a trailing null byte it is mapped by mmap in diff_populate_filespec... So i think we will not get away with expecting a trailing null :-/ Thanks for the reproduction recipe. I was testing with git log, which does not use the mmap optimization. For me the key to reproduce the problem was to have 2 commits. Adding the file in the root commit it did not work. [1] You probably would need to pass --root for it to do the diff of the initial commit. The patch below fixes it, but it's terribly inefficient (it just detects the situation and reallocates). It would be much better to disable the reuse_worktree_file mmap when we populate the filespec, but it is too late to pass an option; we may have already populated from an earlier diffcore stage. I guess if we teach the whole diff code that -G (and --pickaxe-regex) is brittle, we can disable the optimization from the beginning based on the diff options. I'll take a look. diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c index b097fa7..88d1a8f 100644 --- a/diffcore-pickaxe.c +++ b/diffcore-pickaxe.c @@ -80,6 +80,29 @@ static void fill_one(struct diff_filespec *one, if (DIFF_FILE_VALID(one)) { *textconv = get_textconv(one); mf-size = fill_textconv(*textconv, one, mf-ptr); + + /* +* Horrible, horrible hack. If we are going to feed the result +* to regexec, we must make sure it is NUL-terminated, but we +* will not be if we have mmap'd a file and never munged it. +* +* We would do much better to turn off the reuse_worktree_file +* optimization in the first place, which is the sole source of +* these mmaps. +*/ + if (one-should_munmap !*textconv) { mf-ptr = + xmallocz(one-size); memcpy(mf-ptr, one-data, + one-size); + + /* +* Attach the result to the filespec, which will +* properly free it eventually. +*/ + munmap(one-data, one-size); + one-should_munmap = 0; + one-data = mf-ptr; + one-should_free = 1; + } } else { memset(mf, 0, sizeof(*mf)); } -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: crash on git diff-tree -Ganything tree for new files with textconv filter
On Mon, Oct 29, 2012 at 06:35:21PM -0400, Jeff King wrote: The patch below fixes it, but it's terribly inefficient (it just detects the situation and reallocates). It would be much better to disable the reuse_worktree_file mmap when we populate the filespec, but it is too late to pass an option; we may have already populated from an earlier diffcore stage. I guess if we teach the whole diff code that -G (and --pickaxe-regex) is brittle, we can disable the optimization from the beginning based on the diff options. I'll take a look. Hmm. That is problematic for two reasons. 1. The whole diff call chain will have to be modified to pass the options around, so they can make it down to the diff_populate_filespec level. Alternatively, we could do some kind of global hack, which is ugly but would work OK in practice. 2. Reusing a working tree file is only half of the reason a filespec might be mmap'd. It might also be because we are literally diffing the working tree. -G was meant to be used to limit log traversal, but it also works to reduce the diff output for something like git diff HEAD^. I really wish there were an alternate regexec interface we could use that took a pointer/size pair. Bleh. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: crash on git diff-tree -Ganything tree for new files with textconv filter
On Sat, Oct 27, 2012 at 08:37:24PM +0200, Peter Oberndorfer wrote: It seems git diff-tree -Ganything tree crashes[1] with a null pointer dereference when run on a commit that adds a file (pdf) with a textconv filter. It can be reproduced with vanilla git by having a commit on top that adds a file with a textconv filter and executing git diff-tree -Ganything HEAD But running git log -Ganything still works without a crash. This problem seems to exist since the feature was first added in f506b8e8b5. Thanks for a thorough bug report. I didn't reproduce the crash, but I can see how it happens (it happens with diff-tree because we will reuse the working tree file in that instance; it could also happen if you turned on textconv caching). While testing I also noticed the -S and -G act on the original file instead of the textconv munged data. Is this intentional or caused by accessing the wrong data? Both, perhaps. :) -G operates on the munged data; you can see it feed the munged data to xdiff in diff_grep. But the optimization for handling added and removed files accidentally fed the wrong pointer. Fixing that is a no-brainer, since the optimization is inconsistent with the regular code path. -S, however, predates the invention of textconv and has never used it. It is a little less clear that textconv is the right thing here, because it is not about grepping the diff, but about counting occurrences of the string inside the file content. I tend to think that doing so on the textconv'd data would be what people generally want, but it is a behavior change. Wild guess: should we really access p-one-data and not mf1.ptr? Precisely. Thanks for your wild guess; it made finding the bug very easy. :) Is there some more information i should provide? The patch below should fix it. I added tests, but please try your real-world test case on it to double-check. -- 8 -- Subject: [PATCH] diff_grep: use textconv buffers for add/deleted files If you use -G to grep a diff, we will apply a configured textconv filter to the data before generating the diff. However, if the diff is an addition or deletion, we do not bother running the diff at all, and just look for the token in the added (or removed) content. This works because we know that the diff must contain every line of content. However, while we used the textconv-derived buffers in the regular diff, we accidentally passed the original unmodified buffers to regexec when checking the added or removed content. This could lead to an incorrect answer. Worse, in some cases we might have a textconv buffer but no original buffer (e.g., if we pulled the textconv data from cache, or if we reused a working tree file when generating it). In that case, we could actually feed NULL to regexec and segfault. Reported-by: Peter Oberndorfer kumbay...@arcor.de Signed-off-by: Jeff King p...@peff.net --- diffcore-pickaxe.c | 4 ++-- t/t4030-diff-textconv.sh | 12 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c index ed23eb4..a209376 100644 --- a/diffcore-pickaxe.c +++ b/diffcore-pickaxe.c @@ -104,10 +104,10 @@ static int diff_grep(struct diff_filepair *p, struct diff_options *o, if (!mf2.ptr) return 0; /* ignore unmerged */ /* created two -- does it have what we are looking for? */ - hit = !regexec(regexp, p-two-data, 1, regmatch, 0); + hit = !regexec(regexp, mf2.ptr, 1, regmatch, 0); } else if (!mf2.ptr) { /* removed one -- did it have what we are looking for? */ - hit = !regexec(regexp, p-one-data, 1, regmatch, 0); + hit = !regexec(regexp, mf1.ptr, 1, regmatch, 0); } else { /* * We have both sides; need to run textual diff and see if diff --git a/t/t4030-diff-textconv.sh b/t/t4030-diff-textconv.sh index eebb1ee..461d27a 100755 --- a/t/t4030-diff-textconv.sh +++ b/t/t4030-diff-textconv.sh @@ -84,6 +84,18 @@ test_expect_success 'status -v produces text' ' git reset --soft HEAD@{1} ' +test_expect_success 'grep-diff (-G) operates on textconv data (add)' ' + echo one expect + git log --root --format=%s -G0 actual + test_cmp expect actual +' + +test_expect_success 'grep-diff (-G) operates on textconv data (modification)' ' + echo two expect + git log --root --format=%s -G1 actual + test_cmp expect actual +' + cat expect.stat 'EOF' file | Bin 2 - 4 bytes 1 file changed, 0 insertions(+), 0 deletions(-) -- 1.8.0.3.g3456896 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: crash on git diff-tree -Ganything tree for new files with textconv filter
On 2012-10-28 13:01, Jeff King wrote: On Sat, Oct 27, 2012 at 08:37:24PM +0200, Peter Oberndorfer wrote: It seems git diff-tree -Ganything tree crashes[1] with a null pointer dereference when run on a commit that adds a file (pdf) with a textconv filter. It can be reproduced with vanilla git by having a commit on top that adds a file with a textconv filter and executing git diff-tree -Ganything HEAD But running git log -Ganything still works without a crash. This problem seems to exist since the feature was first added in f506b8e8b5. Thanks for a thorough bug report. I didn't reproduce the crash, but I can see how it happens (it happens with diff-tree because we will reuse the working tree file in that instance; it could also happen if you turned on textconv caching). While testing I also noticed the -S and -G act on the original file instead of the textconv munged data. Is this intentional or caused by accessing the wrong data? Both, perhaps. :) Hi, thanks for your patch for this! -G operates on the munged data; you can see it feed the munged data to xdiff in diff_grep. But the optimization for handling added and removed files accidentally fed the wrong pointer. Fixing that is a no-brainer, since the optimization is inconsistent with the regular code path. -S, however, predates the invention of textconv and has never used it. It is a little less clear that textconv is the right thing here, because it is not about grepping the diff, but about counting occurrences of the string inside the file content. I tend to think that doing so on the textconv'd data would be what people generally want, but it is a behavior change. Wild guess: should we really access p-one-data and not mf1.ptr? Precisely. Thanks for your wild guess; it made finding the bug very easy. :) Is there some more information i should provide? The patch below should fix it. I added tests, but please try your real-world test case on it to double-check. I tested your patch, but now it crashes for another reason :-) i have a file with exactly 12288(0x3000) bytes in the repository. When the file is loaded, the data is placed luckily so the data end falls at a page boundary. Later diff_grep() calls regexec() which calls strlen() on the loaded buffer and ends up reading beyond the actual data into the next page which is not allocated and causes a pagefault. Or it could possibly (randomly) match the regex on data that is not actually part of a file... Different memory allocation rules on Windows probably also have some influence here. My guess is that diff_filespec-data is not supposed to be zero terminated and we should not invoke strlen() on a not zero terminated data. But this should be decided by somebody who knows the rules. Greetings Peter -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
crash on git diff-tree -Ganything tree for new files with textconv filter
Hi, It seems git diff-tree -Ganything tree crashes[1] with a null pointer dereference when run on a commit that adds a file (pdf) with a textconv filter. It can be reproduced with vanilla git by having a commit on top that adds a file with a textconv filter and executing git diff-tree -Ganything HEAD But running git log -Ganything still works without a crash. This problem seems to exist since the feature was first added in f506b8e8b5. While testing I also noticed the -S and -G act on the original file instead of the textconv munged data. Is this intentional or caused by accessing the wrong data? Wild guess: should we really access p-one-data and not mf1.ptr? Is there some more information i should provide? Greetings Peter [1] I am running msysgit v1.8.0.msysgit.0 (52d3a7583a) and i tried added -G pickaxe support to gitk. gitk runs git diff-tree -r -s --stdin -Gpattern This is how i detected the crash the first time. (but only because of a crash popup on Windows, gitk does not complain...) For testing on vanilla git I used .git/config: [diff upcase2] textconv = tr a-z A-Z .gitatrributes: newtext diff=upcase2 Program received signal SIGSEGV, Segmentation fault. [Switching to thread 3864.0x83c] 0x0049d90c in regexec (preg=0x22f900, string=0x0, nmatch=1, pmatch=0x22f46c, eflags=0) at compat/regex/regexec.c:241 241 length = strlen (string); (gdb) bt #0 0x0049d90c in regexec (preg=0x22f900, string=0x0, nmatch=1, pmatch=0x22f46c, eflags=0) at compat/regex/regexec.c:241 #1 0x004f5763 in diff_grep (p=0x109a530, o=0x550b48, regexp=0x22f900, kws=0x0) at diffcore-pickaxe.c:110 #2 0x004f59dc in pickaxe (o=value optimized out, regexp=0x22f900, kws=0x, fn=0x4f5620 diff_grep) at diffcore-pickaxe.c:40 #3 0x004f5bd4 in diffcore_pickaxe_grep (o=0x550b48) at diffcore-pickaxe.c:154 #4 0x0048049a in diffcore_std (options=0x550b48) at diff.c:4630 #5 0x004dc16a in log_tree_diff_flush (opt=0x5508c0) at log-tree.c:697 #6 0x004dc32e in log_tree_commit (opt=0x5508c0, commit=0xffc620) at log-tree.c:790 #7 0x004206dd in cmd_diff_tree (argc=value optimized out, argv=0x3d24bc, prefix=0x0) at builtin/diff-tree.c:43 #8 0x00401a16 in handle_internal_command (argc=value optimized out, argv=value optimized out) at git.c:306 #9 0x00401c00 in main (argc=6, argv=0x3d24b8) at git.c:513 (gdb) up #1 0x004f5763 in diff_grep (p=0x109a530, o=0x550b48, regexp=0x22f900, kws=0x0) at diffcore-pickaxe.c:110 110 hit = !regexec(regexp, p-one-data, 1, regmatch, 0); (gdb) info locals regmatch = {rm_so = 17408640, rm_eo = 2291968} textconv_one = (struct userdiff_driver *) 0x0 textconv_two = (struct userdiff_driver *) 0xe10468 mf1 = {ptr = 0x0, size = 0} mf2 = { ptr = 0x2bbcaf0 ' ' repeats 52 times, PROJECT DESCRIPTION\r\n\r\n\r\nProject Number:, ' ' repeats 19 times, x\r\nProject Description:, ' ' repeats 14 times, \r\nx:, ' ' repeats 11 times..., size = 64185} hit = value optimized out (gdb) print p $1 = (struct diff_filepair *) 0x109a530 (gdb) print *p $2 = {one = 0x109a320, two = 0x109a428, score = 0, status = 0 '\0', broken_pair = 0, renamed_pair = 0, is_unmerged = 0} (gdb) print p-one $3 = (struct diff_filespec *) 0x109a320 (gdb) print *p-one $4 = {sha1 = '\0' repeats 19 times, path = 0x109a360 /xxx/doc/ x 1.4.pdf, data = 0x0, cnt_data = 0x0, funcname_pattern_ident = 0x0, size = 0, count = 1, xfrm_flags = 0, rename_used = 0, mode = 0, sha1_valid = 0, should_free = 0, should_munmap = 0, dirty_submodule = 0, is_stdin = 0, has_more_entries = 0, driver = 0x0, is_binary = -1} (gdb) print *p-two $5 = {sha1 = 0aax\217\231)oBaAa(\021\234^'Q\236\230, path = 0x109a468 /xxx/doc/ x 1.4.pdf, data = 0x0, cnt_data = 0x0, funcname_pattern_ident = 0x0, size = 0, count = 1, xfrm_flags = 0, rename_used = 0, mode = 33188, sha1_valid = 1, should_free = 0, should_munmap = 0, dirty_submodule = 0, is_stdin = 0, has_more_entries = 0, driver = 0xe10468, is_binary = -1} (gdb) print textconv_one $6 = (struct userdiff_driver *) 0x0 (gdb) print textconv_two $7 = (struct userdiff_driver *) 0xe10468 (gdb) print *textconv_two $10 = {name = 0xfe4fc0 astextplain, external = 0x0, binary = -1, funcname = {pattern = 0x0, cflags = 0}, word_regex = 0x0, textconv = 0xfe4fd8 astextplain, textconv_cache = 0x0, textconv_want_cache = 0} (gdb) -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html