Re: Test failures in t4034

2012-09-02 Thread Junio C Hamano
Ramsay Jones  writes:

> Yes, there was a net increase in the line count when I introduced
> die(), but the main program flow was less cluttered by error handling.
> The net result looked much better, so I thought it was worth it.
>
> What may not be too obvious, however, is that test-regex.c was written
> to be independent of git.

That part I was very aware of actually; it it is a bit tricky to
tell what the right thing to do, though.  Your test itself needs to
be pretty much portable without the portability help you would get
git-compat-util.h, but the point of this kind of test is to tell if
you want to define preprocessor macros that may affect the behaviour
of such compatibility layer ;-)

> Given that I'm now building it as part of git, I should have simply
> #included  and used the die() routine from libgit.a
> (since I'm now *relying* on test-regex being linked with libgit.a).

OK.

>>> +int main(int argc, char **argv)
>>> +{
>>> +   char *pat = "[^={} \t]+";
>>> +   char *str = "={}\nfred";
>>> +   regex_t r;
>>> +   regmatch_t m[1];
>>> +
>>> +   if (regcomp(&r, pat, REG_EXTENDED | REG_NEWLINE))
>>> +   die("failed regcomp() for pattern '%s'", pat);
>>> +   if (regexec(&r, str, 1, m, 0))
>>> +   die("no match of pattern '%s' to string '%s'", pat, str);
>>> +
>>> +   /* http://sourceware.org/bugzilla/show_bug.cgi?id=3957  */
>>> +   if (m[0].rm_so == 3) /* matches '\n' when it should not */
>>> +   exit(1);
>> 
>> This could be the third call site of die() that tells the user to
>> build with NO_REGEX=1.  Then "cd t && sh t0070-fundamental.sh -i -v" would
>> give that message directly to the user.
>
> Hmm, even without "-i -v", it's *very* clear what is going on, but sure
> it wouldn't hurt either. (Also, I wanted to be able to distinguish an exit
> via die() from a "test failed" error return).

OK.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Test failures in t4034

2012-09-01 Thread Ramsay Jones
Junio C Hamano wrote:
> Ramsay Jones  writes:
> 

[snip]

>> diff --git a/test-regex.c b/test-regex.c
>> new file mode 100644
>> index 000..9259985
>> --- /dev/null
>> +++ b/test-regex.c
>> @@ -0,0 +1,35 @@
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +static void die(const char *fmt, ...)
>> +{
>> +va_list p;
>> +
>> +va_start(p, fmt);
>> +vfprintf(stderr, fmt, p);
>> +va_end(p);
>> +fputc('\n', stderr);
>> +exit(128);
>> +}
> 
> Looks like a bit of overkill for only two call sites, whose output
> we would never see because it is behind the test, but OK.

Yes, there was a net increase in the line count when I introduced
die(), but the main program flow was less cluttered by error handling.
The net result looked much better, so I thought it was worth it.

What may not be too obvious, however, is that test-regex.c was written
to be independent of git. You should be able to compile the (single) file
on any POSIX system to determine if the system regex routines suffer this
problem. (It was also supposed to be quiet, unless it die()-ed, and
provide the result via the exit code).

Given that I'm now building it as part of git, I should have simply
#included  and used the die() routine from libgit.a
(since I'm now *relying* on test-regex being linked with libgit.a).

>> +int main(int argc, char **argv)
>> +{
>> +char *pat = "[^={} \t]+";
>> +char *str = "={}\nfred";
>> +regex_t r;
>> +regmatch_t m[1];
>> +
>> +if (regcomp(&r, pat, REG_EXTENDED | REG_NEWLINE))
>> +die("failed regcomp() for pattern '%s'", pat);
>> +if (regexec(&r, str, 1, m, 0))
>> +die("no match of pattern '%s' to string '%s'", pat, str);
>> +
>> +/* http://sourceware.org/bugzilla/show_bug.cgi?id=3957  */
>> +if (m[0].rm_so == 3) /* matches '\n' when it should not */
>> +exit(1);
> 
> This could be the third call site of die() that tells the user to
> build with NO_REGEX=1.  Then "cd t && sh t0070-fundamental.sh -i -v" would
> give that message directly to the user.

Hmm, even without "-i -v", it's *very* clear what is going on, but sure
it wouldn't hurt either. (Also, I wanted to be able to distinguish an exit
via die() from a "test failed" error return).

So, new (tested) version of the patch comming.

ATB,
Ramsay Jones


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Test failures in t4034

2012-08-21 Thread Junio C Hamano
Ramsay Jones  writes:

> I think that, after some testing, this (or something like it) is the
> best that we can do. What do you think?
>
> ATB,
> Ramsay Jones
>
> -- >8 --
> Subject: [PATCH] test-regex: Add a test to check for a bug in the regex 
> routines
>
> Signed-off-by: Ramsay Jones 
> ---
> diff --git a/t/t0070-fundamental.sh b/t/t0070-fundamental.sh
> index 9bee8bf..da2c504 100755
> --- a/t/t0070-fundamental.sh
> +++ b/t/t0070-fundamental.sh
> @@ -25,4 +25,9 @@ test_expect_success POSIXPERM 'mktemp to unwritable 
> directory prints filename' '
>   grep "cannotwrite/test" err
>  '
>  
> +test_expect_success 'check for a bug in the regex routines' '
> + # if this test fails, re-build git with NO_REGEX=1
> + test-regex
> +'

OK.

>  test_done
> diff --git a/test-regex.c b/test-regex.c
> new file mode 100644
> index 000..9259985
> --- /dev/null
> +++ b/test-regex.c
> @@ -0,0 +1,35 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static void die(const char *fmt, ...)
> +{
> + va_list p;
> +
> + va_start(p, fmt);
> + vfprintf(stderr, fmt, p);
> + va_end(p);
> + fputc('\n', stderr);
> + exit(128);
> +}

Looks like a bit of overkill for only two call sites, whose output
we would never see because it is behind the test, but OK.

> +int main(int argc, char **argv)
> +{
> + char *pat = "[^={} \t]+";
> + char *str = "={}\nfred";
> + regex_t r;
> + regmatch_t m[1];
> +
> + if (regcomp(&r, pat, REG_EXTENDED | REG_NEWLINE))
> + die("failed regcomp() for pattern '%s'", pat);
> + if (regexec(&r, str, 1, m, 0))
> + die("no match of pattern '%s' to string '%s'", pat, str);
> +
> + /* http://sourceware.org/bugzilla/show_bug.cgi?id=3957  */
> + if (m[0].rm_so == 3) /* matches '\n' when it should not */
> + exit(1);

This could be the third call site of die() that tells the user to
build with NO_REGEX=1.  Then "cd t && sh t0070-fundamental.sh -i -v" would
give that message directly to the user.

> + exit(0);
> +}

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Test failures in t4034

2012-08-21 Thread Ramsay Jones
Junio C Hamano wrote:
> Ramsay Jones  writes:
> 
>> I had the same problem (or at least it *looks* like the same problem) on 
>> Linux
>> last year (May 2011), which turned out to be a bug in the regex routines in 
>> an
>> old version of glibc. 
>>
>> I don't know OS X at all, so this may not be relevent; does OS X use glibc?
>> (I didn't think so, but ...)
>>
>> I sent some patches to the list which may be helpful. I can't get to gmane to
>> look up a reference, but you need to search for:
>>
>> [RFC/PATCH] userdiff.c: Avoid old glibc regex bug causing t4034-*.sh 
>> test failures
>>
>> sent on 3rd May 2011.
> 
> Thanks; that's $gmane/172676 for people who prefer easier to read
> threading interface.
> 
>> Also, in the same thread, a reply to Jonathan Nieder on 7th May contains a
>> test which checks whether your regex routines suffer this bug.
>>
>> These patches were not applied since I didn't think this would be a common
>> problem. I simply set NO_REGEX=1 in my config.mak, since the compat/ regex
>> routines don't suffer from this problem.
> 
> You also said:
> 
>   This is an RFC because:
>- A simple fix would be for me to put NO_REGEX=1 in my config.mak,
>  since the compat/regex routines don't suffer this problem.
>- I suspect this bug is old enough that it will not affect many users.
>- I have not audited the other non-matching list expressions in
>  userdiff.c
>- blame, grep and pickaxe all call regcomp() with the REG_NEWLINE
>  flag, but get the regex from the user (eg from command line).
> 
> I think:
> 
>  - the second "this is old enough" assumption was broken again by
>Brian this week ;-)
> 
>  - the first "Use NO_REGEX if your regexp library is broken" is a
>reasonable thing to do; is this something we may want to throw
>into the platform specific section of the top-level Makefile?
> 
>  - among the fourth, "blame" and "grep" goes line by line, and even
>though pickaxe is primarily meant to take multi-line pattern, I
>do not think people give multi-line pattern when they use it in
>the regexp mode.  So I do not think they pose a real issue even
>though they get an arbitrary pattern from the user.
> 
>  - the third, combined with the fact that end user can define their
>own pattern, is a killer.  We cannot really afford to let broken
>regex library to break us.
> 
> I think a sensible way to go in the longer term, while we wait these
> old regexp libraries die out, is to help people to avoid building
> git without NO_REGEX on platforms where they need it.

Agreed. Did you take a look at the second patch I mentioned above?
I've included a rebased version (onto v1.7.12) of that patch below.

NOTE: I have not even attempted to compile this version of the patch
and I can't remember how much testing I did last year, so this is
included *only* for discussion purposes ...

I think that, after some testing, this (or something like it) is the
best that we can do. What do you think?

ATB,
Ramsay Jones

-- >8 --
Subject: [PATCH] test-regex: Add a test to check for a bug in the regex routines

Signed-off-by: Ramsay Jones 
---
 .gitignore |  1 +
 Makefile   |  1 +
 t/t0070-fundamental.sh |  5 +
 test-regex.c   | 35 +++
 4 files changed, 42 insertions(+)
 create mode 100644 test-regex.c

diff --git a/.gitignore b/.gitignore
index bb5c91e..68fe464 100644
--- a/.gitignore
+++ b/.gitignore
@@ -189,6 +189,7 @@
 /test-mktemp
 /test-parse-options
 /test-path-utils
+/test-regex
 /test-revision-walking
 /test-run-command
 /test-sha1
diff --git a/Makefile b/Makefile
index 6b0c961..3b760d3 100644
--- a/Makefile
+++ b/Makefile
@@ -496,6 +496,7 @@ TEST_PROGRAMS_NEED_X += test-mergesort
 TEST_PROGRAMS_NEED_X += test-mktemp
 TEST_PROGRAMS_NEED_X += test-parse-options
 TEST_PROGRAMS_NEED_X += test-path-utils
+TEST_PROGRAMS_NEED_X += test-regex
 TEST_PROGRAMS_NEED_X += test-revision-walking
 TEST_PROGRAMS_NEED_X += test-run-command
 TEST_PROGRAMS_NEED_X += test-scrap-cache-tree
diff --git a/t/t0070-fundamental.sh b/t/t0070-fundamental.sh
index 9bee8bf..da2c504 100755
--- a/t/t0070-fundamental.sh
+++ b/t/t0070-fundamental.sh
@@ -25,4 +25,9 @@ test_expect_success POSIXPERM 'mktemp to unwritable directory 
prints filename' '
grep "cannotwrite/test" err
 '
 
+test_expect_success 'check for a bug in the regex routines' '
+   # if this test fails, re-build git with NO_REGEX=1
+   test-regex
+'
+
 test_done
diff --git a/test-regex.c b/test-regex.c
new file mode 100644
index 000..9259985
--- /dev/null
+++ b/test-regex.c
@@ -0,0 +1,35 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static void die(const char *fmt, ...)
+{
+   va_list p;
+
+   va_start(p, fmt);
+   vfprintf(stderr, fmt, p);
+   va_end(p);
+   fputc('\n', stderr);
+   exit(128);
+}
+
+int main(int argc, char **argv)
+{
+   char *pat = "[^={} \t]+";
+   char *str = "=

Re: Test failures in t4034

2012-08-19 Thread Junio C Hamano
Ramsay Jones  writes:

> I had the same problem (or at least it *looks* like the same problem) on Linux
> last year (May 2011), which turned out to be a bug in the regex routines in an
> old version of glibc. 
>
> I don't know OS X at all, so this may not be relevent; does OS X use glibc?
> (I didn't think so, but ...)
>
> I sent some patches to the list which may be helpful. I can't get to gmane to
> look up a reference, but you need to search for:
>
> [RFC/PATCH] userdiff.c: Avoid old glibc regex bug causing t4034-*.sh test 
> failures
>
> sent on 3rd May 2011.

Thanks; that's $gmane/172676 for people who prefer easier to read
threading interface.

> Also, in the same thread, a reply to Jonathan Nieder on 7th May contains a
> test which checks whether your regex routines suffer this bug.
>
> These patches were not applied since I didn't think this would be a common
> problem. I simply set NO_REGEX=1 in my config.mak, since the compat/ regex
> routines don't suffer from this problem.

You also said:

  This is an RFC because:
   - A simple fix would be for me to put NO_REGEX=1 in my config.mak,
 since the compat/regex routines don't suffer this problem.
   - I suspect this bug is old enough that it will not affect many users.
   - I have not audited the other non-matching list expressions in
 userdiff.c
   - blame, grep and pickaxe all call regcomp() with the REG_NEWLINE
 flag, but get the regex from the user (eg from command line).

I think:

 - the second "this is old enough" assumption was broken again by
   Brian this week ;-)

 - the first "Use NO_REGEX if your regexp library is broken" is a
   reasonable thing to do; is this something we may want to throw
   into the platform specific section of the top-level Makefile?

 - among the fourth, "blame" and "grep" goes line by line, and even
   though pickaxe is primarily meant to take multi-line pattern, I
   do not think people give multi-line pattern when they use it in
   the regexp mode.  So I do not think they pose a real issue even
   though they get an arbitrary pattern from the user.

 - the third, combined with the fact that end user can define their
   own pattern, is a killer.  We cannot really afford to let broken
   regex library to break us.

I think a sensible way to go in the longer term, while we wait these
old regexp libraries die out, is to help people to avoid building
git without NO_REGEX on platforms where they need it.

Thanks for digging an old article.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Test failures in t4034

2012-08-19 Thread Johannes Sixt
Am 19.08.2012 16:50, schrieb Ramsay Jones:
> Brian Gernhardt wrote:
>> I've been getting a couple of test failures and finally had the
>> time to track them down.
>> 
>> t4034-diff-words fails tests "22 diff driver 'bibtex'" and "26 diff
>> driver 'html'".  Bisecting shows that the file started giving me
>> errors in commit 8d96e72 "t4034: bulk verify builtin word regex
>> sanity", which appears to introduce those tests.  I don't see
>> anything obviously wrong with the tests and I'm not familiar with
>> the diff-words code, so I'm not sure what's wrong.
>> 
>> I am running on OS X 10.8, with Xcode 4.4.1 (llvm-gcc 4.2.1).
> 
> I had the same problem (or at least it *looks* like the same problem)
> on Linux last year (May 2011), which turned out to be a bug in the
> regex routines in an old version of glibc.

I also had the same problem, but did not remember why I don't have it
anymore. Now that you mention it: It was the same situation and I came
to the same conclusion (old glibc, bogus regex implementation). I worked
it around with NO_REGEX=YesPlease in config.mak.

-- Hannes
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Test failures in t4034

2012-08-19 Thread Ramsay Jones
Brian Gernhardt wrote:
> I've been getting a couple of test failures and finally had the time to track 
> them down.
> 
> t4034-diff-words fails tests "22 diff driver 'bibtex'" and "26 diff driver 
> 'html'".  Bisecting shows that the file started giving me errors in commit 
> 8d96e72 "t4034: bulk verify builtin word regex sanity", which appears to 
> introduce those tests.  I don't see anything obviously wrong with the tests 
> and I'm not familiar with the diff-words code, so I'm not sure what's wrong.
> 
> I am running on OS X 10.8, with Xcode 4.4.1 (llvm-gcc 4.2.1).

I had the same problem (or at least it *looks* like the same problem) on Linux
last year (May 2011), which turned out to be a bug in the regex routines in an
old version of glibc. 

I don't know OS X at all, so this may not be relevent; does OS X use glibc?
(I didn't think so, but ...)

I sent some patches to the list which may be helpful. I can't get to gmane to
look up a reference, but you need to search for:

[RFC/PATCH] userdiff.c: Avoid old glibc regex bug causing t4034-*.sh test 
failures

sent on 3rd May 2011.

Also, in the same thread, a reply to Jonathan Nieder on 7th May contains a
test which checks whether your regex routines suffer this bug.

These patches were not applied since I didn't think this would be a common
problem. I simply set NO_REGEX=1 in my config.mak, since the compat/ regex
routines don't suffer from this problem.

HTH

ATB,
Ramsay Jones

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Test failures in t4034

2012-08-19 Thread Junio C Hamano
Brian Gernhardt  writes:

> I wonder if there is something non-portable about the bibtex
> filter?  (The HTML filter as well, since that errors on my machine
> too.)

Yeah, that I didn't think of, but is a possibility (part of (1)
above).

The HTML one is "[^<>= \t]+" and
the Bibtex one is "[={}\"]|[^={}\" \t]+"

and both will be used with "|[^[:space:]]|[\xc0-\xff][\x80-\xbf]+"
appended and given to regcomp with REG_EXTENDED|REG_NEWLINE.  

Nothing jumps at me that is common to these two but not shared by
other patterns.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Test failures in t4034

2012-08-18 Thread Junio C Hamano
Brian Gernhardt  writes:

> I've been getting a couple of test failures and finally had the time to track 
> them down.
>
> t4034-diff-words fails tests "22 diff driver 'bibtex'" and "26
> diff driver 'html'".  Bisecting shows that the file started giving
> me errors in commit 8d96e72 "t4034: bulk verify builtin word regex
> sanity", which appears to introduce those tests.  I don't see
> anything obviously wrong with the tests and I'm not familiar with
> the diff-words code, so I'm not sure what's wrong.
>
> I am running on OS X 10.8, with Xcode 4.4.1 (llvm-gcc 4.2.1).
>
> Test results follow:
>
> -- 8< --
>
> expecting success: 
>   cp "$TEST_DIRECTORY/t4034/bibtex/pre" \
>   "$TEST_DIRECTORY/t4034/bibtex/post" \
>   "$TEST_DIRECTORY/t4034/bibtex/expect" . &&
>   echo "* diff=bibtex" >.gitattributes &&
>   word_diff --color-words
>   
> --- expect2012-08-18 05:54:29.0 +
> +++ output.decrypted  2012-08-18 05:54:29.0 +
> @@ -8,8 +8,8 @@
>author={Aldous, D.David},
>journal={Information Theory, IEEE Transactions on},
>volume={33Bogus.},
> -  number={24},
> +  number={4},
>pages={219--223},
> -  year=1987,
> -  note={This is in fact a rather funny read since ethernet works well 
> in practice. The {1987\em pre} reference is the 
> right one, however.},
> +  year={1987},1987,
> +  note={This is in fact a rather funny read since ethernet works well in 
> practice. The {\em pre} reference is the right one, however.}
>  }
> not ok - 22 diff driver 'bibtex'

Thanks for a report.  Off the top of my head, there may be three
possibilities.

 (1) The compiled binary of Git is broken on your platform and not
 formatting the escape sequence correctly.  I somehow think it
 is very unlikely, as the code to do so is pretty much platform
 agonistic (color.c does not use anything fancy from system
 libraries).

 (2) The test script, the part that converts the escape sequence to
 human readable form, is broken---not written in a portable awk.

 (3) The implementation of awk on your platform was broken by your
 supplier, with the same infinite wisdom they broke the UTF-8
 pathnames on their filesystem implementation with ;-)

Can you help isolating the issue first to see if it is (1) or one of
the other two?

Run "cd t && sh t4034-diff-words -i" to force stop the test upon the
first breakage, and inspect the "output" before the awk script
test_decode_color munges it.  Does it show a red number 2 and green
number 4 on the line that begins with "number=" (or if you have an
access to a box on which this test passes, grab the raw output from
it by running this test, and make byte-for-byte comparison)?  
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Test failures in t4034

2012-08-17 Thread Brian Gernhardt
I've been getting a couple of test failures and finally had the time to track 
them down.

t4034-diff-words fails tests "22 diff driver 'bibtex'" and "26 diff driver 
'html'".  Bisecting shows that the file started giving me errors in commit 
8d96e72 "t4034: bulk verify builtin word regex sanity", which appears to 
introduce those tests.  I don't see anything obviously wrong with the tests and 
I'm not familiar with the diff-words code, so I'm not sure what's wrong.

I am running on OS X 10.8, with Xcode 4.4.1 (llvm-gcc 4.2.1).

Test results follow:

-- 8< --

expecting success: 
cp "$TEST_DIRECTORY/t4034/bibtex/pre" \
"$TEST_DIRECTORY/t4034/bibtex/post" \
"$TEST_DIRECTORY/t4034/bibtex/expect" . &&
echo "* diff=bibtex" >.gitattributes &&
word_diff --color-words

--- expect  2012-08-18 05:54:29.0 +
+++ output.decrypted2012-08-18 05:54:29.0 +
@@ -8,8 +8,8 @@
   author={Aldous, D.David},
   journal={Information Theory, IEEE Transactions on},
   volume={33Bogus.},
-  number={24},
+  number={4},
   pages={219--223},
-  year=1987,
-  note={This is in fact a rather funny read since ethernet works well 
in practice. The {1987\em pre} reference is the right 
one, however.},
+  year={1987},1987,
+  note={This is in fact a rather funny read since ethernet works well in 
practice. The {\em pre} reference is the right one, however.}
 }
not ok - 22 diff driver 'bibtex'

-- 8< --

expecting success: 
cp "$TEST_DIRECTORY/t4034/html/pre" \
"$TEST_DIRECTORY/t4034/html/post" \
"$TEST_DIRECTORY/t4034/html/expect" . &&
echo "* diff=html" >.gitattributes &&
word_diff --color-words

--- expect  2012-08-18 05:54:29.0 +
+++ output.decrypted2012-08-18 05:54:29.0 +
@@ -4,5 +4,5 @@
 +++ b/post
 @@ -1,3 +1,3 @@
 newattr="newvalue">added content
-"value""newvalue">contentchanged
-content 
&entity;&newentity;
+changed
+content &newentity;
not ok - 26 diff driver 'html'--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html