Re: [PATCH v2 7/7] grep: add support for PCRE v2

2017-05-25 Thread Ævar Arnfjörð Bjarmason
On Wed, May 24, 2017 at 8:23 AM, Junio C Hamano  wrote:
> Ævar Arnfjörð Bjarmason   writes:
>
>> Add support for v2 of the PCRE API. This is a new major version of
>> PCRE that came out in early 2015[1].
>>
>> The regular expression syntax is the same, but while the API is
>> similar, pretty much every function is either renamed or takes
>> different arguments. Thus using it via entirely new functions makes
>> sense, as opposed to trying to e.g. have one compile_pcre_pattern()
>> that would call either PCRE v1 or v2 functions.
>>
>> Git can now be compiled with either USE_LIBPCRE1=YesPlease or
>> USE_LIBPCRE2=YesPlease, with USE_LIBPCRE=YesPlease currently being a
>> synonym for the former. Providing both is a compile-time error.
>>
>> With earlier patches to enable JIT for PCRE v1 the performance of the
>> release versions of both libraries is almost exactly the same, with
>> PCRE v2 being around 1% slower.
>>
>> However, after I reported this to the pcre-dev mailing list[2], I got a
>> lot of help with the API use from Zoltán Herczeg, who subsequently
>> optimized some of the JIT functionality in v2 of the library.
>>
>> Running the p7820-grep-engines.sh performance test against the latest
>> Subversion trunk of both, with both of them and git compiled with -O3,
>> and the test run against linux.git, gives the following results. Just
>> the /perl/ tests are shown:
>>
>> $ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux 
>> GIT_PERF_MAKE_COMMAND='grep -q LIBPCRE2 Makefile && make -j8 
>> USE_LIBPCRE2=YesPlease CC=~/perl5/installed/bin/gcc 
>> NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst 
>> LDFLAGS=-Wl,-rpath,/home/avar/g/pcre2/inst/lib || make -j8 
>> USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc 
>> NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst 
>> LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~2 HEAD~ HEAD 
>> p7820-grep-engines.sh
>> [...]
>> Test                                        HEAD~2            HEAD~                   HEAD
>> ------------------------------------------------------------------------------------------------------------
>> 7820.3: perl grep 'how.to'                  0.22(0.40+0.48)   0.22(0.31+0.58) +0.0%   0.22(0.26+0.59) +0.0%
>> 7820.7: perl grep '^how to'                 0.27(0.62+0.50)   0.28(0.60+0.50) +3.7%   0.22(0.25+0.60) -18.5%
>> 7820.11: perl grep '[how] to'               0.33(0.92+0.47)   0.33(0.94+0.45) +0.0%   0.25(0.42+0.51) -24.2%
>> 7820.15: perl grep '(e.t[^ ]*|v.ry) rare'   0.35(1.08+0.46)   0.35(1.12+0.41) +0.0%   0.25(0.52+0.50) -28.6%
>> 7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'      0.30(0.78+0.51)   0.30(0.86+0.42) +0.0%   0.25(0.29+0.54) -16.7%
>>
>> See commit ("perf: add a comparison test of grep regex engines",
>> 2017-04-19) for details on the machine the above test run was executed
>> on.
>>
>> Here HEAD~2 is git with PCRE v1 without JIT, HEAD~ is PCRE v1 with
>> JIT, and HEAD is PCRE v2 (also with JIT). See previous commits of mine
>> mentioning p7820-grep-engines.sh for more details on the test setup.
>>
>> For ease of readability, here is a different run of just HEAD~ and HEAD
>> (PCRE v1 with JIT vs. PCRE v2), again with just the /perl/ tests shown:
>>
>> Test                                        HEAD~             HEAD
>> -------------------------------------------------------------------------------------
>> 7820.3: perl grep 'how.to'                  0.23(0.41+0.47)   0.23(0.26+0.59) +0.0%
>> 7820.7: perl grep '^how to'                 0.27(0.64+0.47)   0.23(0.28+0.56) -14.8%
>> 7820.11: perl grep '[how] to'               0.34(0.95+0.44)   0.25(0.38+0.56) -26.5%
>> 7820.15: perl grep '(e.t[^ ]*|v.ry) rare'   0.34(1.07+0.46)   0.24(0.52+0.49) -29.4%
>> 7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'      0.30(0.81+0.46)   0.22(0.33+0.54) -26.7%
>>
>> I.e. the two are either neck and neck or PCRE v2 pulls ahead, which it
>> usually does; when it does, it's around 20% faster.
>>
>> A brief note on thread safety: As noted in pcre2api(3) and pcre2jit(3),
>> the compiled pattern can be shared between threads, but some of the
>> JIT context cannot. However, the grep threading support does all pattern
>> and JIT compilation in separate threads, so this code doesn't need to
>> concern itself with thread safety.
>
> Nicely explained.
>
>> -# Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
>> +# Currently USE_LIBPCRE is a synonym for USE_LIBPCRE1, define
>> +# USE_LIBPCRE2 instead if you'd like to use version 2 of the PCRE
>> +# library. The USE_LIBPCRE flag will likely be changed to mean v2 by
>> +# default in future releases.
>> +#
>> +# Define LIBPCREDIR=/foo/bar if your PCRE header and library files are in
>>  # /foo/bar/include and /foo/bar/lib directories.
>
> As there is no way to use both, having a single LIBPCREDIR is not a
> hurting

Re: [PATCH v2 7/7] grep: add support for PCRE v2

2017-05-23 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> Add support for v2 of the PCRE API. This is a new major version of
> PCRE that came out in early 2015[1].
>
> The regular expression syntax is the same, but while the API is
> similar, pretty much every function is either renamed or takes
> different arguments. Thus using it via entirely new functions makes
> sense, as opposed to trying to e.g. have one compile_pcre_pattern()
> that would call either PCRE v1 or v2 functions.
>
> Git can now be compiled with either USE_LIBPCRE1=YesPlease or
> USE_LIBPCRE2=YesPlease, with USE_LIBPCRE=YesPlease currently being a
> synonym for the former. Providing both is a compile-time error.
>
> With earlier patches to enable JIT for PCRE v1 the performance of the
> release versions of both libraries is almost exactly the same, with
> PCRE v2 being around 1% slower.
>
> However, after I reported this to the pcre-dev mailing list[2], I got a
> lot of help with the API use from Zoltán Herczeg, who subsequently
> optimized some of the JIT functionality in v2 of the library.
>
> Running the p7820-grep-engines.sh performance test against the latest
> Subversion trunk of both, with both of them and git compiled with -O3,
> and the test run against linux.git, gives the following results. Just
> the /perl/ tests are shown:
>
> $ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux 
> GIT_PERF_MAKE_COMMAND='grep -q LIBPCRE2 Makefile && make -j8 
> USE_LIBPCRE2=YesPlease CC=~/perl5/installed/bin/gcc 
> NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst 
> LDFLAGS=-Wl,-rpath,/home/avar/g/pcre2/inst/lib || make -j8 
> USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc 
> NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst 
> LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~2 HEAD~ HEAD 
> p7820-grep-engines.sh
> [...]
> Test                                        HEAD~2            HEAD~                   HEAD
> ------------------------------------------------------------------------------------------------------------
> 7820.3: perl grep 'how.to'                  0.22(0.40+0.48)   0.22(0.31+0.58) +0.0%   0.22(0.26+0.59) +0.0%
> 7820.7: perl grep '^how to'                 0.27(0.62+0.50)   0.28(0.60+0.50) +3.7%   0.22(0.25+0.60) -18.5%
> 7820.11: perl grep '[how] to'               0.33(0.92+0.47)   0.33(0.94+0.45) +0.0%   0.25(0.42+0.51) -24.2%
> 7820.15: perl grep '(e.t[^ ]*|v.ry) rare'   0.35(1.08+0.46)   0.35(1.12+0.41) +0.0%   0.25(0.52+0.50) -28.6%
> 7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'      0.30(0.78+0.51)   0.30(0.86+0.42) +0.0%   0.25(0.29+0.54) -16.7%
>
> See commit ("perf: add a comparison test of grep regex engines",
> 2017-04-19) for details on the machine the above test run was executed
> on.
>
> Here HEAD~2 is git with PCRE v1 without JIT, HEAD~ is PCRE v1 with
> JIT, and HEAD is PCRE v2 (also with JIT). See previous commits of mine
> mentioning p7820-grep-engines.sh for more details on the test setup.
>
> For ease of readability, here is a different run of just HEAD~ and HEAD
> (PCRE v1 with JIT vs. PCRE v2), again with just the /perl/ tests shown:
>
> Test                                        HEAD~             HEAD
> -------------------------------------------------------------------------------------
> 7820.3: perl grep 'how.to'                  0.23(0.41+0.47)   0.23(0.26+0.59) +0.0%
> 7820.7: perl grep '^how to'                 0.27(0.64+0.47)   0.23(0.28+0.56) -14.8%
> 7820.11: perl grep '[how] to'               0.34(0.95+0.44)   0.25(0.38+0.56) -26.5%
> 7820.15: perl grep '(e.t[^ ]*|v.ry) rare'   0.34(1.07+0.46)   0.24(0.52+0.49) -29.4%
> 7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'      0.30(0.81+0.46)   0.22(0.33+0.54) -26.7%
>
> I.e. the two are either neck and neck or PCRE v2 pulls ahead, which it
> usually does; when it does, it's around 20% faster.
>
> A brief note on thread safety: As noted in pcre2api(3) and pcre2jit(3),
> the compiled pattern can be shared between threads, but some of the
> JIT context cannot. However, the grep threading support does all pattern
> and JIT compilation in separate threads, so this code doesn't need to
> concern itself with thread safety.

Nicely explained.

> -# Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
> +# Currently USE_LIBPCRE is a synonym for USE_LIBPCRE1, define
> +# USE_LIBPCRE2 instead if you'd like to use version 2 of the PCRE
> +# library. The USE_LIBPCRE flag will likely be changed to mean v2 by
> +# default in future releases.
> +#
> +# Define LIBPCREDIR=/foo/bar if your PCRE header and library files are in
>  # /foo/bar/include and /foo/bar/lib directories.

As there is no way to use both, having a single LIBPCREDIR is not a
hurting limitation, which makes sense.

> @@ -2241,6 +2258,7 @@ GIT-BUILD-OPTIONS: FORCE
>   @echo NO_CURL=\''$(subst ','\'',$(subst ','\'',$(NO_CURL)))'\' >>$@+
>

[PATCH v2 7/7] grep: add support for PCRE v2

2017-05-23 Thread Ævar Arnfjörð Bjarmason
Add support for v2 of the PCRE API. This is a new major version of
PCRE that came out in early 2015[1].

The regular expression syntax is the same, but while the API is
similar, pretty much every function is either renamed or takes
different arguments. Thus using it via entirely new functions makes
sense, as opposed to trying to e.g. have one compile_pcre_pattern()
that would call either PCRE v1 or v2 functions.

Git can now be compiled with either USE_LIBPCRE1=YesPlease or
USE_LIBPCRE2=YesPlease, with USE_LIBPCRE=YesPlease currently being a
synonym for the former. Providing both is a compile-time error.

With earlier patches to enable JIT for PCRE v1 the performance of the
release versions of both libraries is almost exactly the same, with
PCRE v2 being around 1% slower.

However, after I reported this to the pcre-dev mailing list[2], I got a
lot of help with the API use from Zoltán Herczeg, who subsequently
optimized some of the JIT functionality in v2 of the library.

Running the p7820-grep-engines.sh performance test against the latest
Subversion trunk of both, with both of them and git compiled with -O3,
and the test run against linux.git, gives the following results. Just
the /perl/ tests are shown:

$ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux 
GIT_PERF_MAKE_COMMAND='grep -q LIBPCRE2 Makefile && make -j8 
USE_LIBPCRE2=YesPlease CC=~/perl5/installed/bin/gcc 
NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst 
LDFLAGS=-Wl,-rpath,/home/avar/g/pcre2/inst/lib || make -j8 
USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease 
CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst 
LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~2 HEAD~ HEAD 
p7820-grep-engines.sh
[...]
Test                                        HEAD~2            HEAD~                   HEAD
------------------------------------------------------------------------------------------------------------
7820.3: perl grep 'how.to'                  0.22(0.40+0.48)   0.22(0.31+0.58) +0.0%   0.22(0.26+0.59) +0.0%
7820.7: perl grep '^how to'                 0.27(0.62+0.50)   0.28(0.60+0.50) +3.7%   0.22(0.25+0.60) -18.5%
7820.11: perl grep '[how] to'               0.33(0.92+0.47)   0.33(0.94+0.45) +0.0%   0.25(0.42+0.51) -24.2%
7820.15: perl grep '(e.t[^ ]*|v.ry) rare'   0.35(1.08+0.46)   0.35(1.12+0.41) +0.0%   0.25(0.52+0.50) -28.6%
7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'      0.30(0.78+0.51)   0.30(0.86+0.42) +0.0%   0.25(0.29+0.54) -16.7%

See commit ("perf: add a comparison test of grep regex engines",
2017-04-19) for details on the machine the above test run was executed
on.

Here HEAD~2 is git with PCRE v1 without JIT, HEAD~ is PCRE v1 with
JIT, and HEAD is PCRE v2 (also with JIT). See previous commits of mine
mentioning p7820-grep-engines.sh for more details on the test setup.

For ease of readability, here is a different run of just HEAD~ and HEAD
(PCRE v1 with JIT vs. PCRE v2), again with just the /perl/ tests shown:

Test                                        HEAD~             HEAD
-------------------------------------------------------------------------------------
7820.3: perl grep 'how.to'                  0.23(0.41+0.47)   0.23(0.26+0.59) +0.0%
7820.7: perl grep '^how to'                 0.27(0.64+0.47)   0.23(0.28+0.56) -14.8%
7820.11: perl grep '[how] to'               0.34(0.95+0.44)   0.25(0.38+0.56) -26.5%
7820.15: perl grep '(e.t[^ ]*|v.ry) rare'   0.34(1.07+0.46)   0.24(0.52+0.49) -29.4%
7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'      0.30(0.81+0.46)   0.22(0.33+0.54) -26.7%

I.e. the two are either neck and neck or PCRE v2 pulls ahead, which it
usually does; when it does, it's around 20% faster.

A brief note on thread safety: As noted in pcre2api(3) and pcre2jit(3),
the compiled pattern can be shared between threads, but some of the
JIT context cannot. However, the grep threading support does all pattern
and JIT compilation in separate threads, so this code doesn't need to
concern itself with thread safety.

See commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) for the
initial addition of PCRE v1. This change follows some of the same
patterns it did (and which were discussed on list at the time),
e.g. mocking up types with typedef instead of ifdef-ing them out when
USE_LIBPCRE2 isn't defined. This adds some trivial memory use to the
program, but makes the code look nicer.

1. https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html
2. https://lists.exim.org/lurker/thread/20170419.172322.833ee099.en.html

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile  |  30 +---
 configure.ac  |  77 ++-
 grep.c| 143 ++
 grep.h|  17 +++
 t/test-lib.sh |   2 +-
 5 files changed, 250 insertions(+), 19 deletions(-)

diff --git a/Makefile b/Makefile
index a79274e5e6..d77ca

[PATCH v2 7/7] grep: add support for PCRE v2

2017-05-13 Thread Ævar Arnfjörð Bjarmason
Add support for v2 of the PCRE API. This is a new major version of
PCRE that came out in early 2015[1].

The regular expression syntax is the same, but while the API is
similar, pretty much every function is either renamed or takes
different arguments. Thus using it via entirely new functions makes
sense, as opposed to trying to e.g. have one compile_pcre_pattern()
that would call either PCRE v1 or v2 functions.

Git can now be compiled with either USE_LIBPCRE1=YesPlease or
USE_LIBPCRE2=YesPlease, with USE_LIBPCRE=YesPlease currently being a
synonym for the former. Providing both is a compile-time error.

With earlier patches to enable JIT for PCRE v1 the performance of the
release versions of both libraries is almost exactly the same, with
PCRE v2 being around 1% slower.

However, after I reported this to the pcre-dev mailing list[2], I got a
lot of help with the API use from Zoltán Herczeg, who subsequently
optimized some of the JIT functionality in v2 of the library.

Running the p7820-grep-engines.sh performance test against the latest
Subversion trunk of both, with both of them and git compiled with -O3,
and the test run against linux.git, gives the following results. Just
the /perl/ tests are shown:

$ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux 
GIT_PERF_MAKE_COMMAND='grep -q LIBPCRE2 Makefile && make -j8 
USE_LIBPCRE2=YesPlease CC=~/perl5/installed/bin/gcc 
NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst 
LDFLAGS=-Wl,-rpath,/home/avar/g/pcre2/inst/lib || make -j8 
USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease 
CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst 
LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~2 HEAD~ HEAD 
p7820-grep-engines.sh
[...]
Test                                      HEAD~2            HEAD~                   HEAD
----------------------------------------------------------------------------------------------------------
7820.3: perl grep how.to                  0.19(0.34+0.62)   0.18(0.39+0.57) -5.3%   0.19(0.32+0.61) +0.0%
7820.7: perl grep ^how to                 0.21(0.68+0.51)   0.21(0.64+0.54) +0.0%   0.19(0.32+0.60) -9.5%
7820.11: perl grep [how] to               0.25(0.92+0.51)   0.26(0.93+0.49) +4.0%   0.21(0.39+0.62) -16.0%
7820.15: perl grep (e.t[^ ]*|v.ry) rare   0.26(1.18+0.42)   0.26(1.14+0.45) +0.0%   0.20(0.49+0.57) -23.1%
7820.19: perl grep m(ú|u)lt.b(æ|y)te      0.24(0.85+0.48)   0.23(0.92+0.41) -4.2%   0.19(0.36+0.56) -20.8%

See commit ("perf: add a performance comparison test of grep -G, -E
and -P", 2017-04-19) for further details on the machine the above test
run was executed on.

Here HEAD~2 is git with PCRE v1 without JIT, HEAD~ is PCRE v1 with
JIT, and HEAD is PCRE v2 (also with JIT). See previous commits of mine
mentioning p7820-grep-engines.sh for more details on the test setup.

For ease of readability, here is a different run of just HEAD~ and HEAD
(PCRE v1 with JIT vs. PCRE v2), again with just the /perl/ tests shown:

Test                                      HEAD~             HEAD
-----------------------------------------------------------------------------------
7820.3: perl grep how.to                  0.19(0.40+0.56)   0.19(0.34+0.59) +0.0%
7820.7: perl grep ^how to                 0.21(0.64+0.54)   0.19(0.30+0.63) -9.5%
7820.11: perl grep [how] to               0.25(0.94+0.48)   0.21(0.38+0.62) -16.0%
7820.15: perl grep (e.t[^ ]*|v.ry) rare   0.26(1.13+0.46)   0.20(0.48+0.58) -23.1%
7820.19: perl grep m(ú|u)lt.b(æ|y)te      0.23(0.84+0.50)   0.18(0.29+0.63) -21.7%

I.e. the two are either neck and neck or PCRE v2 pulls ahead, which it
usually does; when it does, it's around 20% faster.

A brief note on thread safety: As noted in pcre2api(3) and pcre2jit(3),
the compiled pattern can be shared between threads, but some of the
JIT context cannot. However, the grep threading support does all pattern
and JIT compilation in separate threads, so this code doesn't need to
concern itself with thread safety.

See commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) for the
initial addition of PCRE v1. This change follows some of the same
patterns it did (and which were discussed on list at the time),
e.g. mocking up types with typedef instead of ifdef-ing them out when
USE_LIBPCRE2 isn't defined. This adds some trivial memory use to the
program, but makes the code look nicer.

1. https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html
2. https://lists.exim.org/lurker/thread/20170419.172322.833ee099.en.html

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile  |  30 +---
 configure.ac  |  77 ++-
 grep.c| 143 ++
 grep.h|  17 +++
 t/test-lib.sh |   2 +-
 5 files changed, 250 insertions(+), 19 deletions(-)

diff --git a/Makefile b/Makefile
index a79274e5e6..d77ca4c