Hi Ævar,

This is really awesome news! I am happy that you chose PCRE for git.

>I did some basic performance benchmarks between v1 and v2 of PCRE.
>Depending on whether we use git-grep or git-log v2 is 1% to 10% slower
>than v1 when both use JIT.

Could you share the compilation flags used for both pcre1 and pcre2? By default 
the library is compiled without optimization options, which has the same effect 
as the -O0 option. This can be changed by setting a CFLAGS value.

Furthermore, is pcre2match called only once? I ask because you free the result 
of pcre2_match_data_create_from_pattern only in free_pcre2_pattern. If 
pcre2match is called frequently, you probably leak memory heavily, since each 
call allocates a new memory block. It would be best to call 
pcre2_match_data_create_from_pattern only once, in compile_pcre2_pattern.

>And also, searching on that page for "follow-up projects" show some
>areas where I've identified git's PCRE support doing potentially
>stupid things with PCRE, that could be replaced by offloading more
>work on PCRE. E.g. we implement -w by manually checking for word
>boundaries, instead of prefixing & suffixing the pattern with "\b".

This is a difficult question since I don't know the internals of git. Yes, 
/\b(?:PATTERN)\b/ could be used for matching whole words, unless PATTERN uses 
some exotic feature such as the (*ACCEPT) control verb. The 
PCRE2_NO_AUTO_CAPTURE option can be useful if you don't need capturing 
brackets. Callouts can be used for some extreme text searching.
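
A minimal sketch of the wrapping step, in plain C. The helper name 
wrap_word_pattern is hypothetical. I have not checked whether \b agrees with 
git's own -w rules in every case, so treat the equivalence as an assumption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of the pattern wrapped as
 * \b(?:PATTERN)\b, or NULL if allocation fails. The caller frees it. */
static char *wrap_word_pattern(const char *pattern)
{
    static const char prefix[] = "\\b(?:";
    static const char suffix[] = ")\\b";
    size_t len = strlen(prefix) + strlen(pattern) + strlen(suffix) + 1;
    char *out = malloc(len);

    if (!out)
        return NULL;
    /* The non-capturing group keeps alternations such as foo|bar intact. */
    snprintf(out, len, "%s%s%s", prefix, pattern, suffix);
    return out;
}
```

The result can then be passed to pcre2_compile, with PCRE2_NO_AUTO_CAPTURE in 
the options argument if the capture groups are not needed.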

>Any more tips like that would be welcome, and also some tips about
>e.g. in what cases the JIT overhead becomes not worth it, and when it
>does.

I can answer this question. I made some measurements before:

http://sljit.sourceforge.net/pcre.html

There is a "Compile time overhead" section, which compares the JIT compilation 
overhead to a regular pattern compile. In general, if you use a pattern only a 
few times on a small input, enabling JIT is not worth it. If your input is 
several megabytes, or a pattern is used frequently, JIT quickly becomes the 
better choice. Perhaps .gitignore patterns could be an example of frequently 
used patterns.

Regards,
Zoltan


-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 