On Mon, Apr 10, 2017 at 4:13 PM, Zoltán Herczeg <hzmes...@freemail.hu> wrote:
> Hi Ævar,
>
> this is really awesome news! I am happy that you chose PCRE for git.
>
>>I did some basic performance benchmarks between v1 and v2 of PCRE.
>>Depending on whether we use git-grep or git-log v2 is 1% to 10% slower
>>than v1 when both use JIT.
>
> Could I see the compilation flags for both pcre1 and pcre2? By
> default the library is compiled without optimization options, which has the
> same effect as the -O0 option. This can be changed by setting a CFLAGS value.

I'm using the Debian testing packages for both:

    $ dpkg -l | grep -e pcre2-8-0 -e libpcre3:|awk '{print $3}'
    10.22-3
    2:8.39-3

I couldn't find out how to get the compile flags for those, but
presumably it's some comparable middle-of-the-road value, probably
-O2. I'll try compiling from svn & report back.

> Furthermore, is pcre2match called only once? I ask because you free the
> result of pcre2_match_data_create_from_pattern only in free_pcre2_pattern. If
> pcre2match is called frequently, you probably leak memory heavily, since each
> call allocates a memory block. It would be best to call this function only
> once, in compile_pcre2_pattern.

Oops, no, it's called lots of times. I'll fix that. It's probably the
source of some overhead, and definitely of memory leaks.

>>And also, searching on that page for "follow-up projects" shows some
>>areas where I've identified git's PCRE support doing potentially
>>stupid things with PCRE, which could be replaced by offloading more
>>work to PCRE. E.g. we implement -w by manually checking for word
>>boundaries, instead of prefixing & suffixing the pattern with "\b".
>
> This is a difficult question since I don't know the internals of git. Yes,
> /\b(?:PATTERN)\b/ could be used for checking full words, unless the PATTERN
> has some exotic features like the (*ACCEPT) control verb. The
> PCRE2_NO_AUTO_CAPTURE option can be useful if you don't need capturing
> brackets. Callouts can be used for some extreme text searching.

*Nod* I'll experiment with that.
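
For reference, a minimal sketch of the wrapping step, assuming a
hypothetical helper named wrap_word_regex (nothing like it exists in
git today). It only builds the "\b(?:PATTERN)\b" string. The
non-capturing group keeps a top-level alternation inside both word
boundaries. It does nothing about the (*ACCEPT) caveat, and whether the
result agrees with the current -w behaviour in every case is untested.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns a newly allocated "\b(?:PATTERN)\b",
 * or NULL on allocation failure. The caller frees the result.
 */
static char *wrap_word_regex(const char *pattern)
{
    static const char prefix[] = "\\b(?:";
    static const char suffix[] = ")\\b";
    size_t len = strlen(prefix) + strlen(pattern) + strlen(suffix) + 1;
    char *out = malloc(len);

    if (!out)
        return NULL;
    /* len includes the terminating NUL, so nothing is truncated. */
    snprintf(out, len, "%s%s%s", prefix, pattern, suffix);
    return out;
}
```

The wrapped string would then be handed to the pattern compiler in place
of the original pattern.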

>>Any more tips like that would be welcome, and also some tips about
>>e.g. in what cases the JIT overhead is worth it, and when it
>>isn't.
>
> I can answer this question. I made some measurements before:
>
> http://sljit.sourceforge.net/pcre.html

That's great, I'll use some of those regexes for my performance test.

> There is a "Compile time overhead" section, which compares the JIT
> compilation overhead to the regular pattern compile. In general, if you use a
> pattern a few times on a small input, enabling JIT is not worth it. If your
> input is several megabytes or a pattern is frequently used, JIT quickly
> becomes better. Perhaps .gitignore patterns could be an example of
> frequently used patterns.
>
> Regards,
> Zoltan
>

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 
