[pcre-dev] Influence of some start-up optimizations at the beginning of the pattern

2019-05-28 Thread ND via Pcre-dev
Good day! The pcre2api.html document says: There are also other start-up optimizations. For example, a minimum length for the subject may be recorded. Consider the pattern (*MARK:A)(X|Y). The minimum length for a match is one character. If the subject is "ABC", there will be attempts to match

Re: [pcre-dev] JIT regression

2019-05-28 Thread ND via Pcre-dev
Zoltán Herczeg wrote in a message of Mon, 27 May 2019 11:06:39 +0300: Optimizing the possessive dot is a good idea. I will do it. I feel that the interpreter optimizes not only possessive ".*", but any ".*" in DotAll mode. Doesn't it?

Re: [pcre-dev] JIT regression

2019-05-28 Thread ph10
On Tue, 28 May 2019, I wrote: > > ./pcre2test -tm > > PCRE2 version 10.34-RC1 2019-04-22 > >   re> /abcd/ > > data> \[012345678a]{2000} > > Match time 0.1659 milliseconds > > No match > > data> > >   re> /abcd/jit > > data> \[012345678a]{2000} > > Match time 0.0027 milliseconds > > No match > >

[pcre-dev] [Bug 1554] support subject strings with invalid UTF-8 sequences

2019-05-28 Thread admin
https://bugs.exim.org/show_bug.cgi?id=1554 --- Comment #8 from Philip Hazel --- pcre2grep now has a -U or --utf-allow-invalid option, which supports the scanning of files that contain a mixture of valid and invalid UTF-8 code units.

Re: [pcre-dev] JIT regression

2019-05-28 Thread ph10
On Mon, 27 May 2019, Zoltán Herczeg wrote: > that is a strategic difference. You don't know the input from the > pattern, and your input has no a-d characters. The interpreter only > searches for 'a', while JIT searches for two characters, 'a' and 'd', whose > distance is two. The latter is more