
[pcre-dev] Memory overflow when using replace modifier upon pattern with \K in lookbehind assertion

2018-05-30 Thread ND via Pcre-dev

Good day, Here is pcre2test output: PCRE2 version 10.31 2018-02-12 /(?<=\K.)/g,replace=- ab Failed: error -48: no more memory It seems like a bug. -- ## List details at https://lists.exim.org/mailman/listinfo/pcre-dev

[pcre-dev] Unexpected result when zero-length global matching and pattern contains \G

2018-05-30 Thread ND via Pcre-dev

Good day. Here is the pcre2test output: PCRE2 version 10.31 2018-02-12 /(?<=\G.)/g,replace=- abc 2: a-bc- The logically expected result is a-b-c-. PCRE advances by one character between zero-length matches, but it seems it should not do so in this case.

Re: [pcre-dev] Memory overflow when using replace modifier upon pattern with \K in lookbehind assertion

2018-06-22 Thread ND via Pcre-dev

The error confuses me because no error occurs without "replace": PCRE2 version 10.31 2018-02-12 /(?<=\K.)/g ab 0: a 0: b

[pcre-dev] Anchored patterns doesn't match with \K in lookbehind

2018-06-24 Thread ND via Pcre-dev

Good day. Here is pcretest output: PCRE2 version 10.31 2018-02-12 /(?<=\K.)/anchored ab No match As documented, "the pattern is forced to be "anchored", that is, it is constrained to match only at the first matching point in the string that is being searched". Match "a" at first matchi

Re: [pcre-dev] Anchored patterns doesn't match with \K in lookbehind

2018-06-30 Thread ND via Pcre-dev

nd precede it for example: (?<=\Kfoo)bar matches bar but report that match foobar On 2018-06-25 08:21, ph10 wrote: On Mon, 25 Jun 2018, ND via Pcre-dev wrote: PCRE2 version 10.31 2018-02-12 > /(?<=\K.)/anchored > ab > No match >> As documented, "the pattern

Re: [pcre-dev] Unexpected result when zero-length global matching and pattern contains \G

2018-06-30 Thread ND via Pcre-dev

However, the match was for an empty string, so it moves on one character. Explain please why an empty match causes move a starting offset by one character. I can imagine that starting offset need to move by one character if there is an empty match AND startoffset argument of thi

Re: [pcre-dev] Unexpected result when zero-length global matching and pattern contains \G

2018-06-30 Thread ND via Pcre-dev

On 2018-06-30 16:51, ph10 wrote: Note that this effect is in pcre2test, not in the basic library. /g is implemented entirely in pcre2test, so the basic library does not know it is handling a repeat match. I'm surprised: is there another algorithm of working /g in pcretest in comparing wit
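As a rough illustration of why a global (/g-style) loop has to step forward after an empty match, here is a minimal Python sketch. It uses Python's `re` module rather than PCRE2, the helper name `global_spans` is made up for this example, and it is a simplification, not the actual pcre2test or pcre2_substitute() code.

```python
import re

def global_spans(pattern, subject):
    """Collect (start, end) spans of successive matches, /g style.

    Hypothetical helper for illustration only; not PCRE2 code.
    """
    spans = []
    pos = 0
    while pos <= len(subject):
        m = pattern.search(subject, pos)
        if m is None:
            break
        spans.append(m.span())
        if m.end() == m.start():
            # Empty match: searching again from the same offset would find
            # the same empty match forever, so bump along by one character.
            pos = m.end() + 1
        else:
            pos = m.end()
    return spans

print(global_spans(re.compile(r"x*"), "axb"))
```

Without the one-character bump in the empty-match branch, the loop would never terminate on a pattern such as `x*`.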

Re: [pcre-dev] Unexpected result when zero-length global matching and pattern contains \G

2018-06-30 Thread ND via Pcre-dev

On 2018-06-30 17:44, ND wrote: I'm surprised: is there another algorithm of working /g in pcretest in comparing with PCRE2_SUBSTITUTE_GLOBAL in pcre2_substitute()? Sorry for my bad English. I want to ask: aren't /g in pcretest and pcre2_substitute() with PCRE2_SUBSTITUTE_GLOBAL

[pcre-dev] Quantified assertions inside lookbehind assertion cause error

2018-07-01 Thread ND via Pcre-dev

Good day. Some time ago we discuss that a quantified assertions can be used: https://lists.exim.org/lurker/message/20110502.103121.96e51b9d.en.html Here is pcretest listing: PCRE2 version 10.31 2018-02-12 /(?<=(?=.)?)/ Failed: error 125 at offset 0: lookbehind assertion is not fixed length a

[pcre-dev] Checking the PCRE2 version not work as expected

2018-07-01 Thread ND via Pcre-dev

Hi! Here is a pcretest listing: PCRE2 version 10.31 2018-02-12 /(?(VERSION>=10.04)yes|no)/ yes No match /(?(VERSION>=10.4)yes|no)/ yes No match I expect a match in both cases. If it shouldn't match, then please document this behaviour, or point me to the place in the docs where it is already explained.

[pcre-dev] Bactracking controls in subroutines

2018-07-03 Thread ND via Pcre-dev

Good day. Look to pcretest listing: PCRE2 version 10.31 2018-02-12 /(?1)(*F)|(a(*COMMIT))/ a 0: a 1: a In Perl this pattern not matched. There are differences in how PCRE and Perl act with control verbs in subroutines. This is documented in PCRE: "Perl's treatment of subroutines is diff

[pcre-dev] No capture in nested negative assertions

2018-07-06 Thread ND via Pcre-dev

Good day. PCRE documents: "No capturing is done for a negative assertion unless it is being used as a condition in a conditional subpattern (see the discussion below). Matching continues after a non-conditional negative assertion only if all its branches fail to match." But capture al

Re: [pcre-dev] Quantified assertions inside lookbehind assertion cause error

2018-07-07 Thread ND via Pcre-dev

On 2018-07-02 11:27, ph10 wrote: I don't know why Perl allows qualifiers on assertions, because they don't mean anything. No. It have meaning. In my first post there is a link to thread when we discuss about that. And after that discussion you change PCRE to support assertion quantifiers.

Re: [pcre-dev] No capture in nested negative assertions

2018-07-08 Thread ND via Pcre-dev

On 2018-07-07 16:50, ph10 wrote: I decided that the most straightforward approach was to discard all capturing inside negative assertions when the assertion completes. May I suggest alternative approach? It is simple and more consistent. I think Perl use it: Capture is discarded ONLY if i

[pcre-dev] (SKIP:NAME) when (MARK:NAME) is in assertion

2018-07-08 Thread ND via Pcre-dev

Good day. PCRE documents about SKIP verb with NAME: "When (*SKIP) has an associated name, its behaviour is modified. When it is triggered, the previous path through the pattern is searched for the most recent (*MARK) that has the same name. If one is found, the "bumpalong" advance is to t

Re: [pcre-dev] No capture in nested negative assertions

2018-07-09 Thread ND via Pcre-dev

On 2018-07-09 09:25, ph10 wrote: If any branch in a negative assertion succeeds, the captures are (temporarily) kept, but as the whole assertion now fails, there is an external backtrack, which discards the captures. To what point backtracking is? I guess Perl doesn't backtrack if last alter

Re: [pcre-dev] No capture in nested negative assertions

2018-07-09 Thread ND via Pcre-dev

On 2018-07-10 04:48, ND wrote: On 2018-07-09 09:25, ph10 wrote: >If any branch in a negative assertion succeeds, the captures are> (temporarily) kept, but as the whole assertion now fails, there is an> external backtrack, which discards the captures. > To what point backtracking is? I guess

Re: [pcre-dev] No capture in nested negative assertions

2018-07-10 Thread ND via Pcre-dev

On 2018-07-10 11:31, ph10 wrote: Perl 5.026002 Regular Expressions /(?!(a)b)/ a 0: 1: a /(?!(a)b|ac)/ a 0: /(?!ac|(a)b)/ a 0: It seems to save the capture only if there is just one branch in the assertion. Or maybe it has some algorithm for deciding on which branch to try first ... I don't k

Re: [pcre-dev] (SKIP:NAME) when (MARK:NAME) is in assertion

2018-07-11 Thread ND via Pcre-dev

On 2018-07-11 16:27, ph10 wrote: This already appears in the docs: However, when one of these verbs appears inside an atomic group or in an assertion that is true, its effect is confined to that group, because once the group has been matched, there is never any backtracking into it. I s

Re: [pcre-dev] (SKIP:NAME) when (MARK:NAME) is in assertion

2018-07-12 Thread ND via Pcre-dev

On 2018-07-12 07:25, ph10 wrote: The (*MARK) is inside the assertion. That is what matters. I have updated the documentation to say this: The search for a (*MARK) name uses the normal backtracking mechanism, which means that it does not see (*MARK) settings that are inside atomic groups or

Re: [pcre-dev] Bactracking controls in subroutines

2018-07-12 Thread ND via Pcre-dev

On 2018-07-12 16:55, ph10 wrote: There are no subroutine calls in the second, so it looks like a Perl bug. You are right. It's a bug. I will report it. Can you please also report the Perl inconsistency that we discussed in "No capture in nested negative assertions"?

Re: [pcre-dev] (SKIP:NAME) when (MARK:NAME) is in assertion

2018-07-13 Thread ND via Pcre-dev

On 2018-07-13 07:23, ph10 wrote: On Thu, 12 Jul 2018, ND via Pcre-dev wrote: And one more thing should also be clarified in docs: > MARK name unlike MARK position is saved outside assertion or atomic group: The MARK position *is* saved; it's just that there is never a backtrack

Re: [pcre-dev] (SKIP:NAME) when (MARK:NAME) is in assertion

2018-07-13 Thread ND via Pcre-dev

On 2018-07-13 16:08, ph10 wrote: When SKIP has a name, it backtracks until it hits a MARK with the same name. Why does it need to backtrack? Why not do a "bumpalong" advance to the next starting character straight away?

Re: [pcre-dev] (SKIP:NAME) when (MARK:NAME) is in assertion

2018-07-14 Thread ND via Pcre-dev

On 2018-07-14 07:16, ph10 wrote: >> Why does it need to backtrack? > Why not do a "bumpalong" advance to the next starting character straight away? It has to backtrack to the *MARK because that is where the bumpalong data is remembered. There may be many *MARKs, each with a different name. You can'

Re: [pcre-dev] (SKIP:NAME) when (MARK:NAME) is in assertion

2018-07-14 Thread ND via Pcre-dev

On 2018-07-14 15:12, ph10 wrote: Feel free to look at the code and suggest patches. However, I don't think it is easy. Sorry, I'm not a C programmer. It doesn't have to do anything special when it passes a (*MARK:NAME) other than record a backtracking point. Then when (*SKIP:NAME) is trigge

Re: [pcre-dev] (SKIP:NAME) when (MARK:NAME) is in assertion

2018-07-15 Thread ND via Pcre-dev

And one more possible bug: PCRE2 version 10.31 2018-02-12 /(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/ abc 0: bc If a MARK in an atomic group doesn't matter for SKIP, then why is the result "bc" and not "abc"? If a MARK in an atomic group does matter for SKIP, then why is the result not "c"?

[pcre-dev] Incompatibility with different names for subpatterns of the same number

2018-07-21 Thread ND via Pcre-dev

Good day. I meet incompatibility with Perl when trying to use in PCRE valid Perl pattern: PCRE2 version 10.31 2018-02-12 /(?|(?)|(?))/ Failed: error 165 at offset 15: different names for subpatterns of the same number are not allowed a Docs say: "Warning: You cannot use different names to

Re: [pcre-dev] Incompatibility with different names for subpatterns of the same number

2018-07-21 Thread ND via Pcre-dev

On 2018-07-21 16:29, ph10 wrote: The feature was added by creating a table that translates a group number to a group name. This means that for each number, there must only be one name. May be a table that translates a group name to group number can be more useful? -- ## List details at htt

Re: [pcre-dev] Incompatibility with different names for subpatterns of the same number

2018-07-22 Thread ND via Pcre-dev

On 2018-07-21 16:59, ph10 wrote: It's the same table. See pcre2api documentation on PCRE2_INFO_NAMETABLE. Thanks. I read it. Why can't PCRE2_INFO_NAMETABLE entries have the same number? I see no drawbacks in this.

Re: [pcre-dev] Incompatibility with different names for subpatterns of the same number

2018-07-23 Thread ND via Pcre-dev

On 2018-07-22 15:09, ph10 wrote: Consider /(?| (?<A>foo) | (?<B>bar) )/x The table will tell you "group 1 is called A" and "group 1 is called B". What happens if you match the pattern with "foo" and then ask "what is the value of group B?". The table will tell you that group B is group 1, and grou
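The ambiguity described in the quoted message can be sketched with a plain dictionary. This is only a toy model in Python: `name_table`, `captures` and `group_by_name` are hypothetical names invented for the example, and the real PCRE2_INFO_NAMETABLE layout is different. The names A and B are the ones used in the message.

```python
# Toy name table in which two names share capture group number 1,
# which is the situation PCRE2 currently rejects with error 165.
name_table = {"A": 1, "B": 1}

def group_by_name(name, captures):
    """Resolve a named group through its number, as a name table would."""
    return captures[name_table[name]]

# Matching "foo" takes the first branch, so group 1 holds "foo".
captures = {0: "foo", 1: "foo"}

print(group_by_name("A", captures))
print(group_by_name("B", captures))  # same value, although B's branch never matched
```

In this model a lookup by name cannot tell which branch actually set group 1, so asking for B after matching "foo" still returns "foo".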

[pcre-dev] AllAny* is slow with JIT

2019-05-25 Thread ND via Pcre-dev

Good day! Let's execute pcre2test -tm 1000 input.txt output.txt input.txt contains two copies of same pattern/subject. But second compiled with JIT: /a.*/s a\[0123456789]{20} /a.*/s,jit a\[0123456789]{20} Here is output.txt (capture is excluded): PCRE2 version 10.33 2019-04-16

[pcre-dev] JIT regression

2019-05-26 Thread ND via Pcre-dev

Good day! Here is another pcre2test timings: PCRE2 version 10.33 2019-04-16 /abcd/ \[0123456789]{20} Match time 0.1040 milliseconds No match /abcd/jit \[0123456789]{20} Match time 1.6320 milliseconds No match It seems JIT is 16 times!! slower than interpreter for such simple patter

Re: [pcre-dev] JIT regression

2019-05-28 Thread ND via Pcre-dev

Zoltán Herczeg wrote in his message of Mon, 27 May 2019 11:06:39 +0300: Optimizing the possessive dot is a good idea. I will do it. I feel that the interpreter optimizes not only the possessive ".*", but any ".*" in DotAll mode. Is that right?

[pcre-dev] Influence of some start-up optimizations at the beginning of the pattern

2019-05-28 Thread ND via Pcre-dev

Good day! pcre2api.html document: There are also other start-up optimizations. For example, a minimum length for the subject may be recorded. Consider the pattern (*MARK:A)(X|Y) The minimum length for a match is one character. If the subject is "ABC", there will be attempts to match "ABC

Re: [pcre-dev] Influence of some start-up optimizations at the beginning of the pattern

2019-05-29 Thread ND via Pcre-dev

Since anybody put MARK verb at the beginning of pattern then it is assumed that this verb is definitely needed in pattern logic. So is there any reason to apply to such patterns optimizations that can discard that MARK? May be automatically disabling of such optimizations is reasonable. --

Re: [pcre-dev] Influence of some start-up optimizations at the beginning of the pattern

2019-05-29 Thread ND via Pcre-dev

On 2019-05-29 16:52, ph10 wrote: On Wed, 29 May 2019, ND via Pcre-dev wrote: Since anybody put MARK verb at the beginning of pattern then it is assumed > that this verb is definitely needed in pattern logic. But maybe only for successful matches? So is there any reason to apply to s

[pcre-dev] Quantifying backtracking verbs

2019-06-04 Thread ND via Pcre-dev

Good day! Here is pcre2test listing: PCRE2 version 10.33 2019-04-16 /A(?:(*ACCEPT))?B/ A No match /A(?:(*ACCEPT))?B/no_start_optimize A 0: A /A(*ACCEPT)?B/ Failed: error 109 at offset 10: quantifier does not follow a repeatable item A I have a two questions with it: 1. Start optimizer

[pcre-dev] Subject length lower bound calculation

2019-06-05 Thread ND via Pcre-dev

Good day! pcre2test: PCRE2 version 10.33 2019-04-16 /(?=abc)/I Capture group count = 0 May match empty string First code unit = 'a' Last code unit = 'c' Subject length lower bound = 0 Why is the subject length lower bound 0, not 3?

Re: [pcre-dev] Quantifying backtracking verbs

2019-06-05 Thread ND via Pcre-dev

Repetition is allowed for groups such as (?:...) but not for individual backtracking verbs It seems Perl does not rise error with "(*ACCEPT)??". And generates expected code. Is there weighty reason to be not compatible with Perl in this situation? (for which it is meaningless). It's n

Re: [pcre-dev] Subject length lower bound calculation

2019-06-05 Thread ND via Pcre-dev

On 2019-06-05 08:16, ph10 wrote: Because PCRE2 isn't clever enough to deal with lookarounds when computing the minimum length. Maybe there is some room for optimization there. PCRE analyzes the subpattern in the lookaround and says: First code unit = 'a' Last code unit = 'c' So it already knows t

Re: [pcre-dev] Quantifying backtracking verbs

2019-06-05 Thread ND via Pcre-dev

On 2019-06-05 16:53, ph10 wrote: Perl gets it wrong: /(a(?:(*ACCEPT))??bc)/ axy No match /a(*ACCEPT)??bc/ axy No match It seems a bug of Perl start optimizer. It say: "Did not find floating substr "bc"... Match rejected by optimizer" Please look at PCRE start optimizer. It seems correction n

[pcre-dev] (*MARK) not work in conditions

2019-06-06 Thread ND via Pcre-dev

Good day! Unlike Perl, pcre2test doesn't report a MARK value that is inside a successful condition of a conditional group. PCRE2 version 10.33 2019-04-16 /a(?(?=(*:1)b).)/mark ab 0: ab Maybe this incompatibility should be fixed. Thank you.

Re: [pcre-dev] Quantifying backtracking verbs

2019-06-15 Thread ND via Pcre-dev

On 2019-06-10 16:47, ph10 wrote: I have done this, and committed the result. However, it seems to me that /a(*ACCEPT)??bc/ is the same as a(?:bc|) though if a, b, and c are complex it may be easier to read. A following example was included in docs (pcre2pattern.html) : A(*ACCEPT)??BC But

[pcre-dev] Clearing documentation about infinite loops

2019-06-16 Thread ND via Pcre-dev

Good day! Docs says: It is possible to construct infinite loops by following a group that can match no characters with a quantifier that has no upper limit, for example: (a?)* Earlier versions of Perl and PCRE1 used to give an error at compile time for such patterns. However, because t

[pcre-dev] Document SKIP position before or equal start_offset

2019-06-17 Thread ND via Pcre-dev

Good day! I don't find in docs behaviour of SKIP when corresponding position is before or equal start_offset. It seems that in this case a "bumpalong" advance is 1, not SKIP or associated MARK position. /(?<=a(*SKIP)x)|c/ abcd\=offset=2 No match /(*SKIP)x|c/ abcd No match /(?<=a(*SKIP

[pcre-dev] Possessive quantifier not work after {1}

2019-06-17 Thread ND via Pcre-dev

Good day! Here is a pcre2test listing: PCRE2 version 10.33 2019-04-16 /(?:a|ab){1}+c/ abc 0: abc No match was expected, but the pattern matched. Thanks.

Re: [pcre-dev] Quantifying backtracking verbs

2019-06-17 Thread ND via Pcre-dev

On 2019-06-04 16:59, ND wrote: 1. The start optimizer changes the result from "match" to "no match". Is this documented (I remember only the example with (*COMMIT), where the optimizer can turn "no match" into "match")? Maybe there is a way to correct this PCRE optimization so that it does not break the result. I don'

[pcre-dev] Typo in pcre2test docs about partial match

2019-06-17 Thread ND via Pcre-dev

Hello! In the pcre2test docs, in the chapter RESTARTING AFTER A PARTIAL MATCH, there is an example: data> 23ja\=P,dfa What is the matching option "P"? Maybe it should be corrected to "ph" or "ps"? Thanks.

[pcre-dev] Supplement docs about partial match

2019-06-17 Thread ND via Pcre-dev

Hello! Chapter ISSUES WITH MULTI-SEGMENT MATCHING of pcre2partial.html includes item 2 with description how to process with lookbehind assertions. I think it's important to add to this algorithm a some words about "no match": If result of partial match is "no match" then last max_lookbehin

[pcre-dev] Max lookbehind calculation

2019-06-17 Thread ND via Pcre-dev

Hello! Here is pcre2test listing: PCRE2 version 10.33 2019-04-16 /(?<=.{2}(?<=.{6}))/info Capture group count = 0 Max lookbehind = 6 May match empty string Subject length lower bound = 0 abc\=ph No match Expected maxlookbehind=4, not 6. May be calculation algorithm could be corrected. Th

Re: [pcre-dev] Quantifying backtracking verbs

2019-06-17 Thread ND via Pcre-dev

It seems you don't understand or I don't. Sorry for my bad English. I don't ask to calculate real subject_length_lower_bound in patterns with ACCEPT. I ask to set subject_length_lower_bound to 0 in all such patterns. On 2019-06-17 15:07, ph10 wrote: If a pattern contains (*ACCEPT) the code

Re: [pcre-dev] Max lookbehind calculation

2019-06-17 Thread ND via Pcre-dev

On 2019-06-17 15:44, ph10 wrote: Why do you expect 4? The matcher goes back 2, then matches two characters, so it is back at the start. Then it goes back 6. You are right, Philip. My fault. I'm sorry. Please close the thread.

Re: [pcre-dev] Quantifying backtracking verbs

2019-06-18 Thread ND via Pcre-dev

(*ACCEPT) can't escape the boundaries of a lookaround. So an ACCEPT inside a lookaround can't influence the minimum length calculation if the contents of lookarounds do not take part in this calculation (is this true?). Thus it may be preferable to turn off the minimum length scan not for all patterns t

Re: [pcre-dev] Quantifying backtracking verbs

2019-06-19 Thread ND via Pcre-dev

On 2019-06-19 17:15, ph10 wrote: At present, lookarounds do not take part in minimum length calculations, I see that lookarounds do take part: first and last code units are searched for in lookarounds too. So this is another argument against my proposal, and I suggest closing this thread.

[pcre-dev] Some words about assertion docs

2019-06-19 Thread ND via Pcre-dev

Good day! In ASSERTIONS chapter I can't find words that assertions are atomic. This information can be seen much far for this chapter in backtracking control verbs part. It can be important IMHO to put this info in ASSERTIONS chapter. But why assertions are atomic? I guess answer is: "B

Re: [pcre-dev] Some words about assertion docs

2019-06-19 Thread ND via Pcre-dev

On 2019-06-19 20:00, Zoltán Herczeg wrote: Assertions are like "if" statements in structured languages. The condition part of an "if" is never retried. (?=x|y) looks much more ergonomic than (?:(?=x)|(?=y))
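The "never retried" behaviour of an assertion can be observed in a small experiment. The sketch below uses Python's `re` module, whose lookaheads are also atomic, as a stand-in; it is not PCRE2 itself, and the patterns are made up for illustration.

```python
import re

subject = "abc"

# A plain group can be re-entered on backtracking: after "a" fails to be
# followed by "c", the engine retries the group with "ab".
plain = re.search(r"(a|ab)c", subject)

# A lookahead is not re-entered once it has succeeded. It captures "a",
# \1 then consumes "a", "c" fails against "b", and the "ab" alternative
# inside the assertion is never tried.
atomic = re.search(r"(?=(a|ab))\1c", subject)

print(plain.group(0))
print(atomic)
```

The first search finds "abc"; the second finds nothing, because the engine does not go back into the assertion to try its other branch.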

Re: [pcre-dev] Quantifying backtracking verbs

2019-06-20 Thread ND via Pcre-dev

On 2019-06-20 15:53, ph10 wrote: I have updated the doc to use your example, but it can be done easily with other PCRE2 facilities: (?|(ab)c|(a)) does the same thing. If "a" is complex, and you do not want to write it out twice, you could DEFINE it and use a subroutine call. I don't say t

Re: [pcre-dev] Clearing documentation about infinite loops

2019-06-20 Thread ND via Pcre-dev

On 2019-06-20 16:15, ph10 wrote: You can see all this by making use of the "auto-callout" feature Thanks a lot, Philip. I understand quite well what really happens. My concern is about how this is documented. In the first example, the same thing happens, but after (?=b) is matched, \z fa

Re: [pcre-dev] Document SKIP position before or equal start_offset

2019-06-20 Thread ND via Pcre-dev

On 2019-06-20 16:29, ph10 wrote: I have updated the documentation. Updated docs: If (*SKIP) is used inside a lookbehind to specify a new starting point that is not later than the starting point of the current match, it is ignored, and the normal "bumpalong" occurs. May be "it is ignor

Re: [pcre-dev] Some words about assertion docs

2019-06-21 Thread ND via Pcre-dev

On 2019-06-20 15:40, ph10 wrote: (?:(?=X)|(?=Y))Z means "if X matches, try to match Z; if that fails, if Y matches try to match Z". In the simple case the second match of Z will be the same as the first, so will always fail. However, if X and Y are complex and contain capturing parentheses, I s

Re: [pcre-dev] several messages

2019-06-22 Thread ND via Pcre-dev

Thanks a lot for clarifying docs and for your patience with me. On 2019-06-21 16:18, ph10 wrote: On Mon, 17 Jun 2019, ND via Pcre-dev wrote: Second of my little concern is that "X*\z" and "X*" both matches and matches are different. I understand why it is from proc

Re: [pcre-dev] several messages

2019-06-22 Thread ND via Pcre-dev

On 2019-06-22 08:51, ph10 wrote: There must be plenty of examples where removing \z changes what is matched. How about /[ab]*\z/ matched against "aaaxxxbbb"? I believed it was obviously that we told about matching from one position of subject. Sorry that I don't say it explicitly. In your ex

Re: [pcre-dev] Some words about assertion docs

2019-06-22 Thread ND via Pcre-dev

On 2019-06-22 08:56, ph10 wrote: On Fri, 21 Jun 2019, ND via Pcre-dev wrote: Imagine that we have a text. There are some words in this text that occurs at > least 10 times. We want to find from they a word that is most closer to the > end of text. >> If lookahead asse

Re: [pcre-dev] Document SKIP position before or equal start_offset

2019-06-22 Thread ND via Pcre-dev

Updated docs: If (*SKIP) is used inside a lookbehind to specify a new starting position... I suggest to remove "inside a lookbehind". A new starting position that is not later than the starting point of the current match may occur without lookbehind: PCRE2 version 10.33 2019-04-16 /(

Re: [pcre-dev] Max lookbehind calculation

2019-06-22 Thread ND via Pcre-dev

I attempt to second try with another example: PCRE2 version 10.33 2019-04-16 /(?<=(?<=a)b)c.*/info Capture group count = 0 Max lookbehind = 1 First code unit = 'c' Subject length lower bound = 1 abc\=ph Partial match: bc < Why max lookbehind=1, but not 2? -- ## List details at

Re: [pcre-dev] Some words about assertion docs

2019-06-22 Thread ND via Pcre-dev

On 2019-06-22 15:20, ph10 wrote: On Sat, 22 Jun 2019, ND via Pcre-dev wrote: Your example is not working right (let's change 10 to 3 for simplicity): >> /\A.*\b(\w++)(?>.*?\b\1\b){2}/ > word1 word1 word2 word2 word2 word1 > 0: word1 word1 word2 word2 word2 > 1: word

Re: [pcre-dev] Max lookbehind calculation

2019-06-22 Thread ND via Pcre-dev

On 2019-06-22 16:37, ph10 wrote: nesting lookbehinds in the way you have done is unusual. It may be less unusual if we use simple assertions: PCRE2 version 10.33 2019-04-16 /(?<=\ba)b.*/info Capture group count = 0 Max lookbehind = 1 First code unit = 'b' Subject length lower bound = 1 xyz-

[pcre-dev] Start optimizations with partial match

2019-06-22 Thread ND via Pcre-dev

Good day! Here is pcre2test listing: /(?<=ab)cde/info Capture group count = 0 Max lookbehind = 2 First code unit = 'c' Last code unit = 'e' Subject length lower bound = 3 ab\=ph Partial match: ab << We can see that PCRE calculates first code unit, last code unit and subject

Re: [pcre-dev] Start optimizations with partial match

2019-06-22 Thread ND via Pcre-dev

Or do these calculations occur at compile time, while the partial matching flag is set at match time?

Re: [pcre-dev] Start optimizations with partial match

2019-06-22 Thread ND via Pcre-dev

On 2019-06-23 04:33, ND wrote: Or do these calculations occur at compile time, while the partial matching flag is set at match time? Oh! Now I have read the docs about it. It seems that PARTIAL is a compile-time option only for JIT. So it seems that disabling these calculations may matter to JIT only. May b

Re: [pcre-dev] JIT regression

2019-06-25 Thread ND via Pcre-dev

On 2019-06-25 09:30, Zoltán Herczeg wrote: > It seems JIT is 16 times!! slower than interpreter for such simple pattern. I did some improvements on the SSE2 accelerated search and /(?s).*/ search. You can try them now. However I have never seen such big differences in my measurements. The m

[pcre-dev] (*THEN) works differently in Perl

2019-06-30 Thread ND via Pcre-dev

Good day! Here is a pcre2test listing: PCRE2 version 10.33 2019-04-16 /\A(?:.|..)(*THEN)c/ abc No match Perl matches "abc". I suppose "next innermost alternative" is interpreted differently by PCRE and Perl. If so, maybe PCRE should go the Perl way in this matter? Thanks.

Re: [pcre-dev] (*THEN) works differently in Perl

2019-07-01 Thread ND via Pcre-dev

On 2019-07-01 10:28, ph10 wrote: I think this is a bug in Perl and I will report it as such. It's great. As you participate in Perl regex development can you take a look at another Perl bug please: PCRE2 version 10.33 2019-04-16 /\A(?:.(*COMMIT))*c/ abcd No match But Perl reports that

Re: [pcre-dev] (*THEN) works differently in Perl

2019-07-01 Thread ND via Pcre-dev

On 2019-07-01 10:28, ph10 wrote: On Sun, 30 Jun 2019, ND via Pcre-dev wrote: PCRE2 version 10.33 2019-04-16 > /\A(?:.|..)(*THEN)c/ > abc > No match >>> Perl is match "abc". > I suppose "next innermost alternative" is interpreted differently by PCRE

Re: [pcre-dev] (*THEN) works differently in Perl

2019-07-02 Thread ND via Pcre-dev

On 2019-07-02 14:34, ph10 wrote: A Perl developer has admitted there is some ambiguity, but suggests that (*COMMIT) just means "never advance the starting point". That pattern can find a match without advancing the starting point. I have pointed out that, in that case, /.*(*COMMIT)c/ should al

Re: [pcre-dev] (*THEN) works differently in Perl

2019-07-07 Thread ND via Pcre-dev

On 2019-07-03 17:33, ph10 wrote: On Tue, 2 Jul 2019, ND via Pcre-dev wrote: It seems a Perl is so buggy or have really different conception of (*COMMIT) > then PCRE. I am waiting for further information from the Perl developers, but I suspect that I won't want to change PCRE2, except

Re: [pcre-dev] Some words about assertion docs

2019-07-07 Thread ND via Pcre-dev

On 2019-06-22 16:03, ph10 wrote: On Sat, 22 Jun 2019, ND via Pcre-dev wrote: Sorry for my bad English. > I need to find word that is closest to the end of text and occurs at least 10 > times in that text. Yes, I understand that now. I will think about it. Non-atomic lookarounds

[pcre-dev] JIT don't detect endless subroutine recursion

2019-07-07 Thread ND via Pcre-dev

Good day! PCRE2 version 10.33 2019-04-16 /(?0)/ abc Failed: error -52: nested recursion at the same subject position As far as I can see, the interpreter recognizes this endless recursion right away, but JIT doesn't. It recurses until memory runs out: Failed: error -46: JIT stack limit reached May be

Re: [pcre-dev] (*THEN) works differently in Perl

2019-07-07 Thread ND via Pcre-dev

And if we disregards Perl's bugs then it seems (*COMMIT) in Perl works in a following manner: 1. Backtracking can't move to the left of COMMIT (this is PCRE behaviour too) 2. If COMMIT occurs then no advance match to any other position of subject can happen. No matter there are any other ba

Re: [pcre-dev] (*THEN) works differently in Perl

2019-07-10 Thread ND via Pcre-dev

On 2019-07-09 13:53, ph10 wrote: On Mon, 8 Jul 2019, ND via Pcre-dev wrote: And if we disregards Perl's bugs then it seems (*COMMIT) in Perl works in a > following manner: >> 1. Backtracking can't move to the left of COMMIT (this is PCRE behaviour too) > 2. If COMMIT

[pcre-dev] Partial match at end of subject

2019-07-10 Thread ND via Pcre-dev

Good day! Here are 2 pcre2test listings: PCRE2 version 10.33 2019-04-16 /(?<=(?=.(?<=x)))/ ab\=ph Partial match: b /(?<=.(?=x))/ ab\=ph Partial match: b < Shouldn't both results be "no match" instead of "partial match"? Thanks.

Re: [pcre-dev] Partial match at end of subject

2019-07-11 Thread ND via Pcre-dev

On 2019-07-11 16:18, ph10 wrote: Why? "Partial match" means "if you add some more characters to the subject, it MAY match". If you add "x", it matches. I guess you told about second example (in first example "x" don't adds). I believed empty match at the end of string is not counted as par

Re: [pcre-dev] Partial match at end of subject

2019-07-12 Thread ND via Pcre-dev

On 2019-07-12 07:08, ph10 wrote: On Thu, 11 Jul 2019, ND via Pcre-dev wrote: I guess you told about second example (in first example "x" don't adds). I > believed empty match at the end of string is not counted as partial. This is a documentation issue. Instead of "empt

Re: [pcre-dev] Partial match at end of subject

2019-07-12 Thread ND via Pcre-dev

On 2019-07-12 15:17, ph10 wrote: On Fri, 12 Jul 2019, ND via Pcre-dev wrote: This is about my second example. > But it seems first example have another issue: >> >PCRE2 version 10.33 2019-04-16 > >/(?<=(?=.(?<=x)))/ > >ab\=ph > >Partial match: b >> Wh

Re: [pcre-dev] Partial match at end of subject

2019-07-12 Thread ND via Pcre-dev

On 2019-07-12 15:31, ND wrote: On 2019-07-12 15:17, ph10 wrote: > On Fri, 12 Jul 2019, ND via Pcre-dev wrote: >> This is about my second example. > > But it seems first example have another issue: > >> >PCRE2 version 10.33 2019-04-16 > > >/(?<=(?=.(?&l

[pcre-dev] Detecting starting code units

2019-07-13 Thread ND via Pcre-dev

Good day! PCRE try to detect starting code units in attempt to apply a start optimization. As we can see from next two examples, it detects starting code units for "[^ab]", but don't doing this for "[^a]". I think it looks a bit curiously. May be "[^a]" can use the same algorithm as "[^ab]"

Re: [pcre-dev] Partial match at end of subject

2019-07-13 Thread ND via Pcre-dev

On 2019-07-13 11:44, ph10 wrote: In this case PCRE2 finds a *complete* match before it finds a partial match. The pattern says "assert we are at the end of the subject"; that is true. Then it says "end of pattern" - so it returns a complete match. It never gets the chance to consider a partial ma

Re: [pcre-dev] Some words about assertion docs

2019-07-13 Thread ND via Pcre-dev

On 2019-07-13 11:22, ph10 wrote: I have done this work, and committed the patches. The new code supports both (*napla: and (*naplb: It's great! Thanks a lot! I ran into the need for (*napla some time ago when trying to construct a pattern for this task: I review the text of a research article with

Re: [pcre-dev] Partial match at end of subject

2019-07-13 Thread ND via Pcre-dev

On 2019-07-13 16:47, ph10 wrote: On Sat, 13 Jul 2019, ND via Pcre-dev wrote: PCRE2_PARTIAL_HARD is intended for multisegment matching. I think when this > option is set it means: this subject IS incomplete, it's only a non-last part > of a certain "entire" subject. It

Re: [pcre-dev] Some words about assertion docs

2019-07-13 Thread ND via Pcre-dev

On 2019-07-13 16:50, ph10 wrote: On Sat, 13 Jul 2019, ND via Pcre-dev wrote: Unfortunately PCRE2 svn version is not compiled for me with Microsoft Visual > Studio 2019 on Windows 7x64. Can you compile the released source versions? (There shouldn't be any difference, but I just

Re: [pcre-dev] Some words about assertion docs

2019-07-13 Thread ND via Pcre-dev

On 2019-07-13 19:21, ND wrote: On 2019-07-13 16:50, ph10 wrote: > On Sat, 13 Jul 2019, ND via Pcre-dev wrote: >> Unfortunately PCRE2 svn version is not compiled for me with Microsoft > Visual > > Studio 2019 on Windows 7x64. >Can you compile the released source versions

Re: [pcre-dev] Partial match at end of subject

2019-07-15 Thread ND via Pcre-dev

On 2019-07-15 15:24, ph10 wrote: My point about partial matching meaning "may be incomplete" is still true. Partial matching was not invented originally for multi-segment matching, but for dynamically checking input. For example, if a user is typing an 8-digit number, as each character is rec
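
The idea of checking input while it is being typed can be sketched without PCRE2 at all. The Python snippet below is only an illustration under stated assumptions, not the PCRE2 API: the names `classify`, `COMPLETE` and `PREFIX` are hypothetical, and the prefix pattern is written by hand instead of being derived by a partial-matching engine.

```python
import re

# Assumption: the "8-digit number" is exactly eight ASCII digits.
COMPLETE = re.compile(r"[0-9]{8}")
# Hand-written pattern for a non-empty proper prefix of such a number.
PREFIX = re.compile(r"[0-9]{1,7}")


def classify(typed):
    """Classify what the user has typed so far."""
    if COMPLETE.fullmatch(typed):
        return "match"      # all eight digits are present
    if PREFIX.fullmatch(typed):
        return "partial"    # more digits could still complete it
    return "no match"       # no continuation can rescue this input


for text in ("1234", "12345678", "12a"):
    print(text, "->", classify(text))
```

A real partial-matching engine answers the same three-way question for arbitrary patterns, which is what makes the "may be incomplete" reading useful for interactive input.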

Re: [pcre-dev] Subject length lower bound calculation

2019-07-15 Thread ND via Pcre-dev

On 2019-06-05 15:54, ph10 wrote: On Wed, 5 Jun 2019, ND via Pcre-dev wrote: May be there is a some space for optimization there. >> PCRE analyze subpattern in lookaround and say: > First code unit = 'a' > Last code unit = 'c' >> So it already knows that &

Re: [pcre-dev] Some words about assertion docs

2019-07-16 Thread ND via Pcre-dev

On 2019-07-14 11:54, ph10 wrote: I am sorry that I cannot help, but I don't even use Windows, let alone MSVC. All the information I put in NON-AUTOTOOLS-BUILD was sent to me by other people. Thanks. Now I can successfully compile PCRE svn versions. This was achieved not directly with MSVisualStudio

Re: [pcre-dev] Detecting starting code units

2019-07-17 Thread ND via Pcre-dev

On 2019-07-17 09:00, ph10 wrote: On Sat, 13 Jul 2019, I wrote: > May be "[^a]" can use the same algorithm as "[^ab]"? >> [^a] is optimized into a different (faster) opcode; I will see if this > can easily produce the same starting code units as [^ab] for tidyness. I > do not expect it will d
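
As a rough illustration of what "starting code units" means for a negated class: in 8-bit, non-UTF mode, every code unit except the excluded ones can begin a match. The Python sketch below uses a hypothetical helper, `starting_code_units`, and makes no claim about how PCRE2 actually represents or computes this information.

```python
def starting_code_units(excluded):
    """Return the byte values that can start a match of a negated class.

    `excluded` holds the bytes listed inside the class, e.g. b"ab" for [^ab].
    Assumes 8-bit code units with no UTF or caseless handling.
    """
    return set(range(256)) - set(excluded)


one = starting_code_units(b"a")    # models [^a]
two = starting_code_units(b"ab")   # models [^ab]
print(len(one), len(two))  # 255 254
```

Seen this way, the same set computation covers both "[^a]" and "[^ab]"; the difference discussed in the thread lies in which opcode the compiled pattern uses.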

Re: [pcre-dev] Partial match at end of subject

2019-07-17 Thread ND via Pcre-dev

On 2019-07-17 16:55, ph10 wrote: On Mon, 15 Jul 2019, ND via Pcre-dev wrote: This option is added ten years ago EXACTLY for multisegment matching. > Please read a very first proposal post and thread about it. Thats how > partial_hard is born: > https://lists.exim.org/lurke

Re: [pcre-dev] Partial match at end of subject

2019-07-18 Thread ND via Pcre-dev

On 2019-07-18 16:48, ph10 wrote: On Wed, 17 Jul 2019, ND via Pcre-dev wrote: Let us ignore for the moment whether there should be a new option or not, and try to figure out what new logic might be needed. I am going to experiment with the suggestion I made earlier: If a hard partial match is

Re: [pcre-dev] Partial match at end of subject

2019-07-21 Thread ND via Pcre-dev

The new algorithm still has other instances of the oversight discussed. For example, it returns a full match instead of a partial one in the following cases: /(?![ab]).*/ ab\=ph 0: /c*+/ ab\=ph,offset=2 0: The alternative suggestion does not have these troubles. It simplifies the calculations that the main application must

Re: [pcre-dev] Partial match at end of subject

2019-07-22 Thread ND via Pcre-dev

On 2019-07-22 16:32, ph10 wrote: The characteristic of these is that the pattern can match an empty string. I have now added this condition (which was easily done with no repeated test) and those patterns now give partial matches. It's excellent!! Now it can be useful to try putting into wo

Re: [pcre-dev] Partial match at end of subject

2019-07-23 Thread ND via Pcre-dev

On 2019-07-22 17:32, ND wrote: Now it can be useful to try putting into words what exactly "local no match" and "partial match" mean when applied to multisegment matching. The docs say: A partial match occurs during a call to pcre2_match() when the end of the subject string i
