On Sun, 3 Feb 2013, ND wrote:

> Good day, Philip!
> 
> Suppose the input string 'abcd' must be matched against the pattern '\Ac|e'.
> Obviously the result must be 'no match'.
> 
> But the input string arrives not all at once but in two chunks:
> 1. ab
> 2. cd
> The application attempts to match 'ab' with the 'partial hard' option. No
> match is detected.
> Then the application attempts to match 'cd', and it matches.
> So the whole input string is erroneously reported as matched.
> 
> I don't know of a way for an application to correctly match multi-segment
> strings against such patterns. Maybe some extra functionality could be added
> to PCRE: for example, a way to tell PCRE that the first character of the
> input string is not the first character of the whole string.

Isn't that what PCRE_NOTBOL does? Ah, looking at the code, I see that 
that works for ^ but not for \A, and this behaviour is documented. (I 
cannot remember why it is this way, but I am not keen on changing it 
now.) 

One way of working round this is always to retain the last character of 
the previous chunk, so that you match against "bcd" with startoffset=1. 
\A can then no longer match at the start of the second chunk.
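
As a rough illustration of that workaround, here is a minimal sketch. It 
uses Python's re module rather than PCRE, on the assumption that the "pos" 
argument of search() plays the role of startoffset (in both, \A is tested 
against the real start of the subject, not the start position). The helper 
names are hypothetical, and the sketch deals only with the \A problem, not 
with matches that span a chunk boundary.

```python
import re

# Pattern from the report: "c" only at the very start of the input, or "e" anywhere.
PATTERN = re.compile(r"\Ac|e")


def match_chunks_naive(chunks):
    # Search every chunk on its own: the start-of-subject assertion
    # succeeds at the start of each chunk, which is the reported problem.
    return any(PATTERN.search(chunk) for chunk in chunks)


def match_chunks_with_carry(chunks):
    # Keep the last character seen so far, put it in front of the next
    # chunk, and begin the search just after it (offset 1). The assertion
    # is still checked against offset 0, so it cannot succeed at the
    # boundary between two chunks.
    carry = ""
    for chunk in chunks:
        subject = carry + chunk
        if PATTERN.search(subject, len(carry)):
            return True
        if subject:
            carry = subject[-1]
    return False


if __name__ == "__main__":
    print(match_chunks_naive(["ab", "cd"]))
    print(match_chunks_with_carry(["ab", "cd"]))
```

With the chunks "ab" and "cd", the naive version reports a match while the 
version that carries one character over does not; a genuine first chunk 
starting with "c" still matches because nothing has been carried yet.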

> I propose the following solution: PCRE can treat '\A' as a lookbehind with
> length=1, so that PCRE_INFO_MAXLOOKBEHIND is adjusted accordingly.

Ah, yes, that would automatically do what I have just suggested. Noted.

Philip

-- 
Philip Hazel

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 
