On Sun, 3 Feb 2013, ND wrote:
> Good day, Philip!
>
> Suppose the input string 'abcd' must be matched against the pattern '\Ac|e'.
> Obviously, the result must be 'no match'.
>
> But the input string arrives not all at once but in two chunks:
> 1. ab
> 2. cd
> The application attempts to match 'ab' with the 'partial hard' option. No
> match is detected. Then the application attempts to match 'cd', and it
> matches. So the whole input string is erroneously matched.
>
> I don't know of a way for an application to correctly match multi-segment
> strings against such patterns. Maybe some extra functionality can be added
> to PCRE, for example a way to tell PCRE that the first character of the
> input string is not the first character of the whole string.
Isn't that what PCRE_NOTBOL does? Ah, looking at the code, I see that that
works for ^ but not for \A, and this behaviour is documented. (I cannot
remember why it is this way, but I am not keen on changing it now.)

One way of working round this is always to retain one character from the
first chunk, so that you match against "bcd" with startoffset=1.

> I propose the following solution: PCRE can treat '\A' as a lookbehind with
> length=1, so that PCRE_INFO_MAXLOOKBEHIND is adjusted accordingly.

Ah, yes, that would automatically do what I have just suggested. Noted.

Philip

--
Philip Hazel

## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
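The retain-one-character workaround can be sketched in Python. This is an illustration under stated assumptions, not PCRE code. Python's `re` module has no partial matching. However, the `pos` argument of `Pattern.search` behaves like PCRE's startoffset with respect to `\A`: `\A` only matches at index 0 of the subject, never at a nonzero start position. The helper `chunked_search` and its `keep` parameter are hypothetical. `keep` stands for the number of characters the application retains: 1 in the workaround above, or whatever PCRE_INFO_MAXLOOKBEHIND would report under the proposal. Matches that genuinely span a chunk boundary, which is what partial matching handles, are not modelled.

```python
import re

# Pattern from the thread: "c at the very start of the subject" OR "e".
PATTERN = re.compile(r"\Ac|e")


def chunked_search(chunks, keep=1):
    """Hypothetical helper: search a subject that arrives in chunks.

    Before each chunk, the last `keep` characters of the previous subject
    are prepended, and the search starts after them (like PCRE's
    startoffset). keep=0 reproduces the bug: each chunk is treated as the
    start of the whole string. Partial matching is NOT modelled.
    """
    tail = ""
    for chunk in chunks:
        subject = tail + chunk
        # In Python's re, \A matches only at index 0 of `subject`, never at
        # a nonzero `pos`, so the retained characters block a false \A match.
        if PATTERN.search(subject, len(tail)):
            return True
        # Retain up to `keep` trailing characters for the next chunk.
        tail = subject[max(0, len(subject) - keep):]
    return False


print(PATTERN.search("abcd") is not None)    # False: whole string, no match
print(chunked_search(["ab", "cd"], keep=0))  # True: the erroneous match
print(chunked_search(["ab", "cd"], keep=1))  # False: agrees with whole string
```

With `keep=0`, each chunk is searched as though it were the whole subject, which is exactly the failure ND describes. With `keep=1`, the retained 'b' sits at index 0, so the 'c' at offset 1 can no longer satisfy `\A`.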
