Remember that pcre_exec() returns whichever of a full match or a hard 
partial match it finds first.

On Sun, 22 Jan 2012, ND wrote:

> PCRE version 8.21 2011-12-12
> /(?<=a)(?!b)/+
> \P\Pa
> Partial match: a

The lookbehind succeeds. In the lookahead, a partial match is forced
because the next character is not available *and* at least one character
(the "a" examined by the lookbehind) has already been inspected.

> Now we swap assertions:
> 
> PCRE version 8.21 2011-12-12
> /(?!b)(?<=a)/+
> \P\Pa
> 0:
> 0+

In this case a partial match is not forced in the lookahead because no
characters have been inspected. So the lookahead succeeds and the rest 
of the pattern matches.

> Another example:
> 
> PCRE version 8.21 2011-12-12
> /(?!a)/+
> \P\Pa
> 0:
> 0+

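Same thing: when the lookahead reaches the end of the subject, no
characters have been inspected, so no partial match is forced. A partial
match can never be an empty string.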
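
As a rough illustration, the rule applied in the three examples above can
be modelled in a few lines of C. This is a toy sketch, not PCRE's code:
the types and functions (attempt, lookbehind_is, lookahead_not, run) are
invented for this note, it handles only single-character assertions, and
it assumes that "inspected" means the current match attempt has looked at
some subject character before reaching the end of the subject.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only. All names here are hypothetical, not PCRE internals. */
typedef enum { NO_MATCH, FULL_MATCH, PARTIAL_MATCH } outcome;
typedef enum { ASSERT_FAIL, ASSERT_OK, ASSERT_PARTIAL } step;

typedef struct {
    const char *subject;
    size_t len;
    size_t pos;        /* position of the current match attempt */
    size_t start_used; /* earliest offset this attempt has inspected */
} attempt;

/* kind 'B' models (?<=ch), kind 'N' models (?!ch). */
typedef struct { char kind; char ch; } assertion;

/* (?<=c): look one character back and record that it was inspected. */
static step lookbehind_is(attempt *a, char c) {
    if (a->pos == 0) return ASSERT_FAIL;
    if (a->pos - 1 < a->start_used) a->start_used = a->pos - 1;
    return a->subject[a->pos - 1] == c ? ASSERT_OK : ASSERT_FAIL;
}

/* (?!c) under hard partial matching. */
static step lookahead_not(attempt *a, char c) {
    if (a->pos >= a->len) {
        /* Next character unavailable: force a partial match only if
           this attempt has already inspected at least one character. */
        return a->pos > a->start_used ? ASSERT_PARTIAL : ASSERT_OK;
    }
    return a->subject[a->pos] == c ? ASSERT_FAIL : ASSERT_OK;
}

/* Try each start offset in turn and report whichever of a full match
   or a (hard) partial match is found first. */
static outcome run(const assertion *pat, size_t n, const char *subject) {
    assert(pat != NULL && subject != NULL);
    size_t len = strlen(subject);
    for (size_t start = 0; start <= len; start++) {
        attempt a = { subject, len, start, start };
        size_t i;
        for (i = 0; i < n; i++) {
            step s = (pat[i].kind == 'B') ? lookbehind_is(&a, pat[i].ch)
                                          : lookahead_not(&a, pat[i].ch);
            if (s == ASSERT_PARTIAL) return PARTIAL_MATCH;
            if (s == ASSERT_FAIL) break;
        }
        if (i == n) return FULL_MATCH; /* every assertion held: empty match */
    }
    return NO_MATCH;
}
```

With the subject "a", this model gives a partial match for the sequence
(?<=a)(?!b), and a full (empty) match for (?!b)(?<=a) and for (?!a),
which agrees with the pcretest output quoted above.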

On Sun, 22 Jan 2012, Zoltán Herczeg wrote:

> But you want a complete behaviour change for your specific use case
> which would break compatibility, although they have some reasons.

Indeed.

> As this would be a compatibility-breaking new feature, we should
> probably aim for 8.31. The best thing would be to open a bug and
> discuss this new behaviour, and let other people give their opinion,
> which usually takes time...
> 
> To summarize what you said so far: if hard partial matching is enabled,
>   - \z (and perhaps \Z and $) must never match at the end of the string
>   - A match must not be allowed at the end of the subject string

If a new feature is added (or the behaviour of hard partial is changed), 
the only possibility would be to return no match.

A thought: we already have the PCRE_NOTEOL flag, but it is documented 
like this:

  PCRE_NOTEOL                                                            

  This option specifies that the end of the subject string is not the
  end of a line, so the dollar metacharacter should not match it nor
  (except in multiline mode) a newline immediately before it. Setting
  this without PCRE_MULTILINE (at compile time) causes dollar never to
  match. This option affects only the behaviour of the dollar
  metacharacter. It does not affect \Z or \z.

I wonder why it does not affect \Z or \z? I further wonder if it should
be made to affect \Z and \z when hard partial matching is happening?

Philip

-- 
Philip Hazel