On 2019-07-13 16:47, ph10 wrote:
> On Sat, 13 Jul 2019, ND via Pcre-dev wrote:
>> PCRE2_PARTIAL_HARD is intended for multisegment matching. I think when this
>> option is set it means: this subject IS incomplete, it is only a non-final part
>> of some "entire" subject.
> It was never intended to mean "this subject is incomplete", rather "this subject MAY BE incomplete". However ...


I don't agree. It can't be "MAY BE". Multisegment matching is based on the fact that, before calling pcre2_match(), we know exactly whether the subject is complete or not. Without this knowledge there is no way to treat an arriving segment of the subject appropriately. When we know that the subject IS incomplete, we call pcre2_match() with the PCRE2_PARTIAL_HARD option. When the last segment arrives, we call pcre2_match() without PCRE2_PARTIAL_HARD. At its core, \z is a negative lookahead assertion that needs to inspect the next character of the subject.
It is something like (?!\C).
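To make the "inspect the next character" reading concrete, here is a toy C model of \z treated as (?!\C) in a multisegment setting. It models the semantics argued for here, not PCRE2's implementation: end_of_subject, segment_is_last and the END_* names are hypothetical, and the is_last_segment flag stands in for the caller's choice of calling pcre2_match() without (last segment) or with (non-final segment) PCRE2_PARTIAL_HARD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Outcomes of the end-of-subject check in this toy model (hypothetical). */
typedef enum {
    END_FAIL,      /* another character follows: \z is false, backtrack */
    END_SUCCEED,   /* genuinely at the end of the entire subject */
    END_NEED_MORE  /* at the end of a non-final segment: next char unknown */
} end_result;

/* Hypothetical helper: \z read as (?!\C), so it must look at the character
   after pos. is_last_segment is the knowledge the caller conveys by calling
   without (true) or with (false) PCRE2_PARTIAL_HARD. */
static end_result end_of_subject(size_t seg_len, size_t pos, bool is_last_segment)
{
    assert(pos <= seg_len);
    if (pos < seg_len)
        return END_FAIL;
    return is_last_segment ? END_SUCCEED : END_NEED_MORE;
}

/* Hypothetical helper: in a multisegment loop over count segments, every
   segment except the final one is only part of the entire subject. */
static bool segment_is_last(size_t index, size_t count)
{
    return index + 1 == count;
}
```

In a driver loop, segment_is_last(i, n) would decide whether segment i is matched with or without the hard-partial option; only the caller has that information, which is the point being made above.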
Oops! While writing this, I noticed that (?!\C) and other patterns also return a complete match:

PCRE2 version 10.33 2019-04-16
/(?!\C)/
ab\=ph
 0:

PCRE2 version 10.33 2019-04-16
/c*/
ab\=ph
 0:

I think it's obvious that these examples (especially the last one, which shows it most clearly) should return "no match". Why did PCRE decide that the segment "ab" is the last (complete) one? It is the user's job to indicate that, by setting or not setting the PCRE2_PARTIAL_HARD option.


> . Are we at the end of the subject?   If no, backtrack
> . Is partial matching allowed?        If no, continue matching
> . Have we inspected any characters?   If no, continue matching
> . Is it partial hard?                 If yes, return a partial match
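The four quoted steps can be written as a small C decision function. This is a hedged sketch of the steps as stated, not PCRE2's source: the enum, function and parameter names are hypothetical, and the final "otherwise continue" branch (the PARTIAL_SOFT case) is an assumption, since the quoted list only says what happens when the last answer is yes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for what the matcher does at this point. */
typedef enum {
    ACT_BACKTRACK, /* not at the end: the assertion fails */
    ACT_CONTINUE,  /* treat the end as real and carry on matching */
    ACT_PARTIAL    /* stop and return a partial match */
} end_action;

static end_action current_end_check(bool at_end, bool partial_allowed,
                                    bool inspected_chars, bool partial_hard)
{
    /* Hard partial matching implies that partial matching is allowed. */
    assert(!partial_hard || partial_allowed);
    if (!at_end)
        return ACT_BACKTRACK;
    if (!partial_allowed)
        return ACT_CONTINUE;
    if (!inspected_chars)
        return ACT_CONTINUE;
    if (partial_hard)
        return ACT_PARTIAL;
    return ACT_CONTINUE; /* assumption: PARTIAL_SOFT keeps matching */
}
```

Under these rules, /(?!\C)/ on "ab" reaches the end at offset 2 without having inspected any characters in that match attempt, so current_end_check(true, true, false, true) returns ACT_CONTINUE; this appears consistent with the complete empty match pcre2test reports for the first example.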

I see that this algorithm covers both PARTIAL_SOFT and PARTIAL_HARD. Sorry, I haven't thought much about PARTIAL_SOFT and have no practical experience with it. I propose the following algorithm (for PARTIAL_HARD only, disregarding the existence of PARTIAL_SOFT):

. Are we at the end of the subject?   If no, backtrack
. Is partial hard matching allowed?   If no, continue matching
. Have we inspected any characters?   If yes, return a partial match; else return "no match"
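The proposed steps can be sketched the same way. Again this is a toy model under stated assumptions: the names are hypothetical, soft partial matching is ignored as in the proposal, and PROP_NO_MATCH is read as ending the whole match with "no match", as the third step says.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the outcomes of the proposed check. */
typedef enum {
    PROP_BACKTRACK, /* not at the end: the assertion fails */
    PROP_CONTINUE,  /* last segment: the end is real, keep matching */
    PROP_PARTIAL,   /* non-final segment, characters seen: partial match */
    PROP_NO_MATCH   /* non-final segment, nothing seen: "no match" */
} proposed_action;

static proposed_action proposed_end_check(bool at_end, bool partial_hard,
                                          bool inspected_chars)
{
    if (!at_end)
        return PROP_BACKTRACK;
    if (!partial_hard)
        return PROP_CONTINUE;
    return inspected_chars ? PROP_PARTIAL : PROP_NO_MATCH;
}
```

Compared with the quoted rules, the only changed case is partial hard set, end of segment reached and no characters inspected: that case yields "no match" instead of letting matching continue.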

The same algorithm should also apply in other cases, such as the examples above.


> The only way this could be changed would be to make some kind of
> exception for \z and I do not think that is a good idea.

I think no exception is needed; it should be handled in the common way. As we have seen, this is not only a \z issue.

--
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
