Re: [pcre-dev] Partial match at end of subject

ND via Pcre-dev Sat, 13 Jul 2019 11:53:08 -0700

On 2019-07-13 16:47, ph10 wrote:

> On Sat, 13 Jul 2019, ND via Pcre-dev wrote:
>> PCRE2_PARTIAL_HARD is intended for multisegment matching. I think when
>> this option is set it means: this subject IS incomplete, it's only a
>> non-last part of a certain "entire" subject.
>
> It was never intended to mean "this subject is incomplete", rather "this
> subject MAY BE incomplete". However ...

I don't agree. It can't be "MAY BE". Multisegment matching is based on the fact that before calling pcre2_match() we know exactly whether the subject is complete or not. Without this knowledge there is no way to treat an arriving segment of the subject appropriately. When we know that the subject IS incomplete, we call pcre2_match() with the PCRE2_PARTIAL_HARD option. When the last segment arrives, we call pcre2_match() without PCRE2_PARTIAL_HARD.

At its core, \z is a positive lookahead assertion that wants to inspect the next character of the subject.

It is something like (?!\C).

Oops! While writing this I saw that (?!\C) and other patterns also return a complete match:


PCRE2 version 10.33 2019-04-16
/(?!\C)/
ab\=ph
 0:

PCRE2 version 10.33 2019-04-16
/c*/
ab\=ph
 0:

I think it's obvious that these examples (the last one especially clearly) should return "no match". Why did PCRE decide that the segment "ab" is the last (complete) one? It is the user's job to say whether it is, by setting or not setting the PCRE2_PARTIAL_HARD option.
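The calling convention described in this message (PCRE2_PARTIAL_HARD on every segment except the last) can be sketched roughly as below. This is a small Python model, not the PCRE2 API: `feed_segments`, `match_segment` and `fake_match` are hypothetical names, and `match_segment` merely stands in for whatever wraps pcre2_match(). The sketch also ignores how text retained after a partial match would be carried into the next call.

```python
from typing import Callable, Iterable, Iterator, Tuple


def feed_segments(
    segments: Iterable[str],
    match_segment: Callable[[str, bool], str],
) -> Iterator[Tuple[str, bool, str]]:
    """Call match_segment(segment, partial_hard) once per segment.

    partial_hard is True for every segment except the last one, mirroring
    "set PCRE2_PARTIAL_HARD while the subject is known to be incomplete".
    """
    it = iter(segments)
    try:
        current = next(it)
    except StopIteration:
        return  # no segments at all, nothing to match
    for upcoming in it:
        # Another segment follows, so the current one is known to be incomplete.
        yield current, True, match_segment(current, True)
        current = upcoming
    # Nothing follows: this is the last segment, so the flag is dropped.
    yield current, False, match_segment(current, False)


def fake_match(segment: str, partial_hard: bool) -> str:
    """Hypothetical stand-in matcher that only reports how it was called."""
    return "with PARTIAL_HARD" if partial_hard else "without PARTIAL_HARD"


for seg, hard, result in feed_segments(["ab", "cd", "ef"], fake_match):
    print(seg, hard, result)
```

The point of the model is only that the caller, not the matcher, decides which segment is the last one.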

> . Are we at the end of the subject?   If no, backtrack
> . Is partial matching allowed?        If no, continue matching
> . Have we inspected any characters?   If no, continue matching
> . Is it partial hard?                 If yes, return a partial match

I see this algorithm covers both PARTIAL_SOFT and PARTIAL_HARD. Sorry, I haven't thought much about PARTIAL_SOFT and have no practice with it. I propose the following algorithm (for PARTIAL_HARD only, disregarding the existence of PARTIAL_SOFT):


. Are we at the end of the subject?   If no, backtrack
. Is partial hard matching allowed?   If no, continue matching

. Have we inspected any characters?   If yes, return a partial match
                                      Else return "no match"

And the same algorithm should apply in other cases, such as the examples above.
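To make the difference between the two step lists concrete, here is a minimal Python sketch of both decision procedures. It is only a model of the steps as written in this message, not PCRE2's actual code: the `Step` enum, the function names and the three boolean parameters are invented for illustration, and PARTIAL_SOFT is left out entirely, so "partial matching allowed" is taken to mean "PARTIAL_HARD is set".

```python
from enum import Enum


class Step(Enum):
    BACKTRACK = "backtrack"
    CONTINUE = "continue matching"
    PARTIAL = "partial match"
    NO_MATCH = "no match"


def current_steps(at_end: bool, partial_hard: bool, inspected: bool) -> Step:
    """The four quoted steps, restricted to PARTIAL_HARD (no PARTIAL_SOFT)."""
    if not at_end:
        return Step.BACKTRACK
    if not partial_hard:
        return Step.CONTINUE
    if not inspected:
        return Step.CONTINUE
    return Step.PARTIAL


def proposed_steps(at_end: bool, partial_hard: bool, inspected: bool) -> Step:
    """The proposed steps for PARTIAL_HARD."""
    if not at_end:
        return Step.BACKTRACK
    if not partial_hard:
        return Step.CONTINUE
    return Step.PARTIAL if inspected else Step.NO_MATCH


# Compare both procedures at end of subject with PARTIAL_HARD set.
for inspected in (True, False):
    print(
        "inspected =", inspected,
        "| current:", current_steps(True, True, inspected).value,
        "| proposed:", proposed_steps(True, True, inspected).value,
    )
```

In this model the two procedures disagree in exactly one situation: at the end of the subject, with PARTIAL_HARD set and no characters inspected. That is the situation of the (?!\C) and c* examples.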

> The only way this could be changed would be to make some kind of
> exception for \z and I do not think that is a good idea.

I think no exception is needed. It should be handled in the common way. We see that it's not only a \z issue.

--

## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
