Re: [pcre-dev] Partial match at end of subject

ND via Pcre-dev Mon, 15 Jul 2019 10:59:08 -0700

On 2019-07-15 15:24, ph10 wrote:

My point about partial matching meaning "may be incomplete" is still true. Partial matching was not invented originally for multi-segment matching, but for dynamically checking input. For example, if a user is typing an 8-digit number, as each character is received it can be added to the input and then the input can be matched to ^\d{8}\z with a partial option. Later, other features were added to help multi-segment matching.

Also, "partial hard" means "return a partial match if it is found before a complete match". There is no requirement for any particular ordering. Therefore, for a pattern such as /\z/ a complete match may be found and returned without any partial consideration.

Sorry, I was not thinking about \z and EOL. But \z and EOL are not related to the issue that I want to discuss.

Let's differentiate partial_soft and partial_hard. I admit you are right about partial_soft meaning "may be incomplete".

But now we are talking only about partial_hard.

This option was added ten years ago EXACTLY for multisegment matching.

Please read the very first proposal post and the thread about it. That's how partial_hard was born:

https://lists.exim.org/lurker/message/20090524.142622.cb850f3a.en.html

At first I suggested naming this option PCRE_ERROR_MULTISEGMENT, which states its real purpose plainly. You named it PCRE_PARTIAL_HARD, which was excellent, but this does not change its exclusive aim: multisegment matching.

This option was not born alone but together with the last_bumpalong return value, which was later transformed into max_lookbehind. The purpose of last_bumpalong was also exclusively multisegment matching. Please read this thread about the first steps of PCRE_PARTIAL_HARD development after its birth:

https://lists.exim.org/lurker/message/20090901.142330.58bea511.en.html

So the algorithm of multisegment matching is:

1. If the segment is not the last one, process it with the PCRE_PARTIAL_HARD option and retain max_lookbehind characters if matching needs to continue with the next segment.

2. If the segment is the last one, process it without PCRE_PARTIAL_HARD.

With multisegment matching we want the matching result to be exactly the same as if we had matched the whole subject at once!
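
A minimal sketch of the two steps above, in Python rather than C so that it runs on its own. Nothing here is PCRE API: match_segments, the matcher callback, the status constants and literal_matcher are all hypothetical names, and the toy matcher only searches for a literal string. The callback stands in for a call such as pcre2_match() made with or without the partial-hard option, and the sketch assumes the engine reports where a partial match starts.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical status codes standing in for a real engine's return values.
MATCH, PARTIAL, NOMATCH = "match", "partial", "nomatch"

# matcher(buffer, start_offset, partial_hard) -> (status, start, end)
Matcher = Callable[[str, int, bool], Tuple[str, int, int]]


def match_segments(segments: Iterable[str], matcher: Matcher,
                   max_lookbehind: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets in the whole subject, or None."""
    segs = list(segments)
    buffer = ""   # retained text plus the current segment
    start = 0     # offset in buffer where matching may begin
    base = 0      # offset of buffer[0] within the whole subject
    for i, seg in enumerate(segs):
        last = i == len(segs) - 1
        buffer += seg
        # Step 1: partial-hard for every segment except the last.
        # Step 2: plain matching for the last segment.
        status, s, e = matcher(buffer, start, not last)
        if status == MATCH:
            return base + s, base + e
        if last:
            return None
        # Keep the partially matched text (or nothing, if there was no
        # match) plus max_lookbehind characters in front of it.
        keep_from = s if status == PARTIAL else len(buffer)
        cut = max(0, keep_from - max_lookbehind)
        base += cut
        buffer = buffer[cut:]
        start = keep_from - cut
    return None


def literal_matcher(needle: str) -> Matcher:
    """Toy stand-in for a regex engine: searches for a literal string."""
    def matcher(buffer: str, start: int, partial_hard: bool):
        pos = buffer.find(needle, start)
        if pos != -1:
            return MATCH, pos, pos + len(needle)
        if partial_hard:
            # Partial match: a prefix of the needle runs into the buffer end.
            for s in range(start, len(buffer)):
                if needle.startswith(buffer[s:]):
                    return PARTIAL, s, len(buffer)
        return NOMATCH, -1, -1
    return matcher


find_abc = literal_matcher("abc")
whole = match_segments(["xxabcyy"], find_abc, 0)
split = match_segments(["xxab", "c", "yy"], find_abc, 0)
print(whole, split)  # (2, 5) (2, 5)
```

The last line checks the requirement stated above: the segmented run reports the same offsets as the run over the whole subject.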


Consider an example:

/c*+(?<=[bc])/
abc
 0: c


Now imagine that the subject "abc" arrives in two segments: "ab" and "c".
We try to match the whole subject using two calls to pcre2_match(). The first is:

/c*+(?<=[bc])/
ab\=ph
 0:

Stop. We hoped to get only the result "c". But now we see that there is another successful match: an empty string at position 2. This match is obviously wrong, because we have to treat it as **subject "abc" successfully matches "c*+(?<=[bc])" at position 2 with an empty string**.

And so the whole multisegment match becomes wrong.
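
The discrepancy can be reproduced in a few lines of Python. This is only an illustration under two assumptions: Python's re module has no partial-matching option, so the second search is an ordinary complete match on the first segment alone, and the possessive "c*+" is written as plain "c*" so that the snippet also runs on Python versions without possessive quantifiers (for these two subjects the result is the same).

```python
import re

# Simplified form of the pattern from the post: "c*" instead of "c*+".
PATTERN = re.compile(r"c*(?<=[bc])")

whole = PATTERN.search("abc")  # the whole subject matched at once
first = PATTERN.search("ab")   # the first segment matched in isolation

print(whole.span(), repr(whole.group()))  # (2, 3) 'c'
print(first.span(), repr(first.group()))  # (2, 2) ''
```

The second result is the empty match at position 2 described above: reported for the first segment, it differs from what matching the whole subject gives.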

I would be very happy if you would reconsider your approach to PCRE_PARTIAL_HARD and associate this option entirely with multisegment matching, because that is what it was originally intended for. But please tell me if you know of another practical use of this option that forces you to change its original aims.



Thanks a lot.

--

## List details at https://lists.exim.org/mailman/listinfo/pcre-dev