On 2019-07-15 15:24, ph10 wrote:

My point about partial matching meaning "may be incomplete" is still true. Partial matching was not invented originally for multi-segment matching, but for dynamically checking input. For example, if a user is typing an 8-digit number, as each character is received it can be added to the input and then the input can be matched to ^\d{8}\z with a partial option. Later, other features were added to help multi-segment matching. Also, "partial hard" means "return a partial match if it is found before a complete match". There is no requirement for any particular ordering. Therefore, for a pattern such as /\z/ a complete match may be found and returned without any partial consideration.


Sorry, I was not thinking about \z and EOL. But \z and EOL are not related to the issue that I want to discuss.


Let's distinguish partial_soft from partial_hard. I admit you are right that partial_soft means "may be incomplete".
But now we are talking only about partial_hard.

This option was added ten years ago EXACTLY for multisegment matching.
Please read the very first proposal post and the thread about it. That is how partial_hard was born:
https://lists.exim.org/lurker/message/20090524.142622.cb850f3a.en.html

At first I suggested naming this option PCRE_ERROR_MULTISEGMENT, which says exactly what its real purpose is. You named it PCRE_PARTIAL_HARD, which was excellent, but this does not change its exclusive aim: multisegment matching.

This option was not born alone but together with the last_bumpalong return value, which was later transformed into max_lookbehind. The purpose of last_bumpalong was likewise exclusively multisegment matching. Please read this thread about the first steps of PCRE_PARTIAL_HARD development after it was born:
https://lists.exim.org/lurker/message/20090901.142330.58bea511.en.html

So the algorithm for multisegment matching is:
1. If the segment is not the last one, process it with the PCRE_PARTIAL_HARD option and keep max_lookbehind characters if matching has to continue with the next segment.
2. If the segment is the last one, process it without PCRE_PARTIAL_HARD.
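
The two steps above can be sketched as a driver loop. This is only an illustrative sketch in Python, not PCRE2 code. match_fn is a hypothetical stand-in for a wrapper around pcre2_match() that reports a complete match, a partial match, or no match, and literal_matcher is a toy matcher invented here just to make the sketch runnable. Retaining text from the start of a partial match minus max_lookbehind is an assumption of this sketch.

```python
MATCH, PARTIAL, NOMATCH = "match", "partial", "nomatch"


def match_segments(segments, match_fn, max_lookbehind):
    """Return (start, end) of the first match in whole-subject offsets, or None.

    match_fn(buffer, start, partial_hard) is a hypothetical wrapper around a
    matcher such as pcre2_match(); it returns (status, start, end).
    """
    segments = list(segments)
    buffer = ""   # retained lookbehind context plus the current segment
    base = 0      # offset of buffer[0] within the whole subject
    start = 0     # offset in buffer where matching may begin
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        buffer += segment
        # Step 1: partial-hard for every segment except the last.
        # Step 2: ordinary matching for the last segment.
        status, s, e = match_fn(buffer, start, partial_hard=not is_last)
        if status == MATCH:
            return base + s, base + e
        if is_last:
            return None
        if status == PARTIAL:
            # Assumption: keep the partial match plus its lookbehind context.
            keep_from = max(0, s - max_lookbehind)
            start = s - keep_from
        else:
            # No match: keep only the last max_lookbehind characters.
            keep_from = max(0, len(buffer) - max_lookbehind)
            start = len(buffer) - keep_from
        base += keep_from
        buffer = buffer[keep_from:]
    return None


def literal_matcher(literal):
    """Toy stand-in: finds `literal`, reporting a partial match at buffer end."""
    def match_fn(buffer, start, partial_hard):
        for i in range(start, len(buffer) + 1):
            rest = buffer[i:]
            if rest.startswith(literal):
                return MATCH, i, i + len(literal)
            if partial_hard and rest and literal.startswith(rest):
                return PARTIAL, i, len(buffer)
        return NOMATCH, 0, 0
    return match_fn


# "abc" is split across two segments; the result is in whole-subject offsets.
print(match_segments(["xxa", "bcx"], literal_matcher("abc"), max_lookbehind=0))
# prints (2, 5)
```

Note that the loop returns a complete match as soon as the matcher reports one, even on a segment that is not the last. Whether such a match can be trusted is exactly the question raised in this message.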

With multisegment matching, we want the matching result to be exactly the same as if we had matched the whole subject at once!

Consider an example:

/c*+(?<=[bc])/
abc
 0: c


Now imagine that the subject "abc" arrives in two segments: "ab" and "c".
We try to match the whole subject using two calls to pcre2_match(). The first is:

/c*+(?<=[bc])/
ab\=ph
 0:

Stop. We hoped to get only the result "c". But now we see there is yet another successful match: an empty string at position 2. This match is obviously wrong, because we treat it as **subject "abc" successfully matches "c*+(?<=[bc])" at position 2 with an empty string**.
And so the whole multisegment matching becomes wrong.


I would be very happy if you would reconsider your approach to PCRE_PARTIAL_HARD and associate this option entirely with multisegment matching, because that is what it was originally intended for. But please tell me if you know of another practical use of this option that forces you to change its original aim.


Thanks a lot.

--
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
