Re: [pcre-dev] 'Hard' partial matching don't work with some assertions

ND Sun, 29 Aug 2010 14:25:04 -0700

> As for "correcting": I have been writing software for over 40 years, and
> one thing I learned very early on was that making an incompatible change
> always causes a problem for *somebody*, however much you think "nobody
> will notice this change". That is why I try very hard not to make
> incompatible changes, and introduce new options instead. That is why I
> added "hard" rather than change the way the previous partial worked.


> I can imagine
> that somebody who is using partial matching would want to be sure of
> finding a longer partial match rather than a shorter complete match. For
> example, the pattern abc(def?) applied to the string "abc".
>
Your example shows that "abc" is the first segment and that the user
expects a second segment may arrive. This is the multisegment string case.
IMHO the 'hard' partial option has no other use. If we take this viewpoint,
then PCRE's behaviour with the assertions '\z', '\Z', '$' and '\b' is a
defect, and what is needed is a correction, not new functionality. From
this point of view there is no "incompatible change", only a bug fix. IMHO
the bug is not in the implementation but in the formulation of the concept.

You assume there are other uses that are the same as multisegment string
matching in every respect but one: the lookaheads must work without
actually trying to look ahead into the next possible string segment. Can
such uses exist?

Those are my arguments, but the choice is yours.

I propose that in 'hard' partial mode (or in some new mode, if you choose
to create one):
1. applying '\z', '\Z', '\b' or '$' at the end of the subject string must
(subject to 2.) produce a partial match;
2. the partial match can be an empty string if and only if the offset of
the earliest character that was inspected when the partial match candidate
was found is less than the end-of-subject-string offset.
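
As a rough sketch only, the two rules can be written as a small decision
function. This is a model of the proposal, not PCRE code: the function name,
the parameter names and the "PARTIAL" return value are all made up for
illustration, and what should happen when rule 2 refuses an empty partial
match is left open (the sketch just returns None).

```python
from typing import Optional


def end_assertion_outcome(match_start: int,
                          earliest_inspected: int,
                          subject_end: int) -> Optional[str]:
    """Model the proposed rules for '\\z', '\\Z', '\\b' or '$' hit at subject_end.

    All three arguments are offsets into the current segment (hypothetical
    names, not PCRE API). Returns "PARTIAL" when the rules grant a partial
    match, or None when rule 2 forbids the (empty) partial match.
    """
    if not (0 <= earliest_inspected <= match_start <= subject_end):
        raise ValueError("expected 0 <= earliest_inspected <= match_start <= subject_end")

    partial_is_empty = match_start == subject_end
    # Rule 2: an empty partial match is allowed only if some character
    # before the end of the subject was inspected.
    if partial_is_empty and earliest_inspected >= subject_end:
        return None
    # Rule 1: otherwise the assertion at the end yields a partial match.
    return "PARTIAL"
```

For example, with the subject "cat" (end offset 3), a candidate that started
at offset 0 gives end_assertion_outcome(0, 0, 3) == "PARTIAL"; an empty
candidate at offset 3 that looked back at the 't' gives
end_assertion_outcome(3, 2, 3) == "PARTIAL"; and on an empty subject
end_assertion_outcome(0, 0, 0) is None.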

PS Adding the 'hard' option in 2009 was a great thing. Thanks. I use PCRE
to analyze a data flow. The data is transferred in chunks, and my
application does not know in advance when it will end. But the application
analyzes the parts in real time as they arrive and acts accordingly. So an
important practical application of PCRE was born when the 'hard' option
appeared: the ability to analyze multisegment strings and endless data
flows. There is a wide range of such data, first of all Internet and
network transmissions. But recently I discovered bugs in my application's
flow analysis, because some lookahead assertions are not really lookaheads
and do not try to look ahead. So now my application can't "be sure of
finding a longer partial match rather than a shorter complete match" (your
words).
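
To make the problem concrete, here is a small illustration. It uses Python's
standard re module, not PCRE, and re has no partial matching at all, so it
only shows what goes wrong when '\b' is satisfied by the end of a segment
instead of waiting for more data. The pattern and the two-chunk stream are
invented for the example.

```python
import re

# A token pattern that ends in an end-sensitive assertion.
token = re.compile(r"\d+\b")

# Hypothetical stream: the number 1234 happens to be split across two chunks.
chunks = ["id 12", "34 ok"]

# Matching each chunk on its own: '\b' succeeds at the end of the first
# chunk, so a shorter complete match is reported too early.
per_chunk = [m.group() for chunk in chunks for m in token.finditer(chunk)]

# Matching the reassembled stream gives the token that was really sent.
whole = [m.group() for m in token.finditer("".join(chunks))]

print(per_chunk)
print(whole)
```

Per chunk the result is two tokens, "12" and "34"; on the whole stream it is
the single token "1234". With a partial result at the end of the first chunk,
the application could hold "12" back until the next chunk arrives.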


> That is slightly odd. I would expect them BOTH to return MATCH, with the
> first returning "t" and the second "" (which it does). I have made a
> note to investigate this when I next work on PCRE (not soon).
>
For the purposes of multisegment string matching they must both return
'ERROR_PARTIAL', as described.

Thanks.

-- 
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev
