> As for "correcting": I have been writing software for over 40 years, and
> one thing I learned very early on was that making an incompatible change
> always causes a problem for *somebody*, however much you think "nobody
> will notice this change". That is why I try very hard not to make
> incompatible changes, and introduce new options instead. That is why I
> added "hard" rather than change the way the previous partial worked.
>
> I can imagine
> that somebody who is using partial matching would want to be sure of
> finding a longer partial match rather than a shorter complete match. For
> example, the pattern abc(def?) applied to the string "abc".

Your example demonstrates that "abc" is the first segment and that the
user expects a second segment may arrive. This is the multisegment-string
case. IMHO the 'hard' partial option has no other application. From this
viewpoint, PCRE's behaviour with the lookahead-style assertions '\z',
'\Z', '$' and '\b' is a defect that needs correcting, not new
functionality. And from this point of view there are no "incompatible
changes", only a bug fix. IMHO the bug lies not in the implementation but
in the design.

You assume there are other uses that are the same as multisegment string
matching in every respect except one small difference: that these
assertions must work without actually trying to look ahead into the next
possible string segment. Can such uses really exist?

Those are my arguments, but the decision is yours.

I propose that in 'hard' partial mode (or in a new mode, if you decide
to create one):

1. Applying '\z', '\Z', '\b' or '$' at the end position of the subject
   string must (subject to rule 2) produce a partial match.
2. A partial match can be an empty string if and only if the offset of
   the earliest character that was inspected when the partial match
   candidate was found is less than the end-of-subject-string offset.

PS Adding the 'hard' option in 2009 was a great thing. Thanks. I use PCRE
to analyze a data flow. Data is transferred in chunks, and my application
does not know in advance when the flow will end. But the application
analyzes the parts in real time as they arrive and acts accordingly. So
an important practical application of PCRE was born when the 'hard'
option appeared: the ability to analyze multisegment strings and endless
data flows.
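
To illustrate the chunked scenario, here is a minimal Python sketch. It is
not PCRE: it uses Python's standard `re` module, and the function name
`classify` and its `is_last` flag are made up for illustration. It treats
any match that touches the end of the data received so far as undecided
while more data may still arrive, which is roughly what rule 1 asks for.
It is deliberately cruder than the proposal: it also defers matches that
end at the boundary without any assertion being involved, and `re` cannot
report a true partial match such as "a ca".

```python
import re

def classify(pattern, buffer, is_last):
    """Classify a search over the data received so far.

    Returns ("MATCH", text), ("PARTIAL", text) or ("NOMATCH", None).
    `is_last` is a hypothetical flag: True once the stream has ended.
    """
    m = re.search(pattern, buffer)
    if m is None:
        return ("NOMATCH", None)
    # A match ending exactly at the end of the buffer may depend on data
    # that has not arrived yet (e.g. \b or $ tested at the boundary), so
    # it is only a candidate until the stream is known to be finished.
    if m.end() == len(buffer) and not is_last:
        return ("PARTIAL", m.group())
    return ("MATCH", m.group())


# "a cat" arrives first; whether \b holds after "cat" is still unknown.
print(classify(r"cat\b", "a cat", is_last=False))
# The next chunk decides it one way or the other.
print(classify(r"cat\b", "a cat" + " sat", is_last=False))
print(classify(r"cat\b", "a cat" + "egory", is_last=False))
# Once the stream has ended, the end of the buffer is the real end.
print(classify(r"cat\b", "a cat", is_last=True))
```

The first call reports PARTIAL, the second MATCH, the third NOMATCH, and
the last MATCH: the same "cat" at the end of a segment is only a candidate
until either the next chunk or the end of the stream settles it.
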
There is a wide spectrum of such data, first of all internet and network
transmissions. But recently I discovered bugs in my application's flow
analysis. The cause is that some lookahead assertions do not really look
ahead: they make no attempt to see what follows. So now my application
cannot "be sure of finding a longer partial match rather than a shorter
complete match" (your words).

> That is slightly odd. I would expect them BOTH to return MATCH, with the
> first returning "t" and the second "" (which it does). I have made a
> note to investigate this when I next work on PCRE (not soon).
>

For the purposes of multisegment string matching they must both return
'ERROR_PARTIAL', as described.

Thanks.

-- 
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev
