> On Sun, 24 May 2009, ND wrote:
>
>> What do you think about adding following PCRE behavior:
>>
>> The return code PCRE_ERROR_MULTISEGMENT raised, and matching abandons
>> immediately if at any time during the matching process PCRE needs to
>> check (not bumpalong) the next symbol of subject string, but discovers
>> an end of string. An extra parameter - last_bumpalong_offset - is
>> returned.
>>
>> IMHO, it will allow to organize true multisegment matching.
>
> No. Multisegment matching is impossible with pcre_exec() because it has
> to be able to backtrack to any part of the string. Consider
>
> To do multi-segment matching, you need a searching strategy that scans
> the data string just once. This is provided by pcre_dfa_exec(). However,
> that imposes restrictions, such as no support for capturing parentheses.
>
> Philip
>

No. I don't mean that multisegment matching can be provided directly and
only by pcre_exec(). I wrote that, with the proposed pcre_exec() behavior,
the main application can organize true multisegment matching itself.

> ^(a.*z|something else)
> If it reads "a", then lots of characters, but no "z", it then has to
> backtrack right to the start of the string so that it can look for
> "something else".

Let's suppose that we have the pattern ^(a.*z|something else) and a subject
string divided into two segments: first 'abcd' and second 'efz0'.
Now PCRE scans 'a', then scans 'bcd', then wants to check the next symbol,
but discovers the end of the string and backtracks to <something else>.
When the PCRE_MULTISEGMENT option is on, PCRE does not backtrack; it
immediately returns PCRE_ERROR_MULTISEGMENT and 1 (the last bumpalong
offset). The main application saves the part of the first segment beginning
from the last bumpalong offset and waits for the next one. When 'efz0'
arrives, the main application concatenates it with the saved part and sends
'abcdefz0' in a new pcre_exec() request. 'abcdefz' is returned, which is
the right answer.
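
The loop described above can be sketched roughly as follows. This is only a
toy illustration in Python, not PCRE: the matcher handles just literals, '.'
and greedy '.*' in an anchored alternation, and the names NeedMoreInput,
match_anchored and match_segments are hypothetical stand-ins for the
proposed PCRE_ERROR_MULTISEGMENT behavior. Offsets here are 0-based.

```python
class NeedMoreInput(Exception):
    """Hypothetical stand-in for the proposed PCRE_ERROR_MULTISEGMENT."""

    def __init__(self, restart_offset):
        super().__init__(restart_offset)
        self.restart_offset = restart_offset


def _peek(subject, pos, final, start):
    """Return subject[pos], or None at the real end of the data.

    If the end is reached and more segments may follow, abandon the
    match at once instead of letting the matcher backtrack.
    """
    if pos < len(subject):
        return subject[pos]
    if final:
        return None
    raise NeedMoreInput(start)


def _match_here(alt, i, subject, pos, final, start):
    """Match alt[i:] at subject[pos:]; return the end offset or None."""
    if i == len(alt):
        return pos
    if alt[i:i + 2] == ".*":
        # Greedy: take as much as possible, then give back one char at a time.
        end = pos
        while _peek(subject, end, final, start) is not None:
            end += 1
        for stop in range(end, pos - 1, -1):
            result = _match_here(alt, i + 2, subject, stop, final, start)
            if result is not None:
                return result
        return None
    ch = _peek(subject, pos, final, start)
    if ch is not None and (alt[i] == "." or alt[i] == ch):
        return _match_here(alt, i + 1, subject, pos + 1, final, start)
    return None


def match_anchored(alternatives, subject, final):
    """Depth-first match of ^(alt1|alt2|...) at offset 0 of subject."""
    for alt in alternatives:
        end = _match_here(alt, 0, subject, 0, final, 0)
        if end is not None:
            return subject[:end]
    return None


def match_segments(alternatives, segments):
    """Main-application side: buffer data and retry on NeedMoreInput."""
    buffered = ""
    for n, segment in enumerate(segments):
        buffered += segment
        final = n == len(segments) - 1
        try:
            return match_anchored(alternatives, buffered, final)
        except NeedMoreInput as need:
            # Keep everything from the restart offset and wait for more data.
            buffered = buffered[need.restart_offset:]
    return None


PATTERN = ["a.*z", "something else"]  # toy form of ^(a.*z|something else)
print(match_segments(PATTERN, ["abcd", "efz0"]))
```

With the two segments from the example, the first call is abandoned as soon
as '.*' runs off the end of 'abcd', and the second call sees the whole of
'abcdefz0'. Ordinary backtracking is untouched once the final segment has
arrived.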

The main application can't organize true multisegment matching with
PCRE_PARTIAL.

> That is the way Perl-style, depth-first, matching works.
My proposal does not deviate from this behavior.

Michael
-- 
Written in the Opera browser's mail client: http://www.opera.com/mail/
-- 
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev 
