> On Sun, 24 May 2009, ND wrote:
>
>> What do you think about adding following PCRE behavior:
>>
>> The return code PCRE_ERROR_MULTISEGMENT raised, and matching abandons
>> immediately if at any time during the matching process PCRE needs to
>> check (not bumpalong) the next symbol of subject string, but discovers
>> an end of string. An extra parameter - last_bumpalong_offset - is
>> returned.
>>
>> IMHO, it will allow to organize true multisegment matching.
>
> No. Multisegment matching is impossible with pcre_exec() because it has
> to be able to backtrack to any part of the string. Consider
>
> To do multi-segment matching, you need a searching strategy that scans
> the data string just once. This is provided by pcre_dfa_exec(). However,
> that imposes restrictions, such as no support for capturing parentheses.
>
> Philip

No. I don't mean that multisegment matching can be provided directly and
only by pcre_exec(). I wrote that, with the proposed pcre_exec() behavior,
the main application can organize true multisegment matching.

>   ^(a.*z|something else)
>
> If it reads "a", then lots of characters, but no "z", it then has to
> backtrack right to the start of the string so that it can look for
> "something else".

Let's suppose that we have the pattern ^(a.*z|something else) and a
subject string divided into two segments: first 'abcd', then 'efz0'.

Currently PCRE scans 'a', then scans 'bcd', then wants to check the next
symbol, but discovers the end of the string and backtracks to
<something else>.

When the PCRE_MULTISEGMENT option is on, PCRE doesn't backtrack and
immediately returns PCRE_ERROR_MULTISEGMENT and 1 (the last bumpalong
offset). The main application saves the part of the first segment
beginning at the last bumpalong offset and waits for the next segment.
When 'efz0' arrives, the main application concatenates it with the saved
part and sends 'abcdefz0' to a new pcre_exec() call. 'abcdefz' is
returned, which is the right answer.

The main application can't organize true multisegment matching with
PCRE_PARTIAL.

> That is the way Perl-style, depth-first, matching works.

There is no deviation from this behavior in my proposal.

Michael
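
To make the application-side loop described above concrete, here is a
minimal sketch in C. It does not call PCRE at all. PCRE_MULTISEGMENT and
PCRE_ERROR_MULTISEGMENT are only proposed and do not exist in PCRE, so
the matcher is replaced by a hypothetical toy function, toy_match(), that
handles only the first alternative (a.*z) and reports "need more data"
when it runs off the end of a non-final segment. All names (toy_match,
feed_segment, carry_buf, the TOY_* codes), the is_last flag and the
0-based restart offset are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes; none of these exist in PCRE. */
enum { TOY_MATCH = 0, TOY_NOMATCH = -1, TOY_NEED_MORE = -2 };

/* Toy stand-in for an anchored match of ^a.*z (greedy, '.' matches any char).
   Mimics the proposed behaviour: if the matcher wants another character but
   the data ends and this is not the final segment, it reports TOY_NEED_MORE
   and the offset at which the current match attempt started. */
static int toy_match(const char *s, size_t len, int is_last,
                     size_t *restart, size_t *match_len)
{
    *restart = 0;      /* anchored pattern: the attempt always starts at 0 */
    *match_len = 0;
    if (len == 0) return is_last ? TOY_NOMATCH : TOY_NEED_MORE;
    if (s[0] != 'a') return TOY_NOMATCH;
    if (!is_last) return TOY_NEED_MORE;   /* greedy .* ran off the data */
    /* Final segment: take the longest match, i.e. the last 'z' after 'a'. */
    for (size_t i = len; i-- > 1; ) {
        if (s[i] == 'z') { *match_len = i + 1; return TOY_MATCH; }
    }
    return TOY_NOMATCH;
}

/* Data carried over from earlier segments. */
typedef struct {
    char   data[256];
    size_t len;
} carry_buf;

/* Append one segment to the carried data and retry the match from scratch.
   On TOY_MATCH the matched text is cb->data[0 .. *match_len). */
int feed_segment(carry_buf *cb, const char *seg, int is_last, size_t *match_len)
{
    size_t n = strlen(seg);
    if (cb->len + n >= sizeof cb->data) return TOY_NOMATCH; /* sketch gives up */
    memcpy(cb->data + cb->len, seg, n);
    cb->len += n;
    cb->data[cb->len] = '\0';

    size_t restart = 0;
    int rc = toy_match(cb->data, cb->len, is_last, &restart, match_len);
    if (rc == TOY_NEED_MORE) {
        /* Keep only the part starting at the reported restart offset. */
        assert(restart <= cb->len);
        memmove(cb->data, cb->data + restart, cb->len - restart);
        cb->len -= restart;
        cb->data[cb->len] = '\0';
    } else if (rc == TOY_NOMATCH) {
        cb->len = 0;   /* nothing worth carrying over */
    }
    return rc;
}
```

Feeding 'abcd' as a non-final segment and then 'efz0' as the final one
walks through the example above: the first call asks for more data and
keeps 'abcd', the second call matches against 'abcdefz0'. Every call
rescans the carried data from the restart offset, so the application has
to buffer everything from that offset onwards.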
