[pcre-dev] [Bug 1190] lookaround doesn't work with startoffset

Philip Hazel Wed, 28 Dec 2011 03:08:56 -0800

http://bugs.exim.org/show_bug.cgi?id=1190

--- Comment #3 from Philip Hazel 2011-12-28 11:08:37 ---
On Tue, 27 Dec 2011, Alan Lehotsky wrote:

> Trying to match the pattern (from page 66 of Mastering Regular Expressions,
> 3rd Edition)
> 
>   (?<=\d)(?=(\d\d\d)+(?!\d))
> 
> with the source string "1234567"
> 
> This fails if my search loop uses the startoffset argument to pcre_exec() to
> advance through the search string (leaving the subject and length unchanged
> as I find successive match points).
> 
> But it does work if I advance the subject pointer and decrement the length,
> and use zero for the startoffset on each call.

The pcretest program has facilities for trying both of these methods, 
and for me it gives the same result both times:

PCRE version 8.12 2011-01-15

/(?<=\d)(?=(\d\d\d)+(?!\d))/g+
    1234567
 0: 
 0+ 234567
 1: 567
 0: 
 0+ 567
 1: 567

/(?<=\d)(?=(\d\d\d)+(?!\d))/G+
    1234567
 0: 
 0+ 234567
 1: 567
 0: 
 0+ 567
 1: 567

The /g option uses the startoffset argument, whereas the /G option advances
the subject pointer after each match. The /+ option makes pcretest also
output the rest of the subject that follows each match (the "0+" lines),
so you can see exactly where it matches an empty string. Note also that
Perl with /g gives exactly the same results.
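
For readers who want to reproduce the two loop styles outside pcretest, here
is a minimal sketch that uses Python's re module as a stand-in for
pcre_exec(). This is an assumption: Python's engine is not PCRE, but for this
pattern it treats lookbehind and a start position the same way.
find_with_offset (a hypothetical name) mimics /g by keeping the subject
unchanged and passing a start position, so the lookbehind can still see
earlier characters; find_by_advancing (also hypothetical) mimics /G by
slicing the subject after each match. As a simplification, after an empty
match both loops just resume one character later; pcretest, like Perl,
first tries for a non-empty match at the same position, which makes no
difference here because every match of this pattern is empty.

```python
import re

# Pattern from the report: an empty match at each point that is preceded by
# a digit and followed by a multiple of three digits (where a thousands
# separator would go).
PATTERN = re.compile(r"(?<=\d)(?=(\d\d\d)+(?!\d))")


def find_with_offset(pattern, subject):
    """Analogue of pcretest /g: the subject never changes, only the start
    position does, so lookbehind can still see characters before it."""
    results = []
    pos = 0
    while pos <= len(subject):
        m = pattern.search(subject, pos)
        if m is None:
            break
        results.append((m.start(), m.group(1)))
        # Simplification: after an empty match, resume one character later.
        pos = m.end() + 1 if m.end() == m.start() else m.end()
    return results


def find_by_advancing(pattern, subject):
    """Analogue of pcretest /G: the subject is sliced after each match, so
    lookbehind cannot see anything before the new start."""
    results = []
    base = 0  # offset of the current slice within the original subject
    while base <= len(subject):
        m = pattern.search(subject[base:])
        if m is None:
            break
        # Report offsets relative to the original subject.
        results.append((base + m.start(), m.group(1)))
        # Same empty-match simplification as above.
        base += m.end() + 1 if m.end() == m.start() else m.end()
    return results


print(find_with_offset(PATTERN, "1234567"))   # [(1, '567'), (4, '567')]
print(find_by_advancing(PATTERN, "1234567"))  # [(1, '567'), (4, '567')]
```

Both loops report empty matches at offsets 1 and 4 with group 1 set to
"567", which corresponds to the two "0+" lines in the pcretest output.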

If you are worried that the /G option has to look behind the start of the
subject in order to give the first match (as I momentarily was), you are
mistaken. That match happens when the string is passed as "1234567":
remember that when an unanchored pattern is matched, there is an internal
advance within the string, so the lookbehind at offset 1 can see the "1".
A match against "234567" finds only the second match:

/(?<=\d)(?=(\d\d\d)+(?!\d))/+
    234567
 0: 
 0+ 567
 1: 567
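
The same contrast, between a shorter subject and a start position into the
full subject, can be sketched with Python's re module, again as an assumed
stand-in for pcre_exec(). Python's Pattern.search(string, pos) lets a
lookbehind examine characters before pos, much as startoffset does, while
passing a shorter string hides them completely.

```python
import re

PATTERN = re.compile(r"(?<=\d)(?=(\d\d\d)+(?!\d))")

# Subject passed as "234567": nothing precedes offset 0, so the lookbehind
# fails there and the only match is at offset 3, just before "567".
m = PATTERN.search("234567")
print(m.start(), m.group(1))  # 3 567

# The same characters reached with a start position into "1234567": the
# lookbehind can see the "1" before position 1, so a match is found there.
m = PATTERN.search("1234567", 1)
print(m.start(), m.group(1))  # 1 567
```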

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev