On Dec 14, Deven T. Corzine said:

>The crux of the problem is that non-greedy qualifiers don't affect the
>"earliest match" behavior, which makes the matches more greedy than they
>really ought to be.

That's because "greediness" only decides whether a quantifier crawls
forward or backtracks.  The regex /a.*b/ matches an 'a', then as many
non-\n characters as possible, and then looks for a 'b'.  Upon failing, it
backs up one character and tries again.  On the other hand, /a.*?b/
matches an 'a', then 0 characters, and then tries to match a 'b'; upon
failing, it matches one more character and tries again, and so on.
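
A minimal sketch of that difference. The sample string "a1b2b3" is made up
for illustration and is not from the original message. Both patterns start
at the same 'a'; only the amount consumed by the dot differs:

```perl
use strict;
use warnings;

# Hypothetical sample string, chosen so that two different 'b's are reachable.
my $str = "a1b2b3";

# Greedy: .* grabs everything up to the end of the string, then backs up
# one character at a time until a 'b' can match.
my ($greedy) = $str =~ /(a.*b)/;

# Non-greedy: .*? starts with zero characters and grows one character at a
# time until a 'b' can match.
my ($lazy) = $str =~ /(a.*?b)/;

print "greedy: $greedy\n";   # greedy: a1b2b
print "lazy:   $lazy\n";     # lazy:   a1b
```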

>     $_ = "aaaabbbbccccddddeeee";
>     ($greedy) = /(b.*d)/;              # "bbbbccccdddd" (correct)
>     ($non_greedy) = /(b.*?d)/;         # "bbbbccccd" (should be "bccccd"!)
>
>Does anyone disagree with the premise, and believe that "bbbbccccd" is the
>CORRECT match for the non-greedy regexp above?

>     match as many times as possible (given a particular starting
>     location) while still allowing the rest of the pattern to match.

The starting location is the first 'b' it matches.  Greediness has nothing
to do with the 'b' in your regex -- it has to do with the '.'.  The engine
matches a 'b', and then starts working on 0 or more of anything.
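
To see where the match actually starts, here is a small sketch using the
string from the quoted example and Perl's built-in @- and @+ offset
arrays. The variable names are only for illustration:

```perl
use strict;
use warnings;

my $str = "aaaabbbbccccddddeeee";

# The engine tries each starting position from left to right and keeps the
# first one at which the whole pattern can succeed.
my ($lazy) = $str =~ /(b.*?d)/;

# $-[1] and $+[1] hold the start and end offsets of capture group 1.
my ($start, $end) = ($-[1], $+[1]);

print "match: $lazy\n";      # match: bbbbccccd
print "start: $start\n";     # start: 4  (the first 'b' in the string)
print "end:   $end\n";       # end:   13 (just past the first 'd')
```

The non-greedy quantifier stops at the first 'd', but it never moves the
starting 'b' to the right.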

You're asking for something like

  /(b(?!b).*?d)/

which is an "optimization" you'll have to incorporate on your own.
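
For comparison, a rough sketch of how two lookaround variants behave on the
sample string. This is only an illustration for this particular input, not
a general "shortest match" solution: a negative lookbehind pins the match
to the first 'b' of a run, while a negative lookahead pins it to the last.

```perl
use strict;
use warnings;

my $str = "aaaabbbbccccddddeeee";

# 'b' not preceded by another 'b': the first 'b' of the run.
my ($lookbehind) = $str =~ /(?<!b)(b.*?d)/;

# 'b' not followed by another 'b': the last 'b' of the run.
my ($lookahead) = $str =~ /(b(?!b).*?d)/;

print "lookbehind: $lookbehind\n";   # lookbehind: bbbbccccd
print "lookahead:  $lookahead\n";    # lookahead:  bccccd
```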

-- 
Jeff "japhy" Pinyan     [EMAIL PROTECTED]    http://www.pobox.com/~japhy/
CPAN - #1 Perl Resource  (my id:  PINYAN)       http://search.cpan.org/
PerlMonks - An Online Perl Community          http://www.perlmonks.com/
The Perl Archive - Articles, Forums, etc.   http://www.perlarchive.com/
