On Thu, 08 Feb 2007 13:50:51 +0800, Bin Chen <[EMAIL PROTECTED]>
wrote:

> In the help page:
>         If a "-" appears immediately after the "{", then a shortest match
>         first algorithm is used (see example below).  In particular, "\{-}" is
>         the same as "*" but uses the shortest match first algorithm.  BUT: A
>         match that starts earlier is preferred over a shorter match: "a\{-}b"
>         matches "aaab" in "xaaab".
> 
> What's the meaning of the BUT clause? If \{-} functions as described
> above, then a\{-}b should match "ab", not "aaab".

Regular expression matchers work by starting at the beginning of the
string to be matched and seeing if it is possible to match the pattern
against the string at that point. If no match is possible then the
first character of the string is discarded and a match against the
pattern is attempted starting at the second character, then the third,
and so on.
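
As a rough illustration of that scan, here is a minimal sketch in
Python rather than Vim. It assumes Python's lazy quantifier "*?"
behaves like Vim's "\{-}" for this simple pattern, and the helper name
leftmost_match is made up for the example. It tries the pattern at
each starting position in turn and stops at the first position where
an attempt succeeds.

```python
import re

def leftmost_match(pattern, text):
    """Try the pattern at each start position in turn; return the first success."""
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        # match() anchors the attempt at `start`; it does not scan ahead.
        m = compiled.match(text, start)
        if m is not None:
            return m
    return None

# Python spells Vim's a\{-}b as a*?b (assumed equivalent for this example).
m = leftmost_match(r"a*?b", "xaaab")
print(m.group(), m.span())  # aaab (1, 5)
```

The attempt at "x" fails, and the attempt at the first "a" can only
succeed by consuming all three "a"s, so "aaab" is what comes back.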

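But this continues only as long as no match is found. As soon as a
match is found the search ends and that match is returned, which means
that the match always starts as far to the left as possible. \{-} only
picks the shortest of the matches that begin at that position. In
"xaaab" the attempt at "x" fails, the attempt at the first "a"
succeeds with "aaab", and the search stops there without ever trying
the later positions. If you are interested in any other matches then
you have to do the programming yourself: for example, if you want the
shortest match no matter where it appears in the string then you'd
have to save all possible matches and look for the shortest one
afterwards.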
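
One way to do that bookkeeping, again sketched in Python and not in
Vim script: attempt a match at every starting position, keep each
success, and pick the shortest afterwards. The function name
shortest_match_anywhere is hypothetical. It keeps only the one match
the lazy quantifier prefers at each position, which is the shortest
one for a simple pattern like this but is not guaranteed to be for
every pattern.

```python
import re

def shortest_match_anywhere(pattern, text):
    """Collect one match per start position, then return the shortest."""
    compiled = re.compile(pattern)
    candidates = []
    for start in range(len(text) + 1):
        # Anchored attempt at this position only.
        m = compiled.match(text, start)
        if m is not None:
            candidates.append(m)
    # min() keeps the earliest candidate when several have the same length.
    return min(candidates, key=lambda m: m.end() - m.start(), default=None)

best = shortest_match_anywhere(r"a*?b", "xaaab")
print(best.group(), best.span())  # b (4, 5)
```

Note that the shortest match here is "b" and not "ab", because "*?"
(like "\{-}") allows zero "a"s. With r"a+?b", which requires at least
one "a", the same function returns "ab".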

-- 
Matthew Winn
