Hi Marck,

Historians believe that Tuesday, July 31, 2001 at 11:10 GMT +0100 was
when Marck D. Pearlstone [MP] typed the following:

MP>   \s*

MP> Match one (or more) white space characters.

Minor correction (probably a typo), that should be: Match zero (or
more) white space...
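
A quick way to see the difference between zero-or-more and one-or-more
is the sketch below. It uses Python's re module rather than The Bat!,
on the assumption that both engines treat \s* and \s+ the same way.

```python
import re

# \s* accepts an empty string (zero white space characters) ...
star_on_empty = re.fullmatch(r"\s*", "")
# ... while \s+ needs at least one white space character.
plus_on_empty = re.fullmatch(r"\s+", "")
plus_on_space = re.fullmatch(r"\s+", " ")

print(star_on_empty is not None)  # \s* matched nothing at all
print(plus_on_empty is None)      # \s+ refused the empty string
print(plus_on_space is not None)  # \s+ is happy with one space
```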

MP>   (\S*?)

MP> The brackets here denote the first captured sub-pattern. It consists
MP> of zero or more (*) non-space (\S) characters. To be honest the '?'
MP> (zero or one) specifier here is a bit confusing... but it works.

The '?' operator has two meanings depending on location.  If it
follows a character or a subpattern, it means zero or one times.  If
it follows a repeat operator (e.g. *?), the '?' makes that repeat
operator ungreedy.  The best example is in the Help file, but here's
another.  Suppose you want to extract the first e-mail address from
the To: line.  You could use a regexp like:
^To\:.*(\S+@\S+)

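But the .* is greedy: it takes as many characters as it can and gives
back only what the rest of the pattern needs.  This means the capture
lands in the last address (and usually holds only its tail, since a
single character before the @ is enough to satisfy \S+).  If you
replace ".*" with ".*?", the capture starts at the first address
instead.  Try it out to see what I mean.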
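
For anyone who wants to try it outside The Bat!, here is a minimal
sketch using Python's re module.  I'm assuming it handles greedy and
ungreedy repeats the same way as The Bat! for these patterns, and the
To: line is made up for illustration.

```python
import re

# Hypothetical header line, invented for this example.
header = "To: alice@example.com, bob@example.org"

# Greedy: .* swallows the line, then backs off just far enough
# for \S+@\S+ to match, which lands inside the last address.
greedy = re.search(r"^To\:.*(\S+@\S+)", header)

# Ungreedy: .*? grows one character at a time, so the capture
# starts at the first address.
lazy = re.search(r"^To\:.*?(\S+@\S+)", header)

print(greedy.group(1))
print(lazy.group(1))
```

Note that \S also matches punctuation, so the ungreedy capture can pick
up a trailing comma or angle brackets along with the address.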

Note that you might consider these to be trivial examples.  But
knowing the greedy settings can be important.  The original GMT
Time/Date macro that many people use has a classic example:
\s*?(.*)

If you think about it, the '\s*?' will never match anything: it is
ungreedy, and the '.*' that follows will cover everything, leading
white space included.
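
A small sketch of that effect, again in Python's re module under the
assumption that it behaves like The Bat! here.  The input string is
invented; the point is only where the leading spaces end up.

```python
import re

# Made-up input with three leading spaces.
text = "   31 Jul 2001"

# Ungreedy \s*? matches nothing, so (.*) keeps the leading spaces.
m_lazy = re.match(r"\s*?(.*)", text)

# Greedy \s* consumes the spaces before (.*) gets a turn.
m_greedy = re.match(r"\s*(.*)", text)

print(repr(m_lazy.group(1)))
print(repr(m_greedy.group(1)))
```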

MP>   (\s+(\S*?))*?

MP> Here's sub-pattern 2. It is one or more (+) white space characters
MP> (\s) followed by zero or more non-space characters (as above). Again,
MP> another '?' which I'm not sure about ... Januk? And then the whole
MP> pattern is repeated zero or more times (*) with another zero or one
MP> (?) specifier to cap it off.

Same thing.  I noticed that without the ungreedy options, I was
getting much more matched than I wanted.  With this regexp, the last
name is stored in subpattern 3.  I thought it was a useful feature to
put into the regexp.

-- 
Thanks for writing,
 Januk Aggarwal

Using The Bat! 1.54 Beta/4 under Windows 98 4.10 Build 2222  A

This sentence lets you know this page was not a mistake.


-- 
______________________________________________________
Archives   : <http://tbtech.thebat.dutaint.com>
Moderators : <mailto:[EMAIL PROTECTED]>
Unsubscribe: <mailto:[EMAIL PROTECTED]>
