Hi Marck,
Historians believe that Tuesday, July 31, 2001 at 11:10 GMT +0100 was
when Marck D. Pearlstone [MP] typed the following:
MP> \s*
MP> Match one (or more) white space characters.
Minor correction (probably a typo), that should be: Match zero (or
more) white space...
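To see the difference between "zero or more" and "one or more", here is a
small sketch in Python. It is only an illustration outside The Bat!: it
assumes Python's re module treats \s, * and + the same way for this
purpose, and the sample strings are made up.

```python
import re

# Sample inputs are invented for illustration only.
no_space = "abc"
two_spaces = "  abc"

# \s* means zero or more, so it succeeds even with no white space,
# matching an empty string.
star_on_no_space = re.match(r"\s*", no_space)

# \s+ means one or more, so it fails when there is no white space.
plus_on_no_space = re.match(r"\s+", no_space)

# Both forms consume the leading white space when it is present.
star_on_two_spaces = re.match(r"\s*", two_spaces)
plus_on_two_spaces = re.match(r"\s+", two_spaces)

print(repr(star_on_no_space.group(0)))    # ''
print(plus_on_no_space)                   # None
print(repr(star_on_two_spaces.group(0)))  # '  '
```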
MP> (\S*?)
MP> The brackets here denote the first captured sub-pattern. It consists
MP> of zero or more (*) non-space (\S) characters. To be honest the '?'
MP> (zero or one) specifier here is a bit confusing... but it works.
The '?' operator has two meanings depending on location. If it
follows a character or a subpattern, it means zero or one times. If
it follows a repeat operator (e.g. *?), the '?' makes that repeat
operator ungreedy. The best example is in the Help file, but here's
another. Suppose you want to extract the first e-mail address from
the To: line. You could use a regexp like:
^To\:.*(\S+@\S+)
But the .* is greedy: it will take as many characters as it can.
This means you'll actually get (the tail of) the last address. If you
replace ".*" with ".*?", you'll get the first address instead. Try it
out to see what I mean.
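If you'd rather try the greedy and ungreedy forms outside The Bat!, the
following Python sketch runs both against a made-up To: line. The header
text and variable names are assumptions for illustration, and Python's re
module stands in for the mail client's own engine.

```python
import re

# Hypothetical header line, not taken from any real message.
header = "To: alice@example.com, bob@example.org"

# Greedy .* grabs as much as it can and gives back only what the
# capture group strictly needs, so the group lands in the last address.
greedy = re.search(r"^To\:.*(\S+@\S+)", header)

# Ungreedy .*? grabs as little as it can, so the group starts at the
# first place an address can begin.
lazy = re.search(r"^To\:.*?(\S+@\S+)", header)

print(greedy.group(1))  # b@example.org
print(lazy.group(1))    # alice@example.com,
```

Note how the greedy version leaves only one character in front of the '@',
and how the ungreedy version keeps the trailing comma because \S+ accepts
any non-space character. A real address extractor would need a tighter
pattern than \S+@\S+.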
Note that you might consider these to be trivial examples. But
knowing the greedy settings can be important. The original GMT
Time/Date macro that many people use has a classic example:
\s*?(.*)
If you think about it, the '\s*?' will never consume anything: being
ungreedy, it tries zero characters first, and the '.*' that follows
then covers everything, leading white space included.
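A short Python sketch makes this visible. The sample text is invented, and
the greedy variant is shown only for comparison, as one possible way to
strip the leading white space, not as what the macro's author intended.

```python
import re

# Made-up input with three leading spaces.
text = "   11:10 GMT"

# Ungreedy \s*? settles for zero characters, so the capture group
# still contains the leading spaces.
ungreedy = re.match(r"\s*?(.*)", text)

# Greedy \s* eats the leading spaces before the group starts.
greedy = re.match(r"\s*(.*)", text)

print(repr(ungreedy.group(1)))  # '   11:10 GMT'
print(repr(greedy.group(1)))    # '11:10 GMT'
```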
MP> (\s+(\S*?))*?
MP> Here's sub-pattern 2. It is one or more (+) white space characters
MP> (\s) followed by zero or more non-space characters (as above). Again,
MP> another '?' which I'm not sure about ... Januk? And then the whole
MP> pattern is repeated zero or more times (*) with another zero or one
MP> (?) specifier to cap it off.
Same thing: each '?' after a repeat operator makes it ungreedy. I
noticed that without the ungreedy options, I was getting much more
matched than I wanted. With this regexp, the last name is stored in
subpattern 3, which I thought was a useful feature to put into the
regexp.
--
Thanks for writing,
Januk Aggarwal
Using The Bat! 1.54 Beta/4 under Windows 98 4.10 Build 2222 A
This sentence lets you know this page was not a mistake.