Hi Jon,
On Wednesday, April 3, 2002 at 11:45:39 [GMT +0100], you wrote:

JL> That did the trick!

JL> Can I ask...

JL> [Questions snipped]

OK, here's my first attempt at a RegExp introduction. My knowledge of this
topic is rather basic, but it should be enough for this RegExp.

Let's have a look at the RegExp:

%SETPATTREGEXP="(?is)(^Name:(.*?)\nCity(.*?)\nProduct(.*?)\nAmount:?(.*?)\n).*"%-
%REGEXPBLINDMATCH="%TEXT"%-
%Subpatt="2", %Subpatt="3", %Subpatt="4", %Subpatt="5"

First of all, notice that it consists of three parts, one on each of
the three lines.

The first part tells TB what pattern to look for (SETPATTernREGEXP).

The second part tells TB to look for the previously specified pattern in
the %TEXT.

And the last part inserts the matched subpatterns into your text file,
separated by ", ". More on subpatterns follows below, when we look at
how the search pattern is built.
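For readers who know Python, the same three steps can be sketched with
its re module. This is only an analogy, assuming TB's RegExp engine
treats this pattern the way Python's does; the sample text is made up,
not taken from your post.

```python
import re

# Made-up sample standing in for %TEXT (not from the original post).
text = "Name:\tJohn Smith\nCity:\tLondon\nProduct:\tWidget\nAmount\t3\n"

# Part 1: set the pattern (roughly what %SETPATTREGEXP does).
pattern = re.compile(r"(?is)(^Name:(.*?)\nCity(.*?)\nProduct(.*?)\nAmount:?(.*?)\n).*")

# Part 2: look for the pattern in the text (roughly what %REGEXPBLINDMATCH does).
match = pattern.search(text)

# Part 3: output subpatterns 2 to 5, separated by ", " (like the %Subpatt calls).
result = ", ".join(match.group(n) for n in (2, 3, 4, 5))
print(repr(result))
```

Note the tabs and the leftover colons in the result
('\tJohn Smith, :\tLondon, :\tWidget, \t3'); the tweaks further down
get rid of them.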

The first two lines end with a newline (this may seem strange at first,
but there *is* a newline, and it would show up in the output as well).
That's not what we want, so we suppress these newlines with %- at the
end of the first two lines.

Now, let's have a closer look at the RegExp pattern:

(?is)(^Name:(.*?)\nCity(.*?)\nProduct(.*?)\nAmount:?(.*?)\n).*

This pattern consists of two parts. The first part (between "(?" and
")") sets the options for the RegExp: "i" means case-insensitive
matching, and "s" makes the "." metacharacter match newlines, too. The
second part is the actual search pattern.

This leads us to the next important thing, the meaning of the ".".
Inside a RegExp, this character matches any character except a newline
by default, which is why we need "s" if it should match newlines as
well.
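Here's a small sketch of what the two options change, using Python's re
module as a stand-in (assuming TB's engine handles "i" and "s" the same
way) and a made-up two-line string:

```python
import re

sample = "first line\nsecond line"

# Default: "." does not match the newline, so ".*" stops at the end of the line.
plain = re.match(r"first.*", sample).group(0)
# With (?s): "." matches the newline too, so ".*" runs to the end of the text.
dotall = re.match(r"(?s)first.*", sample).group(0)
# With (?i): the match ignores case.
nocase = re.match(r"(?i)FIRST", sample) is not None

print(repr(plain))   # 'first line'
print(repr(dotall))  # 'first line\nsecond line'
print(nocase)        # True
```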

This is the pattern we defined:

(^Name:(.*?)\nCity(.*?)\nProduct(.*?)\nAmount:?(.*?)\n).*

The first thing we notice is the number of parentheses. Parentheses in a
RegExp group things together as subpatterns (remember that word?). These
subpatterns are numbered in the order of their opening parentheses,
starting from 1.

We have these subpatterns (use the PTV to see the ^ markings at the
correct place!):

(^Name:(.*?)\nCity(.*?)\nProduct(.*?)\nAmount:?(.*?)\n).*
^      |   |      |   |         |   |          |   |  ^   subpattern 1
       ^   ^      |   |         |   |          |   |      subpattern 2
                  ^   ^         |   |          |   |      subpattern 3
                                ^   ^          |   |      subpattern 4
                                               ^   ^      subpattern 5
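To double-check the numbering, this sketch prints every subpattern of
the pattern above. It uses Python's re as a stand-in for TB and a
made-up sample text:

```python
import re

text = "Name:\tJohn Smith\nCity:\tLondon\nProduct:\tWidget\nAmount\t3\n"
m = re.search(r"(?is)(^Name:(.*?)\nCity(.*?)\nProduct(.*?)\nAmount:?(.*?)\n).*", text)

# Subpatterns (groups) are numbered by the position of their opening parenthesis,
# so the outer group is 1 and the four inner ones are 2 to 5.
for n in range(1, m.re.groups + 1):
    print(n, repr(m.group(n)))
```

Subpattern 1 spans all four lines, which is why only 2 to 5 are useful
for the output.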

Subpatterns 2, 3, 4 and 5 all match .*?, which stands for "zero or more
occurrences (*) of any character (.), but don't be greedy (?)". So each
of them matches only as much as is really needed for the rest of the
pattern to match; with messages like yours, that's everything up to the
next newline.
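The difference between greedy and non-greedy is easiest to see when the
text contains the next field name twice. A sketch in Python's re (made-up
text, assuming TB's engine behaves the same way):

```python
import re

text = "Name: John\nCity: London\nCity: Paris\n"

# Greedy ".*" (with (?s)) grabs as much as it can, then backs off to the LAST "\nCity".
greedy = re.search(r"(?s)Name:(.*)\nCity", text).group(1)
# Non-greedy ".*?" grabs as little as it can and stops at the FIRST "\nCity".
lazy = re.search(r"(?s)Name:(.*?)\nCity", text).group(1)

print(repr(greedy))  # ' John\nCity: London'
print(repr(lazy))    # ' John'
```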

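So, the RegExp looks for "Name:" at the beginning of a line (that's the
meaning of "^"), followed by subpattern 2, the actual name of the
customer. (Strictly speaking, in Perl-style RegExps "^" matches only at
the very beginning of the text unless the "m" (multiline) option is set.)
The other subpatterns work the same way for City, Product and Amount.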
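Whether "^" means "start of a line" or "start of the text" depends on
the multiline ("m") option. A quick sketch of the difference in Python's
re with a made-up text; whether TB sets this option by default isn't
something this sketch can tell you.

```python
import re

text = "Subject: order\nName: John\n"

# Without the m option, "^" only matches at the very start of the text.
no_m = re.search(r"^Name:", text)
# With (?m), "^" also matches right after every newline.
with_m = re.search(r"(?m)^Name:", text)

print(no_m)            # None
print(with_m.start())  # 15
```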

Now, with all this in mind, we can tweak the RegExp a little. First, we
don't need the (?s) option, because "." never has to match a newline in
our RegExp; the newlines are matched explicitly by "\n". Second, we
don't need subpattern 1 or the trailing ".*", since neither is used;
this means the remaining subpatterns are now numbered 1 to 4. And third,
we add the colons after "City" and "Product", plus "\s+" (which stands
for "one or more (+) whitespace characters (\s)") between the field
names and the subpatterns for the actual values. This way, the
subpatterns contain only the values we need, without leading tabs or
spaces:

%SETPATTREGEXP="(?i)^Name:\s+(.*?)\nCity:\s+(.*?)\nProduct:\s+(.*?)\nAmount\s+(.*?)\n"%-
%REGEXPBLINDMATCH="%TEXT"%-
%Subpatt="1", %Subpatt="2", %Subpatt="3", %Subpatt="4"
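Here's the tweaked pattern run through Python's re as a stand-in for TB,
against a made-up sample that mixes tabs and spaces after the field
names:

```python
import re

# Made-up sample with a mix of tabs and spaces after the field names.
text = "Name:\tJohn Smith\nCity:  London\nProduct:\tWidget\nAmount\t3\n"

pattern = re.compile(
    r"(?i)^Name:\s+(.*?)\nCity:\s+(.*?)\nProduct:\s+(.*?)\nAmount\s+(.*?)\n"
)
m = pattern.search(text)

# "\s+" swallows the whitespace, so the four subpatterns hold just the values.
result = ", ".join(m.groups())
print(result)  # John Smith, London, Widget, 3
```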

This is based on the example you gave in your first post. You might have
to add a ":" after "Amount", depending on the format of your messages
(I guess leaving out the colon after "Amount" in your example was a
typo; if I'm right, you'd have to add it).
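If you're not sure which format your messages use, the ":?" from the
original pattern makes the colon optional, so one pattern can cover
both. A small sketch in Python's re with made-up lines, assuming TB's
engine treats ":?" the way Python's does:

```python
import re

# ":?" means "an optional colon", so both spellings of the field match.
amount = re.compile(r"(?i)Amount:?\s+(.*?)\n")

values = [amount.search(line).group(1)
          for line in ("Amount\t3\n", "Amount:\t3\n", "amount: 3\n")]
print(values)  # ['3', '3', '3']
```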

I think one could even make the ".*" of the subpatterns greedy (leaving
out the "?"), because "." no longer matches a newline, so each
subpattern would stop at the end of its line; but I haven't tested it.
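A quick check of that idea in Python's re (assuming TB's engine behaves
the same way; the sample text is made up):

```python
import re

text = "Name:\tJohn Smith\nCity:\tLondon\nProduct:\tWidget\nAmount\t3\n"

lazy = r"(?i)^Name:\s+(.*?)\nCity:\s+(.*?)\nProduct:\s+(.*?)\nAmount\s+(.*?)\n"
greedy = lazy.replace("(.*?)", "(.*)")  # same pattern, with each "?" left out

# Without (?s), "." can't cross a newline, so greedy ".*" also stops at the line end.
lazy_values = re.search(lazy, text).groups()
greedy_values = re.search(greedy, text).groups()
print(lazy_values == greedy_values)  # True
```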

HTH

-- 
Regards,
Lars

The Bat! 1.60c on Windows XP 5.1 Build 2600 
 ____________________________________________________________
|        Lars Geiger  |  <mailto:[EMAIL PROTECTED]>        |


______________________________________________________
Archives   : http://tbtech.thebat.dutaint.com
Moderators : mailto:[EMAIL PROTECTED]
Unsubscribe: mailto:[EMAIL PROTECTED]
