Hello Lars,

Wow! What a reply!

Thanks so much for the detail. I think it should be added to the FAQ
section to help others. I'm sure this kind of expression is sought
after regularly.

In fact, it might spur me on to buy one of the RegExp books
recommended in other parts of this thread.

But for now, your explanation helps me no end! Thanks again.

-- 
Best regards,
 Jon 

Wednesday, April 3, 2002, 1:03:42 PM, you wrote:

LG> Hi Jon,
LG> On Wednesday, April 3, 2002 at 11:45:39 [GMT +0100], you wrote:

JL>> That did the trick!

JL>> Can I ask...

JL>> [Questions snipped]

LG> OK, I'll try my first RegExp introduction. My knowledge about this topic
LG> is rather basic, but it'll be enough for that RegExp:

LG> Let's have a look at the RegExp:

LG> %SETPATTREGEXP="(?is)(^Name:(.*?)\nCity(.*?)\nProduct(.*?)\nAmount:?(.*?)\n).*"%-
LG> %REGEXPBlindMATCH="%TEXT"%-
LG> %Subpatt="2", %Subpatt="3", %Subpatt="4", %Subpatt="5"

LG> First of all, notice how it consists of three parts, which correspond to
LG> the three lines.

LG> The first part tells TB what pattern to look for (SETPATTernREGEXP).

LG> The second part tells TB to look for the previously specified pattern in
LG> the %TEXT.

LG> And the last part places the found subpatterns into your text file,
LG> separated by ", ". More on subpatterns will follow, when explaining the
LG> construction of the search pattern.

LG> The first two lines end with a newline (this may seem strange at first
LG> but there *is* a newline and it would be present in the output as well),
LG> which is not what we want, so we must suppress the newlines with %- at
LG> the end of the lines.

LG> Now, let's have a closer look at the RegExp pattern:

LG> (?is)(^Name:(.*?)\nCity(.*?)\nProduct(.*?)\nAmount:?(.*?)\n).*

LG> This pattern consists of two parts. The first part (between "(?" and
LG> ")") sets the options for the RegExp: "i" makes the search
LG> case-insensitive, and "s" makes the "." metacharacter match newlines,
LG> too.

LG> This leads us to the next important thing, the meaning of the ".". By
LG> default, this character inside a RegExp matches any character except
LG> a newline.
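
The effect of the two options can be tried outside The Bat! as well. The following is a minimal sketch in Python, assuming that Python's "re" module treats the inline options "(?i)" and "(?s)" the same way the RegExp engine in TB does; the sample text is invented for illustration.

```python
import re

# Invented sample text; note the lower-case "name".
text = "name: Jane\nCity: Springfield\n"

# Without options the search is case-sensitive, so "Name:" is not found.
assert re.search(r"Name:.*", text) is None

# "(?i)" makes the search case-insensitive; "." still stops at the newline.
plain = re.search(r"(?i)Name:.*", text).group(0)

# "(?is)" additionally lets "." match newlines, so ".*" runs to the end.
dotall = re.search(r"(?is)Name:.*", text).group(0)

print(repr(plain))   # 'name: Jane'
print(repr(dotall))  # 'name: Jane\nCity: Springfield\n'
```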

LG> The second part is the actual search pattern:

LG> (^Name:(.*?)\nCity(.*?)\nProduct(.*?)\nAmount:?(.*?)\n).*

LG> The first thing we notice is the number of parentheses. Parentheses in
LG> a RegExp group things together as subpatterns (remember that word?).
LG> These subpatterns are numbered in the order of their opening
LG> parentheses, starting from 1.

LG> We have these subpatterns (use the PTV to see the ^ markings at the
LG> correct place!):

LG> (^Name:(.*?)\nCity(.*?)\nProduct(.*?)\nAmount:?(.*?)\n).*
LG> ^      |   |      |   |         |   |          |   |  ^   subpattern 1
LG>        ^   ^      |   |         |   |          |   |      subpattern 2
LG>                   ^   ^         |   |          |   |      subpattern 3
LG>                                 ^   ^          |   |      subpattern 4
LG>                                                ^   ^      subpattern 5
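
The numbering can be checked with a short Python sketch. It assumes that Python's "re" module numbers groups the same way (by opening parenthesis), and it uses an invented sample message, since the original example message is not part of this mail.

```python
import re

# The pattern from the explanation above, unchanged.
pattern = re.compile(
    r"(?is)(^Name:(.*?)\nCity(.*?)\nProduct(.*?)\nAmount:?(.*?)\n).*"
)

# Invented sample message (hypothetical field values).
sample = "Name: Jane Doe\nCity: Springfield\nProduct: Widget\nAmount: 3\nThanks!\n"

m = pattern.match(sample)
print(pattern.groups)  # number of subpatterns in the pattern
for number in range(1, pattern.groups + 1):
    # Subpattern 1 is the whole block; 2 to 5 are the field values.
    print(number, repr(m.group(number)))
```

With this sample, subpatterns 2 to 5 still carry the leading space (and, for City and Product, the colon), which is the reason for the "\s+" tweak later on.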

LG> The subpatterns 2, 3, 4 and 5 match .*?, which stands for "zero or more
LG> occurrences (*) of any character (.), but don't be greedy (?)", so it
LG> matches only as much as really needed (it matches everything up to the
LG> next newline).
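
The difference between the lazy ".*?" and the greedy ".*" shows up as soon as "." may match newlines. A minimal sketch in Python, assuming its "re" module behaves like the RegExp engine in TB on this point; the sample text is invented.

```python
import re

# Invented two-line sample.
text = "Name: Jane Doe\nCity: Springfield\n"

# Lazy: stops at the first newline that lets the rest of the pattern match.
lazy = re.search(r"(?s)Name:(.*?)\n", text).group(1)

# Greedy: with "(?s)" it runs on to the last newline in the text.
greedy = re.search(r"(?s)Name:(.*)\n", text).group(1)

print(repr(lazy))    # ' Jane Doe'
print(repr(greedy))  # ' Jane Doe\nCity: Springfield'
```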

LG> So, the RegExp looks for "Name:" at the beginning of a line (that's the
LG> meaning of "^") followed by subpattern 2, the actual name of the
LG> customer. The other subpatterns work the same way for the remaining
LG> fields.

LG> Now, with all this in mind, we should be able to tweak the RegExp a
LG> little bit. First, we don't need the (?s) option, because "." never
LG> has to match a newline in our RegExp. Second, we don't need subpattern
LG> 1, because it is never used. And third, we add "\s+" (which stands for
LG> "one or more (+) whitespace characters (\s)") between the field names
LG> and the subpatterns for the actual values. This way, we only get the
LG> needed values in the subpatterns, without leading tabs or spaces:

LG> %SETPATTREGEXP="(?i)^Name:\s+(.*?)\nCity:\s+(.*?)\nProduct:\s+(.*?)\nAmount\s+(.*?)\n"%-
LG> %REGEXPBLINDMATCH="%TEXT"%-
LG> %Subpatt="1", %Subpatt="2", %Subpatt="3", %Subpatt="4"

LG> This relies on the example given in your first post. You might have to
LG> add a ":" after "Amount", depending on the format of your messages (I
LG> guess it was a typo that your example left out the colon after
LG> "Amount"; if I'm right, you'd have to add it).
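
To see what the tweaked template would write, here is a minimal Python sketch of the same three steps (set the pattern, match it against the text, join the subpatterns with ", "). Several things are assumptions and not part of the original: the sample message is invented, "Amount:?" is used so that the colon is optional, and the "m" option (re.MULTILINE) is added so that "^" matches at the beginning of any line as described above. Whether TB needs an equivalent option is not something this sketch can confirm.

```python
import re

# Tweaked pattern; ":?" after Amount and the "m" option are assumptions.
TWEAKED = (
    r"(?im)^Name:\s+(.*?)\nCity:\s+(.*?)\nProduct:\s+(.*?)\nAmount:?\s+(.*?)\n"
)

# Invented message with mixed spaces and a tab after the field names.
sample = (
    "Hello,\n"
    "Name:  Jane Doe\n"
    "City:\tSpringfield\n"
    "Product: Widget\n"
    "Amount: 3\n"
    "Regards\n"
)

m = re.search(TWEAKED, sample)
# Same job as the %Subpatt line: the four values separated by ", ".
line = ", ".join(m.groups())
print(line)  # Jane Doe, Springfield, Widget, 3
```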

LG> I think one could even make the ".*" of the subpatterns greedy (leaving
LG> out the "?") because the "." now wouldn't match a newline, so the
LG> subpattern would stop at the end of a line, but I didn't test it.
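
The greedy variant can at least be tried in Python. This is only a sketch under the same assumptions as before (invented sample, optional colon after "Amount", "m" option added for "^"), and it says nothing definite about the RegExp engine in TB itself.

```python
import re

LAZY = r"(?im)^Name:\s+(.*?)\nCity:\s+(.*?)\nProduct:\s+(.*?)\nAmount:?\s+(.*?)\n"
# Same pattern with every "?" after ".*" removed, i.e. greedy subpatterns.
GREEDY = r"(?im)^Name:\s+(.*)\nCity:\s+(.*)\nProduct:\s+(.*)\nAmount:?\s+(.*)\n"

# Invented sample message.
sample = "Name: Jane Doe\nCity: Springfield\nProduct: Widget\nAmount: 3\nRegards\n"

lazy_groups = re.search(LAZY, sample).groups()
greedy_groups = re.search(GREEDY, sample).groups()

# Without "(?s)" the "." cannot cross a newline, so both variants agree.
print(lazy_groups == greedy_groups)  # True
print(greedy_groups)
```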

LG> HTH


______________________________________________________
Archives   : http://tbtech.thebat.dutaint.com
Moderators : mailto:[EMAIL PROTECTED]
Unsubscribe: mailto:[EMAIL PROTECTED]
