Hello Lars, Wow! What a reply!
Thanks so much for the detail. I think it should be added to the FAQ section to help others. I'm sure this kind of expression must be regularly sought after. In fact, it might spur me on to buy one of the RegExp books recommended in other parts of this thread. But for now, your explanation helps me no end! Thanks again.

-- 
Best regards,
Jon

Wednesday, April 3, 2002, 1:03:42 PM, you wrote:

LG> Hi Jon,

LG> On Wednesday, April 3, 2002 at 11:45:39 [GMT +0100], you wrote:

JL>> That did the trick!
JL>> Can I ask...
JL>> [Questions snipped]

LG> OK, I'll try my first RegExp introduction. My knowledge about this topic
LG> is rather basic, but it'll be enough for that RegExp:

LG> Let's have a look at the RegExp:

LG> %SETPATTREGEXP="(?is)(^Name:(.*?)\nCity(.*?)\nProduct(.*?)\nAmount:?(.*?)\n).*"%-
LG> %REGEXPBlindMATCH="%TEXT"%-
LG> %Subpatt="2", %Subpatt="3", %Subpatt="4", %Subpatt="5"

LG> First of all, notice how it consists of three parts, which correspond to
LG> the three lines.

LG> The first part tells TB what pattern to look for (SETPATTernREGEXP).

LG> The second part tells TB to look for the previously specified pattern in
LG> the %TEXT.

LG> And the last part places the found subpatterns into your text file,
LG> separated by ", ". More on subpatterns will follow, when explaining the
LG> construction of the search pattern.

LG> The first two lines end with a newline (this may seem strange at first
LG> but there *is* a newline and it would be present in the output as well),
LG> which is not what we want, so we must suppress the newlines with %- at
LG> the end of the lines.

LG> Now, let's have a closer look at the RegExp pattern:

LG> (?is)(^Name:(.*?)\nCity(.*?)\nProduct(.*?)\nAmount:?(.*?)\n).*

LG> This pattern consists of two parts. The first part (between "(?" and
LG> ")") sets the options for the RegExp. "i" means case-insensitive
LG> searches, "s" makes the "." metacharacter match newlines, too (without
LG> it, the . wouldn't match newlines).
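
For anyone who wants to see the effect of the two options outside The Bat!, here is a minimal Python sketch. It assumes that Python's re module treats the inline options "(?is)" the same way as the engine TB uses, which is true for "i" and "s" as far as I know; the sample text is made up for illustration.

```python
import re

# Made-up sample text (hypothetical, not taken from a real message).
text = "name: Jon\nCity: London\n"

# Only "i": "NAME:" matches "name:", and "." stops at the first newline.
no_s = re.match(r"(?i)NAME:(.*)", text).group(1)

# "i" and "s": "." also matches newlines, so the greedy ".*" runs to the end.
with_s = re.match(r"(?is)NAME:(.*)", text).group(1)

print(repr(no_s))
print(repr(with_s))
```

Comparing the two printed values shows how much further ".*" reaches once "s" is switched on.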
LG> This leads us to the next important thing, the meaning of the ".". This
LG> character inside a RegExp matches any character except newline by
LG> default.

LG> This is the pattern we defined:

LG> (^Name:(.*?)\nCity(.*?)\nProduct(.*?)\nAmount:?(.*?)\n).*

LG> The first thing we notice is the number of parentheses. Parentheses in a
LG> RegExp group things together as subpatterns (remember that word?). These
LG> subpatterns are numbered by the position of their opening parenthesis,
LG> starting from 1.

LG> We have these subpatterns (use the PTV to see the ^ markings at the
LG> correct place!):

LG> (^Name:(.*?)\nCity(.*?)\nProduct(.*?)\nAmount:?(.*?)\n).*
LG> ^      |   |      |   |         |   |          |   |  ^   subpattern 1
LG>        ^   ^      |   |         |   |          |   |      subpattern 2
LG>                   ^   ^         |   |          |   |      subpattern 3
LG>                                 ^   ^          |   |      subpattern 4
LG>                                                ^   ^      subpattern 5

LG> The subpatterns 2, 3, 4 and 5 match .*?, which stands for "zero or more
LG> occurrences (*) of any character (.), but don't be greedy (?)", so it
LG> matches only as much as really needed (it matches everything up to the
LG> next newline).

LG> So, the RegExp looks for "Name:" at the beginning of a line (that's the
LG> meaning of "^") followed by subpattern 2, the actual name of the
LG> customer. The other subpatterns work the same way.

LG> Now, with all this in mind, we should be able to tweak the RegExp a
LG> little bit. First, we don't need the (?s) option, "." never matches a
LG> newline in our RegExp. Second, we don't need subpattern 1, it is never
LG> used. And third, we add "\s+" (stands for "one or more (+) whitespace
LG> characters (\s)") between the field names and the subpatterns for the
LG> actual values. This way, we only get the needed values in the
LG> subpatterns, without leading tabs or spaces:

LG> %SETPATTREGEXP="(?i)^Name:\s+(.*?)\nCity:\s+(.*?)\nProduct:\s+(.*?)\nAmount\s+(.*?)\n"%-
LG> %REGEXPBLINDMATCH="%TEXT"%-
LG> %Subpatt="1", %Subpatt="2", %Subpatt="3", %Subpatt="4"

LG> This relies on your example given in the first post. You might have to
LG> add a ":" after the "Amount", depending on the format your messages have
LG> (I guess it was a typo that you left out the colon in your example after
LG> "Amount", you'd have to add it then if I'm right).

LG> I think one could even make the ".*" of the subpatterns greedy (leaving
LG> out the "?") because the "." now wouldn't match a newline, so the
LG> subpattern would stop at the end of a line, but I didn't test it.

LG> HTH
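
The tweaked pattern and the greedy variation can be tried outside The Bat! with a short Python sketch. This is only an approximation under a few assumptions: the message body below is invented, the "m" option is added so that "^" matches at the start of a line in Python's re module, and extract() is a hypothetical helper standing in for the %Subpatt output line. It says nothing definite about how TB's own engine behaves.

```python
import re

# Hypothetical message body in the layout described in the thread
# (no colon after "Amount", as in the original example).
text = (
    "Name:    Jon Smith\n"
    "City:    London\n"
    "Product: Widget\n"
    "Amount   3\n"
)

# The tweaked pattern; "m" is added here so "^" matches at a line start in Python.
lazy = r"(?im)^Name:\s+(.*?)\nCity:\s+(.*?)\nProduct:\s+(.*?)\nAmount\s+(.*?)\n"

# Same pattern with greedy ".*" (the untested variation from the message).
greedy = lazy.replace(".*?", ".*")


def extract(pattern, body):
    """Return the four captured fields joined by ", ", or None if no match."""
    m = re.search(pattern, body)
    return ", ".join(m.groups()) if m else None


print(extract(lazy, text))
print(extract(greedy, text))
```

In this sketch both variants yield the same four fields, because without the "s" option "." cannot cross a line end, so the greedy ".*" stops at the end of each line anyway.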
