On Jun 10, 2011, at 7:39 PM, J. Landman Gay wrote:
On 6/10/11 8:21 PM, Jim Ault wrote:
The () parens are telling the engine to capture any chars that meet the
conditions inside and assign them to the first variable specified. In
this case, it is 'retVal'
If there were a second set of (), then those chars would be assigned to
the second variable specified.
Good explanation, I like when regex gets explained. But what I don't get is how come the first set of parentheses isn't put into the variable:

 get matchText(tEthernetConfig,"(?s)inet (.*?) ",retVal)

The LC engine ignores the "(?s)". That's good and as it should be, but I'm not sure why.

LC honors the (?s), but as a directive, not a capture.


When a paren is read and is followed by a ?
this signals an 'operation' rather than a 'capture'
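
As a rough sketch of the same idea outside LC, here is the pattern run through Python's re module, which follows the same rule about (? groups. The sample text and the variable names are my own assumptions, not from the original script.

```python
import re

# Hypothetical sample text standing in for tEthernetConfig.
ethernet_config = "en0: flags=8863\n\tinet 192.168.1.20 netmask 0xffffff00\n"

pattern = re.compile(r"(?s)inet (.*?) ")

# (?s) only sets an option. It is not a capturing group, so the
# pattern has exactly one capture: (.*?), which is group 1.
capture_count = pattern.groups

match = pattern.search(ethernet_config)
ret_val = match.group(1) if match else ""

print(capture_count)  # 1
print(ret_val)        # 192.168.1.20
```

So the first variable receives the first real capture, and the directive never takes up a slot.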
--
Additional regex conditions or qualifiers are.....
Lookahead and Lookbehind ... scanning operations designated by

(?<=   (?<!   lookbehind, positive and negative logic
(?=    (?!    lookahead, positive and negative logic
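
A minimal sketch of the four forms, again using Python's re module as a stand-in for the LC engine (an assumption on my part; the sample strings are made up). None of them capture or consume chars, they only test what sits next to the match.

```python
import re

prices = "price: $45, cost: 30, tax: $7"
sizes = "10px 20em 30px"

# (?<=\$) positive lookbehind: digits must follow a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", prices)

# (?<!\$) negative lookbehind: digits must not follow a dollar sign.
# The \b stops the match from starting in the middle of a number.
not_after_dollar = re.findall(r"\b(?<!\$)\d+", prices)

# (?=px) positive lookahead: digits must be followed by "px".
before_px = re.findall(r"\d+(?=px)", sizes)

# (?!\d|px) negative lookahead: digits must not be followed by "px".
# The \d alternative stops the engine from backing off to a shorter
# run such as "1" out of "10px".
not_before_px = re.findall(r"\d+(?!\d|px)", sizes)

print(after_dollar)      # ['45', '7']
print(not_after_dollar)  # ['30']
print(before_px)         # ['10', '30']
print(not_before_px)     # ['20']
```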
--
(?Usi)  means shortest match, allow multiple lines, disregard case
(?U) means shortest match, single line, case sensitive
(?s) means longest match, allow multiple lines, case sensitive
If no directive is given, the default is
longest match, single line, case sensitive

What is meant by 'single line' is that the dot (.) does not match a return char, so a match built from .* cannot run past the end of the line it started on. Multi line, which is what (?s) switches on, means the return is seen as just another char in the text block, so the repeat loops can keep going across lines to find the longest match.
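
Here is a small sketch of the difference, using Python's re module as a stand-in (my assumption, along with the sample text). Python has no (?U) directive, so the shortest match is requested with the lazy quantifier .*? instead.

```python
import re

# Hypothetical two-line text block.
text = "inet 10.0.0.1 up\ninet 10.0.0.2 up\n"

# Default: "." does not match the return char, so the greedy .*
# stays on the first line and backs off to the last space there.
default_greedy = re.search(r"inet (.*) ", text).group(1)

# (?s): "." also matches the return char, so the greedy .* runs to
# the end of the block, then backs off to the last space overall.
dotall_greedy = re.search(r"(?s)inet (.*) ", text).group(1)

# (?s) plus a lazy quantifier: stop at the first space that works.
dotall_lazy = re.search(r"(?s)inet (.*?) ", text).group(1)

print(repr(default_greedy))  # '10.0.0.1'
print(repr(dotall_greedy))   # '10.0.0.1 up\ninet 10.0.0.2'
print(repr(dotall_lazy))     # '10.0.0.1'
```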
---

Think of the regex engine as a complex series of nested repeat loops that are a combination of
repeat while
repeat until
making many, many char by char scans, forward to grab chars and backward to give them back (backtracking), to find the longest positive result at the first position that works, unless told to be ungreedy (shortest result)

The repeat loops are designed to accept strings and operators in series
such that a given block of text is rescanned as often as needed in order to implement logic patterns.

This multiple scanning starts at the first char and moves forward. When a path fails, the engine backs up and tries the next alternative, and then the next starting position, until something matches or nothing is left to try.

Simple rules don't show you all the multiple scans (repeat loops) that are used to arrive at the parsed result. Large blocks of text can take several minutes to scan depending on conditions and conditionals.


The paren as a directive works like this:
[a-zA-Z] means a to z lower and upper for a single char
or
(?i)[a-z]  means the same thing, a to z lower and upper for a single char
--since the 'i' means case insensitive
(?i)([a-z]) -- will capture a single char if it is a-z either case.
If the test fails there is no value assigned.
LC will allow a test for empty, but Perl and others will complain about something like 'undefined' if you use the capture afterward, since there was no match, no capture, and no assignment. Also, in Perl, etc, you must define variables ahead of the regex if you want to avoid 'undefined'.
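
For comparison, a short sketch of how one other engine handles the failed test. This is Python's re module, chosen only as an example of "others" (my assumption); there a failed search returns None, so the result has to be checked before the capture is read.

```python
import re

pattern = re.compile(r"(?i)([a-z])")

hit = pattern.search("42 Q")    # contains a letter
miss = pattern.search("12345")  # no letter anywhere

# On success the capture is the first letter found, either case.
hit_val = hit.group(1) if hit else ""

# On failure there is no match object at all, so fall back to empty,
# which mirrors the test for empty that LC allows.
miss_val = miss.group(1) if miss else ""

print(hit_val)         # Q
print(miss is None)    # True
print(repr(miss_val))  # ''
```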

Hope this makes a little light reading for the weekend.

Jim Ault
Las Vegas


