When I come across a good regex example like the one you provided, if I have a moment I like to test things out to see where regex is faster and where it isn't. Regex is really great for many things, but it carries quite a bit of overhead.
Of course, this test is only relevant if most of the specifiers in the regex are there merely to identify the elements you're looking for, and the data can be expected to fit the format you described.
Given that, it's possible to make the regex a bit simpler (see foo2 below), but only with a modest boost to performance. It can probably be simplified more, but the chunk-based alternative performed so well I didn't bother exploring the regex side any further.
A lengthier handler that uses chunk expressions (foo3 below) seems to yield the same results you reported while running between 12 and 60 times faster, depending on the percentage of tested lines that match the criteria.
For one-offs like validating email addresses, regex can be an excellent fit, and it can suit even some larger tasks, depending on the specifics.
But for iterating across lists I've often been delightfully surprised by LiveCode's gracefully efficient chunk handling.
With your original data replicated to 250 lines, looking for page 1 among them, and running each handler 1000 times, the script below yields:
Regex: 9261 ms
RegexLite: 7958 ms
Chunks: 197 ms
Chunks faster than orig regex by: 47.01 times
Chunks faster than lite regex by: 40.4 times
Same result? true

on mouseUp
   put fld 1 into tList
   put 1 into tPage --< change this for different tests
   put 1000 into n
   --
   -- Test 1: original regex
   put the millisecs into t
   repeat n
      put foo1(tPage, tList) into r1
   end repeat
   put the millisecs - t into t1
   --
   -- Test 2: lighter regex
   put the millisecs into t
   repeat n
      put foo2(tPage, tList) into r2
   end repeat
   put the millisecs - t into t2
   --
   -- Test 3: chunks
   put the millisecs into t
   repeat n
      put foo3(tPage, tList) into r3
   end repeat
   put the millisecs - t into t3
   --
   -- Display results:
   set the numberformat to "0.##"
   put "Regex: "&t1 &" ms"&cr \
         &"RegexLite: "&t2 &" ms"&cr \
         &"Chunks: "& t3 &" ms"&cr \
         &"Chunks faster than orig regex by: "&(t1 / t3)&" times" &cr \
         &"Chunks faster than lite regex by: "&(t2 / t3)&" times" &cr \
         &"Same result? "& (r1=r3) &cr&cr& r1 &cr&cr& r3
end mouseUp

function foo1 pPage, tList
   put "(.+\t"&pPage&",\d+,\d+,\d+)|(.+\t\d+,\d+,"&pPage&",\d+)|(.+\t"&pPage&",\d*\.?\d*,\d*\.?\d*,\d*\.?\d*,\d*\.?\d*)" into tMatchPattern
   filter lines of tList with regex pattern tMatchPattern
   return tList
end foo1

function foo2 pPage, tList
   put "(.+\t"&pPage&",*)|(.+\t\d+,\d+,"&pPage&",*)|(.+\t"&pPage&",*)" into tMatchPattern
   filter lines of tList with regex pattern tMatchPattern
   return tList
end foo2

function foo3 pPage, tList
   repeat for each line tLine in tList
      set the itemdel to tab
      put item 3 of tLine into t1
      put pPage &"," into tPageMarker
      if "." is in t1 then
         if (t1 begins with tPageMarker) then
            put tLine &cr after tNuList
         end if
      else
         if (t1 begins with tPageMarker) OR (item 4 of tLine begins with tPageMarker) then
            put tLine &cr after tNuList
         end if
      end if
   end repeat
   delete last char of tNuList
   return tNuList
end foo3
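
One caveat on foo3: it looks at item 3 of the line (<numberCol1>) and at the start of item 4, which is enough to give the same output as the regex in the page 1 test, but it does not check <integer3>. A sketch that follows the description in the original question more literally, examining only <numberCol2>, might look like the handler below. The name foo4 and its variable names are my own for illustration. It has not been timed, it assumes every line has the four tab-separated columns described, and it does not verify that <numberCol2> holds exactly 4 or 5 items the way the regex does.

```livecode
-- Hypothetical variant of foo3; foo4 is an illustrative name, not from the tests above.
-- Assumes each line is <number><tab><text><tab><numberCol1><tab><numberCol2>.
function foo4 pPage, pList
   local tNuList, tCol2
   repeat for each line tLine in pList
      -- Split on tabs just long enough to fetch <numberCol2>
      set the itemDelimiter to tab
      put item 4 of tLine into tCol2
      -- Switch to commas to examine the numbers inside <numberCol2>
      set the itemDelimiter to comma
      if "." is in tCol2 then
         -- Second format: only the leading integer is a page number
         if (item 1 of tCol2 = pPage) then
            put tLine & cr after tNuList
         end if
      else
         -- First format: items 1 and 3 are page numbers
         if (item 1 of tCol2 = pPage) or (item 3 of tCol2 = pPage) then
            put tLine & cr after tNuList
         end if
      end if
   end repeat
   delete last char of tNuList
   return tNuList
end foo4
```

Switching the itemDelimiter twice per line adds some work, so I would expect this to be somewhat slower than foo3, though that is a guess until it is measured with the same harness.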
--
Richard Gaskin
Fourth World Systems
Software Design and Development for the Desktop, Mobile, and the Web
____________________________________________________________________
http://www.FourthWorld.com

Paul Dupuis wrote:

Never mind. Solved it. It was the pattern for the 2nd format. Fixed with

"(.+\t"&pPage&",\d+,\d+,\d+)|(.+\t\d+,\d+,"&pPage&",\d+)|(.+\t"&pPage&",\d*\.?\d*,\d*\.?\d*,\d*\.?\d*,\d*\.?\d*)"

On 1/30/2016 3:17 PM, Paul Dupuis wrote:

I need some regex help. I have a list that is of the form:

<number><tab><text><tab><numberCol1><tab><numberCol2>

i.e.

1 Testing 1,747 1,1,1,747
2 Testing 752,1800 1,752,1,1800
3 Testing 5398,5846 2,320,2,768
4 Testing 3,111.951,683.915,302.268,385.751 3,111.951,683.915,302.268,385.751

<numberCol2> can have a list of numbers in 1 of 2 formats:

A comma separated list of 4 integers, i.e. <integer1>,<integer2>,<integer3>,<integer4>
OR
A comma separated list of 1 integer, followed by 4 decimal numbers, i.e. <integer>,<decimal>,<decimal>,<decimal>,<decimal>

I need to filter the lines of this list with a REGEX pattern to get lines WHERE a value pPage matches certain places in <numberCol2>, specifically:

where pPage is equal to either <integer1> or <integer3> in the first format (i.e. item 1 or item 3)
OR
where pPage is equal to <integer> in the second format (i.e. item 1)

So my code is:

put "((.+\t"&pPage&",\d+,\d+,\d+)|(.+\t\d+,\d+,"&pPage&",\d+)|(.+\t"&pPage&",?[0-9]*\.?[0-9]+,?[0-9]*\.?[0-9]+,?[0-9]*\.?[0-9]+,?[0-9]*\.?[0-9]+))" into tMatchPattern
filter lines of tList with regex pattern tMatchPattern

If pPage is 1 then I should get:

1 Testing 1,747 1,1,1,747
2 Testing 752,1800 1,752,1,1800

and I do. If pPage is 2 then I should get:

3 Testing 5398,5846 2,320,2,768

and I do. If pPage is 3 then I should get:

4 Testing 3,111.951,683.915,302.268,385.751 3,111.951,683.915,302.268,385.751

and I do. If pPage is 4 then I should get an empty list, and I do, but when pPage is 5, I am expecting an empty list and I get

3 Testing 5398,5846 2,320,2,768

So something is wrong with my regex, but I cannot figure out what.

It looks like it is matching against <numberCol1> in the last case (pPage=5), but it should not since there are only 2 items in the list rather than 4 or 5. I am using LiveCode 6.7.6
