Re: AW: Re: Regex help needed...
> Regex has been around a long time
> and lots of smart computer science types have
> spent time coming up with ways to optimize its performance for pattern
> matching.

That was true, it's still true, and it will always be true!

And here are some benchmarks, done on a late rainy Sunday evening:

   * Regex2 faster than Chunk by: 2.1 times *

For the details:

1) Regex1 is the original regex, Chunk1 is from Richard, Regex2 is mine.
2) You can notice the difference in time depending on the value of pPage
   (that's normal behavior with regex).
3) I've done the calculation the same way as Richard did, so you can compare.

** aPage = 1, Same? true true
Regex1: 8943 ms
Chunk1: 210 ms
Regex2: 99 ms
Regex2 faster than orig regex by: 90.3 times
Regex2 faster than Chunk by: 2.1 times

** aPage = 2, Same? true true
Regex1: 9946 ms
Chunk1: 212 ms
Regex2: 100 ms
Regex2 faster than orig regex by: 99.5 times
Regex2 faster than Chunk by: 2.1 times

** aPage = 3, Same? true true
Regex1: 4451 ms
Chunk1: 210 ms
Regex2: 98 ms
Regex2 faster than orig regex by: 45.4 times
Regex2 faster than Chunk by: 2.1 times

** aPage = 4, Same? true true
Regex1: 11465 ms
Chunk1: 200 ms
Regex2: 98 ms
Regex2 faster than orig regex by: 117 times
Regex2 faster than Chunk by: 2 times

** aPage = 5, Same? true true
Regex1: 11457 ms
Chunk1: 201 ms
Regex2: 94 ms
Regex2 faster than orig regex by: 121.9 times
Regex2 faster than Chunk by: 2.1 times

Kind regards,

Thierry

Thierry Douez - http://sunny-tdz.com
sunnYrex - sunnYtext2speech - sunnYperl - sunnYmidi - sunnYmage
___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Re: AW: Re: Regex help needed...
Hi Mark,

There are huge differences in how regex implementations perform in different languages. For example:

http://raid6.com.au/~onlyjob/posts/arena/

Perl outperforms everything in that test. I've never assumed that LC's "perl compatible regex library" is going to perform at the speed of actual Perl. I've always assumed that being "perl compatible" just meant that all syntactically correct Perl regexes should run with LC's implementation, without needing any change in how the regex is written.

There was an academic paper I came across 15 years ago which showed that Tcl outperformed Perl. Now it seems Perl outperforms Tcl, suggesting that one or the other has changed its underlying engine in ways that affect regex performance. Or the tests I read about 15 years ago exercised regex features on which Tcl outperformed Perl, and the test above exercises features where the reverse holds.

It would be great if LC's implementation was as fast as Perl's. Here's a page comparing several implementations of PCRE (along with some non-PCRE regex implementations):

http://sljit.sourceforge.net/regex_perf.html

One of the things we had in LC5 which was phenomenally fast was searching through the styledText of a field. That fast way of searching particular text structures got lost in the migration to LC8.

Regards

Bernard
Re: AW: Re: Regex help needed...
On Wed, Feb 3, 2016 at 11:53 AM Bernard Devlin wrote:

> One of the things we had in LC5 which was phenomenally fast, was searching
> through the styledText of a field. That fast way of searching particular
> text structures got lost in the migration to LC8.

Could you expand on this a little? I'm not sure exactly what you mean by 'searching through the styledText of a field'.

Thanks,

Ali
Re: AW: Re: Regex help needed...
On 03/02/2016 11:53, Bernard Devlin wrote:

> Perl outperforms everything in that test. I've never assumed that LC's
> "perl compatible regex library" is going to perform at the speed which
> actual Perl performs. I've always assumed that being "perl compatible"
> just meant that all syntactically-correct Perl regexes should run with
> LC's implementation, without needing any kind of change in how the
> regex is formatted.

To be precise, LiveCode uses the PCRE library, which is generally considered to be the *definitive* implementation of Perl Compatible Regular Expressions: http://www.pcre.org/. There isn't a special LiveCode-specific implementation of regular expressions involved.

Peter

--
Dr Peter Brett
LiveCode Open Source Team

LiveCode on reddit: https://reddit.com/r/livecode
Re: AW: Re: Regex help needed...
Hi Thierry,

I might have missed it, but did you publish your Regex2 to the list?

Pete

On Wed, Feb 3, 2016 at 12:07 PM Richard Gaskin wrote:

> Thierry Douez wrote:
> > and here are some benchmarks
> > done in a late rainy sunday evening:
> >
> > * Regex2 faster than Chunk by: 2.1 times*
>
> Great results - what was the regex you used for that?
Re: AW: Re: Regex help needed...
> There's huge differences in how regex implementations perform in different
> languages. For example: http://raid6.com.au/~onlyjob/posts/arena/

Last year, I did some experiments: I had 100 lines of LiveCode with a bunch of really big regexes. It took 120 seconds to run on my MacBook. I tried different ways to write and modify the regexes, and it always kept running at around 120 seconds.

Then, using my sunnYperl external, I copied and pasted all my regexes (without a single modification) and rewrote the LC part in Perl (a bit more work, but not that much). I came down to 9 seconds.

Kind regards,

Thierry

--
Thierry Douez - http://sunny-tdz.com
sunnYrex - sunnYtext2speech - sunnYperl - sunnYmidi - sunnYmage
Re: AW: Re: Regex help needed...
Thierry Douez wrote:

>> Regex has been around a long time
>> and lots of smart computer science types have
>> spent time coming up with ways to optimize its performance for pattern
>> matching.
>
> That was true, it's still true and will always be true!

It's true that there are almost always ways to improve performance using any method, but there are times when one method may be faster than another, so it's worth testing out, as you did here:

> and here are some benchmarks
> done in a late rainy sunday evening:
>
> * Regex2 faster than Chunk by: 2.1 times*

Great results - what was the regex you used for that?

--
Richard Gaskin
Fourth World Systems
Software Design and Development for the Desktop, Mobile, and the Web

ambassa...@fourthworld.com
http://www.FourthWorld.com
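Regex2 itself is not shown in this part of the thread, so the following is only a guess at the kind of change that can help, not Thierry's pattern. In a backtracking engine, a leading `.+` makes each alternative run to the end of the line and then back up looking for a tab; since the filter only needs to find the pattern somewhere in the line, that prefix can often be dropped. The sketch is in Python rather than LiveCode so it can be run stand-alone, and Python's `re` is a backtracking engine similar to, but not the same as, PCRE. The sample lines, their tab positions, and the function names are all assumptions.

```python
import re

# Assumed sample data: the archive lost the tab characters, so these field
# boundaries are a guess, not the original file layout.
SAMPLE = [
    "1\tTesting\t1,747\t1,1,1,747",
    "2\tTesting\t752,1800\t1,752,1,1800",
    "3\tTesting\t5398,5846\t2,320,2,768",
    "4\tTesting\t3,111.951,683.915,302.268,385.751",
]

# Four comma-prefixed decimal numbers, as in the third alternative.
DECIMALS = r",\d*\.?\d*" * 4


def original_pattern(page):
    """Three alternatives as posted in the thread, each starting with '.+'."""
    p = re.escape(str(page))
    return re.compile("|".join([
        r"(.+\t" + p + r",\d+,\d+,\d+)",
        r"(.+\t\d+,\d+," + p + r",\d+)",
        r"(.+\t" + p + DECIMALS + r")",
    ]))


def unprefixed_pattern(page):
    """Hypothetical variant: same alternatives, without the leading '.+'."""
    p = re.escape(str(page))
    alternatives = "|".join([
        p + r",\d+,\d+,\d+",
        r"\d+,\d+," + p + r",\d+",
        p + DECIMALS,
    ])
    return re.compile(r"\t(?:" + alternatives + r")")


def filter_lines(pattern, lines):
    """Keep the lines in which the pattern is found anywhere (unanchored)."""
    return [line for line in lines if pattern.search(line)]


if __name__ == "__main__":
    for page in range(1, 6):
        kept = filter_lines(unprefixed_pattern(page), SAMPLE)
        print(page, [line.split("\t")[0] for line in kept])
```

The two patterns select the same lines as long as no line starts with a tab, because the only thing `.+` adds is "at least one character before the tab". Whether the shorter form is actually faster in LiveCode would still need to be measured there.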
Re: AW: Re: Regex help needed...
Regex is wonderfully compact to write relative to equivalent routines using chunk expressions, but that compactness is sometimes paid for in execution time.

When I come across a good regex example like the one you provided, if I have a moment I like to test things out to see where regex is faster and where it isn't. It's really great for many things, but carries quite a bit of overhead.

Of course, for this test to be relevant it assumes that most of the specifiers in the regex expression are merely there to identify the elements you're looking for, and that the data is expected to fit the definition you provided.

Given that, it's possible to make the regex a bit simpler (see foo2 below), but only with a modest boost to performance. It can probably be simplified more, but the chunk-based alternative performed so well I didn't bother exploring the regex side any further.

Writing a lengthier handler that uses chunk expressions seems to yield the same results you reported, running between 12 and 60 times faster (depending on the percentage of lines tested that match the criteria being looked for).

For one-offs like validating email addresses regex can be an excellent fit, and even for some larger tasks depending on the specifics.

But for iterating across lists I've often been delightfully surprised by LiveCode's gracefully efficient chunk handling.

Testing your original data replicated to become 250 lines long, and looking for page 1 among them, the script below yields:

Regex: 9261 ms
RegexLite: 7958 ms
Chunks: 197 ms
Chunks faster than orig regex by: 47.01 times
Chunks faster than lite regex by: 40.4 times
Same result? true


on mouseUp
   put fld 1 into tList
   put 1 into tPage --< change this for different tests
   put 1000 into n
   --
   -- Test 1: original regex
   put the millisecs into t
   repeat n
      put foo1(tPage, tList) into r1
   end repeat
   put the millisecs - t into t1
   --
   -- Test 2: lighter regex
   put the millisecs into t
   repeat n
      put foo2(tPage, tList) into r2
   end repeat
   put the millisecs - t into t2
   --
   -- Test 3: chunks
   put the millisecs into t
   repeat n
      put foo3(tPage, tList) into r3
   end repeat
   put the millisecs - t into t3
   --
   -- Display results:
   set the numberformat to "0.##"
   put "Regex: "& t1 &" ms" &cr \
         &"RegexLite: "& t2 &" ms" &cr \
         &"Chunks: "& t3 &" ms" &cr \
         &"Chunks faster than orig regex by: "&(t1 / t3)&" times" &cr \
         &"Chunks faster than lite regex by: "&(t2 / t3)&" times" &cr \
         &"Same result? "& (r1=r3) &cr& r1 &cr& r3
end mouseUp


-- Original regex: three alternatives, each matching pPage at a fixed position
function foo1 pPage, tList
   put "(.+\t"& pPage &",\d+,\d+,\d+)|(.+\t\d+,\d+,"& pPage &",\d+)|(.+\t"& pPage &",\d*\.?\d*,\d*\.?\d*,\d*\.?\d*,\d*\.?\d*)" into tMatchPattern
   filter lines of tList with regex pattern tMatchPattern
   return tList
end foo1


-- Lighter regex: same alternatives with the trailing number checks dropped
function foo2 pPage, tList
   put "(.+\t"& pPage &",*)|(.+\t\d+,\d+,"& pPage &",*)|(.+\t"& pPage &",*)" into tMatchPattern
   filter lines of tList with regex pattern tMatchPattern
   return tList
end foo2


-- Chunk expressions: test tab-delimited items 3 and 4 of each line directly
function foo3 pPage, tList
   repeat for each line tLine in tList
      set the itemdel to tab
      put item 3 of tLine into t1
      put pPage &"," into tPageMarker
      if "." is in t1 then
         if (t1 begins with tPageMarker) then
            put tLine &cr after tNuList
         end if
      else
         if (t1 begins with tPageMarker) OR (item 4 of tLine begins with tPageMarker) then
            put tLine &cr after tNuList
         end if
      end if
   end repeat
   delete last char of tNuList -- remove the trailing return
   return tNuList
end foo3

--
Richard Gaskin
Fourth World Systems
Software Design and Development for the Desktop, Mobile, and the Web

ambassa...@fourthworld.com
http://www.FourthWorld.com


Paul Dupuis wrote:

> Never mind. Solved it. It was the pattern for the 2nd format. Fixed with
>
> "(.+\t"& pPage &",\d+,\d+,\d+)|(.+\t\d+,\d+,"& pPage &",\d+)|(.+\t"& pPage &",\d*\.?\d*,\d*\.?\d*,\d*\.?\d*,\d*\.?\d*)"
>
> On 1/30/2016 3:17 PM, Paul Dupuis wrote:
>> I need some regex help.
>>
>> I have a list that is of the form:
>>
>> i.e.
>> 1Testing1,7471,1,1,747
>> 2Testing752,18001,752,1,1800
>> 3Testing5398,58462,320,2,768
>> 4Testing3,111.951,683.915,302.268,385.751 3,111.951,683.915,302.268,385.751
>>
>> can have a list of numbers in 1 of 2 formats:
>> A comma separated list of 4 integers, i.e. ,,,
>> OR
>> A comma separated list of 1 integer, followed by 4 decimal numbers, i.e.
>>
>> I need to filter the lines of this list with a REGEX pattern to get lines
>> WHERE a value pPage matches certain places in , specifically:
>> where pPage is equal to either or in the first format (i.e. item 1 or item 3)
>> OR
>> where pPage is equal to in the second format (i.e. item 1)
>>
>> So my code is:
>>
>> put "((.+\t"& pPage &",\d+,\d+,\d+)|(.+\t\d+,\d+,"& pPage &",\d+)|(.+\t"& pPage &",?[0-9]*\.?[0-9]+,?[0-9]*\.?[0-9]+,?[0-9]*\.?[0-9]+,?[0-9]*\.?[0-9]+))" into tMatchPattern
>> filter lines of tList with regex pattern tMatchPattern
>>
>> If pPage is 1 then I should get:
>> 1Testing1,7471,1,1,747
>> 2Testing752,1800
Re: AW: Re: Regex help needed...
Wow. I would not have expected such a significant difference. Regex has been around a long time and lots of smart computer science types have spent time coming up with ways to optimize its performance for pattern matching. I assumed (falsely) that regex-based filters in LC would be on par with, or even superior to, a custom function using chunks.

This leads me to:

1) wondering if LC's hooks to whatever regex tool they are using under the hood are as good as they should be
AND
2) planning on rewriting my code to use chunks.

Thanks for the post.

On 1/30/2016 6:45 PM, Richard Gaskin wrote:
> Regex is wonderfully compact to write relative to equivalent routines
> using chunk expressions, but sometimes paid for in execution time.
> [...]
Re: AW: Re: Regex help needed...
On 01/30/2016 04:28 PM, Paul Dupuis wrote:

> 1) wondering if LC's hooks to whatever regex tool they are using under
> the hood is a good as it should be

LC's regex library is the same PCRE library everyone else uses. And it's the latest released version. Regex's power lies in its ability to match complex patterns, which doesn't necessarily translate to speed.

> AND
> 2) planning on rewriting my code to use chunks.

You may find that regex matching works better than LC's chunk matching in some situations. For speed, though, it's hard to beat the built-in chunking functions in LC, as they're already pretty well optimized.

--
Mark Wieder
ahsoftw...@gmail.com
Re: AW: Re: Regex help needed...
Paul Dupuis wrote:

> Wow. I would not have expected such a significant difference. Regex
> has been around a long time and lots of smart computer science types
> have spent time coming up with ways to optimize its performance for
> pattern matching. I assumed (falsely) that regex based filters in LC
> would be on par or even superior to a custom function using chunks.
> This leads me to:
>
> 1) wondering if LC's hooks to whatever regex tool they are using under
> the hood is as good as it should be
> AND
> 2) planning on rewriting my code to use chunks.

One of the reasons for my seemingly obsessive benchmarking is to learn about what goes on under the hood, and to try to anticipate it when choosing among different algorithms.

LC does such a good job of shielding us from what goes on under the hood that we often forget that the relationship between the number of lines we write and the number of machine instructions our scripts invoke may differ broadly depending on the statement.

My favorite example is:

   set the scroll of field 1 to 100

It seems simple enough, but having written scrollbar management routines in C back in the pre-Cocoa days, I learned that a *tremendous* number of low-level routines come into play with that one simple line of script. LC makes it easy to take this stuff for granted, since it does all the heavy lifting.

Same with regex.

The beauty of regex is that it's a very generalized solution.

The downside of regex is that it's a very generalized solution. ;)

Generalized options can provide convenience, but often at the cost of performance. Purpose-built solutions are usually much faster than generalized ones, and with LC's chunk expressions they're fun to write too. :)

There are times when regex will outperform chunk expressions, though, so I would caution against rewriting everything.

Benchmarking is the key, and some day I will have done enough to be able to come up with a small set of useful rules as to when to use chunks and when to use regex.

But at the moment, it's half hunch and half benchmarking to confirm the hunch.

--
Richard Gaskin
Fourth World Systems
Software Design and Development for the Desktop, Mobile, and the Web

ambassa...@fourthworld.com
http://www.FourthWorld.com
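For anyone who wants to try the "benchmark to confirm the hunch" approach outside LiveCode, here is a minimal sketch of the same kind of harness: time a regex filter and a plain string-splitting filter over the same list, and check that they return the same lines. It is written in Python rather than LiveCode so it can be run stand-alone. The sample data, its tab layout, and every function name are assumptions, and timings from Python say nothing about how LiveCode's `filter` or chunk expressions will perform.

```python
import re
import time

# Assumed sample data: the tab positions are a guess, since the archive
# lost them. Replicated to 250 lines, as in the test described in the thread.
SAMPLE = [
    "1\tTesting\t1,747\t1,1,1,747",
    "2\tTesting\t752,1800\t1,752,1,1800",
    "3\tTesting\t5398,5846\t2,320,2,768",
    "4\tTesting\t3,111.951,683.915,302.268,385.751",
]
LINES = (SAMPLE * 63)[:250]


def regex_filter(page, lines):
    """Regex approach: the three-alternative pattern posted in the thread."""
    p = re.escape(str(page))
    pattern = re.compile("|".join([
        r"(.+\t" + p + r",\d+,\d+,\d+)",
        r"(.+\t\d+,\d+," + p + r",\d+)",
        r"(.+\t" + p + r",\d*\.?\d*" * 4 + r")",
    ]))
    return [line for line in lines if pattern.search(line)]


def split_filter(page, lines):
    """String-splitting approach, loosely modelled on foo3 in the thread."""
    marker = f"{page},"
    kept = []
    for line in lines:
        fields = line.split("\t")
        third = fields[2] if len(fields) > 2 else ""
        fourth = fields[3] if len(fields) > 3 else ""
        if "." in third:
            if third.startswith(marker):
                kept.append(line)
        elif third.startswith(marker) or fourth.startswith(marker):
            kept.append(line)
    return kept


def time_it(func, page, lines, repeats):
    """Run func repeatedly; return (elapsed milliseconds, last result)."""
    result = None
    start = time.perf_counter()
    for _ in range(repeats):
        result = func(page, lines)
    return (time.perf_counter() - start) * 1000.0, result


def compare(page, lines, repeats=100):
    """Time both filters and report whether they selected the same lines."""
    regex_ms, regex_result = time_it(regex_filter, page, lines, repeats)
    split_ms, split_result = time_it(split_filter, page, lines, repeats)
    return {
        "regex_ms": regex_ms,
        "split_ms": split_ms,
        "same": regex_result == split_result,
    }


if __name__ == "__main__":
    for page in range(1, 6):
        print(page, compare(page, LINES))
```

The "same" check matters as much as the timings: the two filters here agree on this sample, but they are not equivalent in general, so a faster filter only counts once it has been checked against data that looks like the real thing.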