Re: AW: Re: Regex help needed...

2016-02-03 Thread Thierry Douez
> Regex has been around a long time
> and lots of smart computer science types have
> spent time coming up with ways to optimize its performance for pattern
> matching.

That was true, it's still true and will always be true!


And here are some benchmarks,
done late on a rainy Sunday evening:


*Regex2 faster than Chunk by: 2.1 times*


For the details:

1) Regex1 is the original regex, Chunk1 is from Richard, Regex2 is mine.
2) You can notice the difference in time depending on the value of pPage
(that's normal behavior with regex).
3) I've done the calculation the same way as Richard did, so you can compare.



**  aPage = 1, Same? true true
Regex1: 8943 ms
Chunk1: 210 ms
Regex2: 99 ms
Regex2 faster than orig regex by: 90.3 times
Regex2 faster than Chunk by: 2.1 times

**  aPage = 2, Same? true true
Regex1: 9946 ms
Chunk1: 212 ms
Regex2: 100 ms
Regex2 faster than orig regex by: 99.5 times
Regex2 faster than Chunk by: 2.1 times

**  aPage = 3, Same? true true
Regex1: 4451 ms
Chunk1: 210 ms
Regex2: 98 ms
Regex2 faster than orig regex by: 45.4 times
Regex2 faster than Chunk by: 2.1 times

**  aPage = 4, Same? true true
Regex1: 11465 ms
Chunk1: 200 ms
Regex2: 98 ms
Regex2 faster than orig regex by: 117 times
Regex2 faster than Chunk by: 2 times

**  aPage = 5, Same? true true
Regex1: 11457 ms
Chunk1: 201 ms
Regex2: 94 ms
Regex2 faster than orig regex by: 121.9 times
Regex2 faster than Chunk by: 2.1 times
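
For readers who want to check the ratios, here is a minimal sketch of the arithmetic behind the two "faster than" lines, using the aPage = 1 timings (8943 / 99 is about 90.3, and 210 / 99 is about 2.1). The variable names and the "0.#" number format are assumptions for illustration; the script that produced the figures above was not posted.

```livecode
on mouseUp
   -- Timings copied from the aPage = 1 run, in milliseconds
   put 8943 into tRegex1
   put 210 into tChunk1
   put 99 into tRegex2
   -- Assumed display format: at most one decimal place
   set the numberFormat to "0.#"
   -- Each ratio is simply the slower time divided by the faster time
   put "Regex2 faster than orig regex by:" && (tRegex1 / tRegex2) && "times" & cr & \
         "Regex2 faster than Chunk by:" && (tChunk1 / tRegex2) && "times"
end mouseUp
```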




Kind regards,

Thierry




Thierry Douez - http://sunny-tdz.com
sunnYrex - sunnYtext2speech - sunnYperl - sunnYmidi - sunnYmage
___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Re: AW: Re: Regex help needed...

2016-02-03 Thread Bernard Devlin
Hi Mark,

There are huge differences in how regex implementations perform in different
languages. For example: http://raid6.com.au/~onlyjob/posts/arena/

Perl outperforms everything in that test. I've never assumed that LC's
"Perl compatible regex library" is going to perform at the speed at which
actual Perl performs. I've always assumed that being "Perl compatible" just
meant that all syntactically correct Perl regexes should run with LC's
implementation, without needing any kind of change in how the regex is
formatted.

There was an academic paper I came across 15 years ago, which showed that
Tcl out-performed Perl. Now it seems Perl outperforms Tcl, suggesting that
one or the other has made changes to their underlying engine which impact
regex performance. Or that the tests I read about 15 years ago were just
testing regex features which resulted in Tcl out-performing Perl, and
vice-versa in the above test.

It would be great if LC's implementation was as fast as Perl's.

Here's a page comparing several implementations of PCRE (along with some
non-PCRE regex implementations): http://sljit.sourceforge.net/regex_perf.html

One of the things we had in LC5 which was phenomenally fast, was searching
through the styledText of a field. That fast way of searching particular
text structures got lost in the migration to LC8.

Regards
Bernard


Re: AW: Re: Regex help needed...

2016-02-03 Thread Ali Lloyd
On Wed, Feb 3, 2016 at 11:53 AM Bernard Devlin  wrote:
> One of the things we had in LC5 which was phenomenally fast, was searching
> through the styledText of a field. That fast way of searching particular
> text structures got lost in the migration to LC8.

Could you expand on this a little? I'm not sure exactly what you mean by
'searching through the styledText of a field'.

Thanks,
Ali


Re: AW: Re: Regex help needed...

2016-02-03 Thread Peter TB Brett

On 03/02/2016 11:53, Bernard Devlin wrote:

> Perl outperforms everything in that test. I've never assumed that LC's
> "Perl compatible regex library" is going to perform at the speed at which
> actual Perl performs. I've always assumed that being "Perl compatible" just
> meant that all syntactically correct Perl regexes should run with LC's
> implementation, without needing any kind of change in how the regex is
> formatted.


To be precise, LiveCode uses the PCRE library, which is generally 
considered to be the *definitive* implementation of Perl Compatible 
Regular Expressions: http://www.pcre.org/.  There isn't a special 
LiveCode-specific implementation of regular expressions involved.


  Peter

--
Dr Peter Brett 
LiveCode Open Source Team

LiveCode on reddit: https://reddit.com/r/livecode



Re: AW: Re: Regex help needed...

2016-02-03 Thread Peter Haworth
Hi Thierry,
I might have missed it but did you publish your Regex2 to the list?
Pete



Re: AW: Re: Regex help needed...

2016-02-03 Thread Thierry Douez
> There's huge differences in how regex implementations perform in different
> languages. For example: http://raid6.com.au/~onlyjob/posts/arena/

Last year, I did some experiments:

I had 100 lines of LiveCode with a bunch of really big regexes.
It took 120 seconds on my MacBook to run.

I tried different ways to write/modify the regexes, and it always kept
running at around 120 seconds.

Then, using my sunnYperl external,
I copied and pasted all my regexes (without a single modification)
and rewrote the LC part in Perl (a bit more work, but not that much).

I came down to 9 seconds.

Kind regards,

Thierry



-- 

Thierry Douez - http://sunny-tdz.com
sunnYrex - sunnYtext2speech - sunnYperl - sunnYmidi - sunnYmage

Re: AW: Re: Regex help needed...

2016-02-03 Thread Richard Gaskin

Thierry Douez wrote:

>> Regex has been around a long time
>> and lots of smart computer science types has
>> spent time coming up with ways to optimize its performance for pattern
>> matching.
>
> That's was true, it's still true and will always be true!

It's true that there are almost always ways to improve performance using 
any method, but there are times when one method may be faster than 
another so it's worth testing out, as you did here:

> and here are some benchmarks
> done in a late rainy sunday evening:
>
> * Regex2 faster than Chunk by: 2.1 times*

Great results - what was the regex you used for that?






--
 Richard Gaskin
 Fourth World Systems
 Software Design and Development for the Desktop, Mobile, and the Web
 
 ambassa...@fourthworld.com    http://www.FourthWorld.com



Regex help needed...

2016-01-30 Thread Paul Dupuis
I need some regex help.

I have a list that is of the form:

i.e.
1Testing1,7471,1,1,747
2Testing752,18001,752,1,1800
3Testing5398,58462,320,2,768
4Testing3,111.951,683.915,302.268,385.751  
 3,111.951,683.915,302.268,385.751

 can have a list of numbers in 1 of 2 formats:
A comma separated list of 4 integers, i.e.
,,,
OR
A comma separated list of 1 integer, followed by 4 decimal numbers, i.e.


I need to filter the lines of this list with a REGEX pattern to get lines
WHERE a value pPage matches certain places in , specifically:
where pPage is equal to either  or  in the first
format(i.e. item 1 or item 3)
OR
where pPage is equal to  in the second format(i.e. item 1)

So my code is:
put "((.+\t"& pPage &",\d+,\d+,\d+)|(.+\t\d+,\d+,"& pPage &",\d+)|(.+\t"& pPage &",?[0-9]*\.?[0-9]+,?[0-9]*\.?[0-9]+,?[0-9]*\.?[0-9]+,?[0-9]*\.?[0-9]+))" \
      into tMatchPattern
filter lines of tList with regex pattern tMatchPattern
 
If pPage is 1 then I should get:
1Testing1,7471,1,1,747
2Testing752,18001,752,1,1800
and I do. If pPage is 2 then I should get:
3Testing5398,58462,320,2,768
and I do. If pPage is 3 then I should get:
4Testing3,111.951,683.915,302.268,385.751  
 3,111.951,683.915,302.268,385.751
and I do. If pPage is 4 then I should get an empty list, and I do, but
when pPage is 5, I am expecting an empty list and I get
3Testing5398,58462,320,2,768

So something is wrong with my regex, but I cannot figure out what. It
looks like it is matching against  in the last case
(pPage=5) but it should not since there are only 2 items in the list
rather than 4 or 5.

I am using LiveCode 6.7.6




AW: Re: Regex help needed...

2016-01-30 Thread Paul Dupuis
Never mind. Solved it.

It was the pattern for the 2nd format. Fixed with
"(.+\t"& pPage &",\d+,\d+,\d+)|(.+\t\d+,\d+,"& pPage &",\d+)|(.+\t"& pPage &",\d*\.?\d*,\d*\.?\d*,\d*\.?\d*,\d*\.?\d*)"



Re: AW: Re: Regex help needed...

2016-01-30 Thread Richard Gaskin
Regex is wonderfully compact to write relative to equivalent routines 
using chunk expressions, but that compactness is sometimes paid for in 
execution time.


When I come across a good regex example like the one you provided, if I 
have a moment I like to test things out to see where regex is faster and 
where it isn't.  It's really great for many things, but carries quite a 
bit of overhead.


Of course for this test to be relevant it assumes that most of the 
specifiers in the regex expression are merely to identify the elements 
you're looking for, and that the data is expected to fit the definition 
you provided.


Given that, it's possible to make the regex a bit simpler (see foo2 
below), but only with a modest boost to performance.  It can probably be 
simplified more, but the chunk-based alternative performed so well I 
didn't bother exploring the regex side any further.


Writing a lengthier handler that uses chunk expressions seems to yield 
the same results you reported, running between 12 and 60 times faster 
(depending on the percentage of lines tested that match the criteria 
being looked for).


For one-offs like validating email addresses regex can be an excellent 
fit, and even some larger tasks depending on the specifics.
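
As an illustration of the one-off case, here is a deliberately loose sketch of an email check using matchText. The handler name looksLikeEmail and the pattern are assumptions made for this example only; the pattern checks the general shape of an address and is nowhere near a full validator.

```livecode
function looksLikeEmail pAddress
   -- Loose shape check: some text, one "@", some text, a dot, some text,
   -- with no whitespace and no second "@" anywhere
   return matchText(pAddress, "^[^@\s]+@[^@\s]+\.[^@\s]+$")
end looksLikeEmail
```

Called as looksLikeEmail("user@example.com") this returns true, and looksLikeEmail("not an address") returns false. A single call like this costs so little that the regex overhead hardly matters.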


But for iterating across lists I've often been delightfully surprised by 
LiveCode's gracefully efficient chunk handling.


Testing your original data replicated to become 250 lines long, and 
looking for page 1 among them, the script below yields:


Regex: 9261 ms
RegexLite: 7958 ms
Chunks: 197 ms
Chunks faster than orig regex by: 47.01 times
Chunks faster than lite regex by: 40.4 times
Same result? true


on mouseUp
  put fld 1 into tList
  put 1 into tPage --< change this for different tests
  put 1000 into n
  --
  -- Test 1: original regex
  put the millisecs into t
  repeat n
    put foo1(tPage, tList) into r1
  end repeat
  put the millisecs - t into t1
  --
  -- Test 2: lighter regex
  put the millisecs into t
  repeat n
    put foo2(tPage, tList) into r2
  end repeat
  put the millisecs - t into t2
  --
  -- Test 3: chunks
  put the millisecs into t
  repeat n
    put foo3(tPage, tList) into r3
  end repeat
  put the millisecs - t into t3
  --
  -- Display results:
  set the numberformat to "0.##"
  put "Regex: "& t1 &" ms" &cr \
      &"RegexLite: "& t2 &" ms" &cr \
      &"Chunks: "& t3 &" ms" &cr \
      &"Chunks faster than orig regex by: "&(t1 / t3)&" times" &cr \
      &"Chunks faster than lite regex by: "&(t2 / t3)&" times" &cr \
      &"Same result? "& (r1=r3) &cr& r1 &cr& r3
end mouseUp


function foo1 pPage, tList
  -- Corrected version of the original three-way pattern
  put "(.+\t"& pPage &",\d+,\d+,\d+)|(.+\t\d+,\d+,"& pPage &",\d+)|(.+\t"& pPage &",\d*\.?\d*,\d*\.?\d*,\d*\.?\d*,\d*\.?\d*)" \
      into tMatchPattern
  filter lines of tList with regex pattern tMatchPattern
  return tList
end foo1


function foo2 pPage, tList
  -- Lighter pattern: only checks where pPage sits, not the numbers after it
  put "(.+\t"& pPage &",*)|(.+\t\d+,\d+,"& pPage &",*)|(.+\t"& pPage &",*)" \
      into tMatchPattern
  filter lines of tList with regex pattern tMatchPattern
  return tList
end foo2


function foo3 pPage, tList
  set the itemdel to tab
  put pPage &"," into tPageMarker
  repeat for each line tLine in tList
    put item 3 of tLine into t1
    if "." is in t1 then
      -- Decimal format: the page number is the first value of item 3
      if (t1 begins with tPageMarker) then
        put tLine &cr after tNuList
      end if
    else
      -- Integer format: the page number may lead item 3 or item 4
      if (t1 begins with tPageMarker) OR (item 4 of tLine begins with tPageMarker) then
        put tLine &cr after tNuList
      end if
    end if
  end repeat
  delete last char of tNuList -- drop the trailing return
  return tNuList
end foo3










--
 Richard Gaskin
 Fourth World Systems
 Software Design and Development for the Desktop, Mobile, and the Web
 
 ambassa...@fourthworld.com    http://www.FourthWorld.com



Re: AW: Re: Regex help needed...

2016-01-30 Thread Paul Dupuis
Wow. I would not have expected such a significant difference. Regex has
been around a long time and lots of smart computer science types have
spent time coming up with ways to optimize its performance for pattern
matching. I assumed (falsely) that regex-based filters in LC would be on
par with, or even superior to, a custom function using chunks. This leads
me to:

1) wondering if LC's hooks to whatever regex tool they are using under
the hood are as good as they should be
AND
2) planning on rewriting my code to use chunks.

Thanks for the post.




Re: AW: Re: Regex help needed...

2016-01-30 Thread Mark Wieder

On 01/30/2016 04:28 PM, Paul Dupuis wrote:

> 1) wondering if LC's hooks to whatever regex tool they are using under
> the hood is a good as it should be

LC's regex library is the same PCRE library everyone else uses. And it's 
the latest released version. Regex's power lies in its ability to match 
complex patterns, which doesn't necessarily translate to speed.

> AND
> 2) planning on rewriting my code to use chunks.

You may find that regex matching works better than LC's chunk matching 
in some situations. For speed though, it's hard to beat the built-in 
chunking functions in LC, as they're already pretty well optimized.


--
 Mark Wieder
 ahsoftw...@gmail.com



Re: AW: Re: Regex help needed...

2016-01-30 Thread Richard Gaskin

Paul Dupuis wrote:
> Wow. I would not have expected such a significant difference. Regex
> has been around a long time and lots of smart computer science types
> has spent time coming up with ways to optimize its performance for
> pattern matching. I assumed (falsely) that regex based filters in LC
> would be on par or even superior than a custom function using chunks.
> This leads me to:
>
> 1) wondering if LC's hooks to whatever regex tool they are using under
> the hood is a good as it should be
> AND
> 2) planning on rewriting my code to use chunks.

One of the reasons for my seemingly-obsessive benchmarking is to learn 
about what goes on under the hood, and to try to anticipate it when 
choosing among different algos.


LC does such a good job of shielding us from what goes on under the hood 
that we often forget that the relationship between the number of lines 
we write and the number of machine instructions our scripts invoke may 
differ broadly depending on the statement.


My favorite example is: set the scroll of field 1 to 100 -- seems simple 
enough, but having written scrollbar management routines in C back in 
the pre-Cocoa days I learned that a *tremendous* number of low-level 
routines come into play with that one simple line of script.  LC makes 
it easy to take this stuff for granted, since it does all the heavy lifting.


Same with regex.

The beauty of regex is that it's a very generalized solution.  The 
downside of regex is that it's a very generalized solution. ;)


Generalized options can provide convenience, but often at the cost of 
performance.


Purpose-built solutions are usually much faster than generalized ones, 
and with LC's chunk expressions they're fun to write too. :)


There are times when regex will outperform chunk expressions, though, so 
I would caution against rewriting everything.  Benchmarking is the key, 
and some day I will have done enough to be able to come up with a small 
set of useful rules as to when to use chunks and when to use regex.  But 
at the moment, it's half hunch and half benchmarking to confirm the hunch.
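
For anyone who wants to confirm their own hunches, here is a minimal sketch of a reusable timing helper. The name timeHandler and its parameter list are assumptions made for this example; it simply wraps the "note the milliseconds, loop, subtract" approach in one function and calls the candidate by name with dispatch.

```livecode
function timeHandler pHandlerName, pArg1, pArg2, pRepeats
   -- Returns the elapsed milliseconds for pRepeats calls of the named
   -- function, which must be in the message path of this script
   put the milliseconds into tStart
   repeat pRepeats
      dispatch function pHandlerName to me with pArg1, pArg2
   end repeat
   return the milliseconds - tStart
end timeHandler
```

With two candidate functions in the same script, say a regex version foo1 and a chunk version foo3 that both take a page number and a list, a comparison would look like: put timeHandler("foo1", 1, fld 1, 1000) / timeHandler("foo3", 1, fld 1, 1000). The dispatch call adds a little overhead of its own, but it adds the same amount to every candidate, so the ratio stays meaningful.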


--
 Richard Gaskin
 Fourth World Systems
 Software Design and Development for the Desktop, Mobile, and the Web
 
 ambassa...@fourthworld.com    http://www.FourthWorld.com

