Re: Optimization can be tricky

2018-06-19 Thread Tom Glod via use-livecode
Nicethe proof is in the pudding. Thats why the say a good programmer
spends most of their time staring at the ceiling ... thinking.  Who
they are ...I do not know.

On Tue, Jun 19, 2018 at 12:47 AM, Curry Kenworthy via use-livecode <
use-livecode@lists.runrev.com> wrote:

>
> Geoff wrote:
>
> > This is certainly not identical to the original method, but it's
> > close enough, and runs in a small fraction of a second.
>
> Good approach! Optimized code rules.
>
> Best wishes,
>
> Curry K.
>
>
>
> ___
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
>
___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


Re: Optimization can be tricky

2018-06-18 Thread Curry Kenworthy via use-livecode



Geoff wrote:

> This is certainly not identical to the original method, but it's
> close enough, and runs in a small fraction of a second.

Good approach! Optimized code rules.

Best wishes,

Curry K.


___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


Re: Optimization can be tricky

2018-06-18 Thread Geoff Canyon via use-livecode
Thanks for all the suggestions -- I tried multiple ideas without much
improvement, then decided to rethink the problem. Roughly:

I have a bunch of source data that I categorized by two numbers, the first
from 1-N where N might be anywhere from 100-5,000, and the second from
1-100. For any given first number, there might be about fifty entries, with
varying values for the second number.

Then for a given query I was trying to:

1. Take a list of 20-100 values for the first number, and for each of
those, a value for the second number,
2. Return a set of about 10-20 entries, where each entry matches one of the
first numbers from the list, and then probabilistically selected based on
the proximity of the second number from the list to the second number of
the item.

(The userSeen aspect was an additional wrinkle)

Since there are about 50 entries for each of the first numbers, the entries
that *might* be returned number [20-100] * 50, or about 1,000-5,000
entries. And everything I tried was too slow, since ideally I want to be
able to do this anywhere from 10-100 times per second.

So I reframed at the source. Instead of keeping all the candidates in a
single array keyed by the first number, I kept them in sub-arrays keyed by
the first digit of the second number.

Then, instead of gathering all the possible values based on the first
number matches, I rotate the list of first number matches each time I
retrieve, and then parse through the resulting matches in order, gathering
any that work, until I have enough to return.

This is certainly not identical to the original method, but it's close
enough, and runs in a small fraction of a second.

thanks again,

gc

On Tue, Jun 12, 2018 at 6:04 PM, Tom Glod via use-livecode <
use-livecode@lists.runrev.com> wrote:

> Thanks for the tip Ralphlove the sound of that filer function.
>
> On Tue, Jun 12, 2018 at 7:00 PM, Curry Kenworthy via use-livecode <
> use-livecode@lists.runrev.com> wrote:
>
> >
> > Optimizing scripts in LC is not the same as running reports in other
> > software suites. If you only need 10 results, you probably don't want to
> > handle all items twice.
> >
> > I hate to loop through all items even once. But if I do, I may be done!
> > I'm not making a full report to print out; I'm just getting my 10 items.
> If
> > I use sort or filter, likewise, I will do whatever necessary to keep it
> > short and sweet, even if that means adjusting some of the starting
> > assumptions.
> >
> > If the sort really is taking more time than the loop, something is
> bogging
> > it down. Look at the random() and arithmetic attached to the sort. That's
> > potentially like running a whole lot of LCS. So if you sort, try a plain
> > numeric sort with no fancy stuff added. Then go grab your 10 - the other
> > requirements can be addressed before or after the sort.
> >
> > Best wishes,
> >
> > Curry Kenworthy
> >
> > Custom Software Development
> > LiveCode Training and Consulting
> > http://livecodeconsulting.com/
> >
> > ___
> > use-livecode mailing list
> > use-livecode@lists.runrev.com
> > Please visit this url to subscribe, unsubscribe and manage your
> > subscription preferences:
> > http://lists.runrev.com/mailman/listinfo/use-livecode
> >
> ___
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
>
___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


Re: Optimization can be tricky

2018-06-12 Thread Tom Glod via use-livecode
Thanks for the tip Ralphlove the sound of that filer function.

On Tue, Jun 12, 2018 at 7:00 PM, Curry Kenworthy via use-livecode <
use-livecode@lists.runrev.com> wrote:

>
> Optimizing scripts in LC is not the same as running reports in other
> software suites. If you only need 10 results, you probably don't want to
> handle all items twice.
>
> I hate to loop through all items even once. But if I do, I may be done!
> I'm not making a full report to print out; I'm just getting my 10 items. If
> I use sort or filter, likewise, I will do whatever necessary to keep it
> short and sweet, even if that means adjusting some of the starting
> assumptions.
>
> If the sort really is taking more time than the loop, something is bogging
> it down. Look at the random() and arithmetic attached to the sort. That's
> potentially like running a whole lot of LCS. So if you sort, try a plain
> numeric sort with no fancy stuff added. Then go grab your 10 - the other
> requirements can be addressed before or after the sort.
>
> Best wishes,
>
> Curry Kenworthy
>
> Custom Software Development
> LiveCode Training and Consulting
> http://livecodeconsulting.com/
>
> ___
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
>
___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


Re: Optimization can be tricky

2018-06-12 Thread Curry Kenworthy via use-livecode



Optimizing scripts in LC is not the same as running reports in other 
software suites. If you only need 10 results, you probably don't want to 
handle all items twice.


I hate to loop through all items even once. But if I do, I may be done! 
I'm not making a full report to print out; I'm just getting my 10 items. 
If I use sort or filter, likewise, I will do whatever necessary to keep 
it short and sweet, even if that means adjusting some of the starting 
assumptions.


If the sort really is taking more time than the loop, something is 
bogging it down. Look at the random() and arithmetic attached to the 
sort. That's potentially like running a whole lot of LCS. So if you 
sort, try a plain numeric sort with no fancy stuff added. Then go grab 
your 10 - the other requirements can be addressed before or after the sort.


Best wishes,

Curry Kenworthy

Custom Software Development
LiveCode Training and Consulting
http://livecodeconsulting.com/

___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


RE: Optimization can be tricky

2018-06-12 Thread Ralph DiMola via use-livecode
Filter is lightning fast. I have the list of all 40,000+ cities in the US
and have a predictive search field. I tried DBs, in memory DB and other
methods until I stumbled on to the filter operator. Load a text file into a
var and the start filtering. There is no perceptible delay when typing the
first few letters of a city name. I even carry some meta data along for the
ride.
I don't know if it can  be used here. I have not had the time to look at the
code or the problem presented. 

Ralph DiMola
IT Director
Evergreen Information Services
rdim...@evergreeninfo.net


-Original Message-
From: use-livecode [mailto:use-livecode-boun...@lists.runrev.com] On Behalf
Of hh via use-livecode
Sent: Tuesday, June 12, 2018 4:39 PM
To: use-livecode@lists.runrev.com
Cc: hh
Subject: Re: Optimization can be tricky

The scenario Geoff described is roughly to get the top ten (a handful) of
2000 records comparing a certain numeric value of each.
To get that unknown value of each one has to go once through all the records
*and then sort* for that ranking.
(LiveCode is very fast with a simple numeric sort!) Any other method of
ranking while going through the data will generously waste CPU time.

> Curry wrote:
> Put yourself in the computer's shoes, and also clarify what you need 
> to accomplish. You are asking it to sort the entire list of 2000 
> records, but (if I understand) you only want a handful of those.
> 
> And it has already gone through all the records once before the sort. 
> If you asked a human to do that, it would be a risky interaction. 
> There might be raised voices when they present the neatly arranged 
> file cabinet with files in a special order, and you deliver the punch 
> line that you only need to pull a few. If you only need to pull 5 
> records, most people don't want to go through all 2000 files - twice.
> 
> So generally, don't use a sort after doing a complete loop on a big 
> text list, use another method. Do unto your computer as you hope it 
> will do unto you when they run the world. :)


___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


Re: Optimization can be tricky

2018-06-12 Thread Alex Tweedly via use-livecode
Then there's something about Geoff's problem (or problem statement) that 
I don't understand.


What on earth is in those records for sorting a mere 2000 of them to be 
noticable? (I had thought he had to be talking about magnitudes more 
lines than that).


LC (9.0) sorts 20,000 lines of >1000 chars each in 65ms or so, on my 
aging MacBook Pro. So the delay can't be the sort - it must be in the 
extracting of the data from the arrays into his 'candidateList'.


It would be good to get that clarified before we spend more time 
optimizing the wrong thing :-)


Alex.



On 12/06/2018 21:39, hh via use-livecode wrote:

The scenario Geoff described is roughly to get the
top ten (a handful) of 2000 records comparing a certain
numeric value of each.
To get that unknown value of each one has to go once through
all the records *and then sort* for that ranking.
(LiveCode is very fast with a simple numeric sort!)
Any other method of ranking while going through the data
will generously waste CPU time.


Curry wrote:
Put yourself in the computer's shoes, and also clarify what you need to
accomplish. You are asking it to sort the entire list of 2000 records,
but (if I understand) you only want a handful of those.

And it has already gone through all the records once before the sort. If
you asked a human to do that, it would be a risky interaction. There
might be raised voices when they present the neatly arranged file
cabinet with files in a special order, and you deliver the punch line
that you only need to pull a few. If you only need to pull 5 records,
most people don't want to go through all 2000 files - twice.

So generally, don't use a sort after doing a complete loop on a big text
list, use another method. Do unto your computer as you hope it will do
unto you when they run the world. :)


___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode



___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


Re: Optimization can be tricky

2018-06-12 Thread hh via use-livecode
The scenario Geoff described is roughly to get the
top ten (a handful) of 2000 records comparing a certain
numeric value of each.
To get that unknown value of each one has to go once through
all the records *and then sort* for that ranking.
(LiveCode is very fast with a simple numeric sort!)
Any other method of ranking while going through the data
will generously waste CPU time.

> Curry wrote:
> Put yourself in the computer's shoes, and also clarify what you need to 
> accomplish. You are asking it to sort the entire list of 2000 records, 
> but (if I understand) you only want a handful of those.
> 
> And it has already gone through all the records once before the sort. If 
> you asked a human to do that, it would be a risky interaction. There 
> might be raised voices when they present the neatly arranged file 
> cabinet with files in a special order, and you deliver the punch line 
> that you only need to pull a few. If you only need to pull 5 records, 
> most people don't want to go through all 2000 files - twice.
> 
> So generally, don't use a sort after doing a complete loop on a big text 
> list, use another method. Do unto your computer as you hope it will do 
> unto you when they run the world. :)


___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


Re: Optimization can be tricky

2018-06-12 Thread Curry Kenworthy via use-livecode



Put yourself in the computer's shoes, and also clarify what you need to 
accomplish. You are asking it to sort the entire list of 2000 records, 
but (if I understand) you only want a handful of those.


And it has already gone through all the records once before the sort. If 
you asked a human to do that, it would be a risky interaction. There 
might be raised voices when they present the neatly arranged file 
cabinet with files in a special order, and you deliver the punch line 
that you only need to pull a few. If you only need to pull 5 records, 
most people don't want to go through all 2000 files - twice.


So generally, don't use a sort after doing a complete loop on a big text 
list, use another method. Do unto your computer as you hope it will do 
unto you when they run the world. :)


(But I would be interested in the speed of your original code on LC 9 vs 
LC 6.7, just to see how engine optimization is faring. I'm surprised it 
took a minute.)


Best wishes,

Curry Kenworthy

Custom Software Development
LiveCode Training and Consulting
http://livecodeconsulting.com/

___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


Re: Optimization can be tricky

2018-06-12 Thread Alex Tweedly via use-livecode
I don't know if these are good ideas or BAD ideas  and without some 
suitable data, not sure if I could find out - so I'll throw the ideas 
over and let you decide if they are worth trying :-)



1. Two ideas combined

 - avoid scanning for the "item 1" of each line
 - use the (hopefully optimised enough) numeric-index array lookups

So, instead of

  put random(101 + s1 * 30 + 5 * s2 - i2),T,S & cr \
  after candidateList

try

  put T,S  into  candidateArray[count]
  put random(101 + s1 * 30 + 5 * s2 - i2) into candidateValue[count]
  add 1 to count

And then later
   put the keys of candidateValue into candidatelist
   sort numeric the lines of candidatelist by candidateValue of each
and the access via the candidateArray.

2. (even wilder)
Assume you have N entries to consider, and that you only need M results
The *IF* M << N (i.e. much less than - you only need a few results from a large 
input set).

You might try something like this for the inner loop :
(going back to using lines ...)

  repeat for each line S in storyArray[T]
put userSeenArray[uID][item 1 of S] into s1
put abs(item 2 of S - i1) into s2
if s2 < 20 and s1 < 4 then
  put random(101 + s1 * 30 + 5 * s2 - i2),T,S & cr \
  after candidateList
  add 1 to lineCount
  if lineCount > 4 * M then-- (why 4 ? - maybe 2, maybe 8 ??)
  sort candidateList numeric by item 1 of each
  delete line M+1 to -1 of candidateList
  put M into lineCount
  end if
    end if
  end repeat


3. even wilder still 
 - create K different candidateLists (where K is some power of 2)
 - sort each one,
 - mergesort in pairs, stopping when you get to M ...

put 8 into KK-- (why 8  ? - maybe 16, or 32 ...)
repeat for each key T in interestArray[uID]
  put item 1 of interestArray[uID][T] into i1
  put item 2 of interestArray[uID][T] into i2
  repeat for each line S in storyArray[T]
put userSeenArray[uID][item 1 of S] into s1
put abs(item 2 of S - i1) into s2
if s2 < 20 and s1 < 4 then
  add 1 to setCount
  put random(101 + s1 * 30 + 5 * s2 - i2),T,S & cr \
  after candidateList[setCount]
  if setCount > KK then
  put 0 into setCount
  end if
    end if
  end repeat
repeat with i = 1 to KK
   sort numeric lines of candidateList[i] by item 1 of each
end repeat
and then write / use a mergesort that takes two sorted lists, and returns the 
first N of the merged list.

(or even better, returns the first N, with a limit value; then each mergesort 
after the first can use the max value from the N'th entry of earlier merges to 
allow for early exit...)

That is left as an exercise for the reader :-) :-)

As I say - could be totally crazy, could be worthwhile.

I'd be happy to try out benchmarking it - if I had some representative data to 
play with.

Alex.

On 12/06/2018 12:03, hh via use-livecode wrote:


You could try the following:

repeat for each key T in interestArray[uID]
   put item 1 of interestArray[uID][T] into i1
   put item 2 of interestArray[uID][T] into i2
   repeat for each line S in storyArray[T]
 put userSeenArray[uID][item 1 of S] into s1
 put abs(item 2 of S - i1) into s2
 if s2 < 20 and s1 < 4 then
   put random(101 + s1 * 30 + 5 * s2 - i2),T,S & cr \
   after candidateList
 end if
   end repeat
end repeat
sort candidateList numeric by item 1 of each



Ali wrote:
repeat for each key T in interestArray[uID]


Geoff wrote:
repeat for each line T in the keys of interestArray[uID]
   repeat for each line S in storyArray[T]
 if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \
   and userSeenArray[uID][item 1 of S] < 4
 then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \
   abs(item 2 of S - item 1 of interestArray[uID][T]) - \
   item 2 of interestArray[uID][T]),T,S & cr after candidateList
   end repeat
end repeat
sort lines of candidateList numeric by random(item 1 of each)

___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode



___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Re: Optimization can be tricky

2018-06-12 Thread hh via use-livecode
Sorry, I forgot to mention Jerry who already proposed to pull out computations
from the inner loop and also to use variables in the inner loop. My experience
says to avoid getting items as often as possible. And to use the random() in the
inner repeat loop instead of in the final sort may be worth a trial with a large
data set.

I wrote:

> You could try the following.
> 
> repeat for each key T in interestArray[uID]
>   put item 1 of interestArray[uID][T] into i1
>   put item 2 of interestArray[uID][T] into i2
>   repeat for each line S in storyArray[T]
> put userSeenArray[uID][item 1 of S] into s1
> put abs(item 2 of S - i1) into s2
> if s2 < 20 and s1 < 4 then
>   put random(101 + s1 * 30 + 5 *  s2 - i2),T,S & cr after candidateList
> end if
>   end repeat
> end repeat
> sort candidateList numeric by item 1 of each
> 
> 
> Ali wrote:
> repeat for each key T in interestArray[uID]
> 
> Geoff wrote:
> repeat for each line T in the keys of interestArray[uID]
>   repeat for each line S in storyArray[T]
>if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \
>  and userSeenArray[uID][item 1 of S] < 4
>then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \
>  abs(item 2 of S - item 1 of interestArray[uID][T]) - \
>  item 2 of interestArray[uID][T]),T,S & cr after candidateList
>   end repeat
> end repeat
> sort lines of candidateList numeric by random(item 1 of each)

___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


Re: Optimization can be tricky

2018-06-12 Thread hh via use-livecode
You could try the following:

repeat for each key T in interestArray[uID]
  put item 1 of interestArray[uID][T] into i1
  put item 2 of interestArray[uID][T] into i2
  repeat for each line S in storyArray[T]
put userSeenArray[uID][item 1 of S] into s1
put abs(item 2 of S - i1) into s2
if s2 < 20 and s1 < 4 then
  put random(101 + s1 * 30 + 5 * s2 - i2),T,S & cr \
  after candidateList
end if
  end repeat
end repeat
sort candidateList numeric by item 1 of each


> Ali wrote:
> repeat for each key T in interestArray[uID]
> 
>> Geoff wrote:
>> repeat for each line T in the keys of interestArray[uID]
>>   repeat for each line S in storyArray[T]
>>   if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \
>> and userSeenArray[uID][item 1 of S] < 4
>>   then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \
>> abs(item 2 of S - item 1 of interestArray[uID][T]) - \
>> item 2 of interestArray[uID][T]),T,S & cr after candidateList
>>   end repeat
>> end repeat
>> sort lines of candidateList numeric by random(item 1 of each)

___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


Re: Optimization can be tricky

2018-06-12 Thread Ali Lloyd via use-livecode
Hi Geoff,

One thing to try in your original code, which should be significantly
faster if the array is big, is using

> repeat for each key T in interestArray[uID]

instead of

> repeat for each line T in the keys of interestArray[uID]

The latter has to allocate memory for a string containing all the keys, AND
iterate through the lines of that string, whereas repeating for each
key directly
accesses the keys of the array in hash order, and thus no extra allocation
will occur and the iteration is faster than finding the next line delimiter
each time.



On Tue, Jun 12, 2018 at 8:35 AM Clarence Martin via use-livecode <
use-livecode@lists.runrev.com> wrote:

> I know that this might be a different use-case but I have a CA Lottery -
> Fantasy five lottery parser that collects lottery data and loads it into a
> matrix and provides the total number of  hits for each number. This also
> has 4 optimization buttons to the show differences. This aupplication was
> optimized by our Pasadena LiveCode user's group. Drop me an email and I
> will send you a zipped copy of the Script with a sample lotto data file
> with approximately 8000 lines of picks.
> Email: chi...@themartinz.com
>
> -Original Message-
> From: use-livecode  On Behalf Of
> Geoff Canyon via use-livecode
> Sent: Monday, June 11, 2018 5:22 PM
> To: How to use LiveCode 
> Cc: Geoff Canyon 
> Subject: Optimization can be tricky
>
> ​I have a routine that takes about a minute to run for the test case I
> created, and who knows how long for the real use case. Given that I want to
> run it several thousand times for actual work, I need it to run (much)
> faster.
>
> Roughly, the routine gathers together a bunch of data from arrays, sorts
> the data based on its relationship to other arrays, and then returns a
> subset of the result.
>
> My first pass at the routine looked roughly like this:
>
>repeat for each line T in the keys of interestArray[uID]
>   repeat for each line S in storyArray[T]
>  if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \
>and userSeenArray[uID][item 1 of S] < 4
>  then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \
>abs(item 2 of S - item 1 of interestArray[uID][T]) - \
>item 2 of interestArray[uID][T]),T,S & cr after
> candidateList
>   end repeat
>end repeat
>sort lines of candidateList numeric by random(item 1 of each)
>
> In simple terms: parse through the individual lines of all the entries
> that possibly work, calculating a relevant value for each and appending
> that value and the line to an interim list, which then gets sorted,
> randomly favoring lower values.
>
> I assumed the problem was all the line-by-line parsing, and I thought of a
> clever way to accomplish the same thing. That led to this:
>
>put storyArray into R
>intersect R with interestArray[uID]
>combine R using cr and comma
>sort lines of R by (101 + userSeenArray[uID][item 2 of each] * 30 + 5 *
> \
>  abs(item 3 of each - item 1 of interestArray[uID][item 1 of
> each]) \
>  - item 2 of interestArray[uID][item 1 of each])
>
> Much simpler, albeit that's a heck of a "sort by" -- more complex by far
> than any I had previously created, and a testament to the power and
> flexibility of "each". Alas, despite condensing my code and removing
> parsing and loops, that version took ten seconds more than the previous
> version, I'm guessing because the previous version has that "if" statement
> that weeds out undesirable entries before the sort has to deal with them.
>
> (I'm writing this email as I parse through this figuring it out)
>
> So it turns out that the crazy use of "each" is only part of the problem
> -- changing that line to:
>
>sort lines of R by random(1)
>
> still takes over 20 seconds -- 3x as fast, but still far too slow. It
> turns out that the source data numbers anywhere from 1,000 to 2,000 lines,
> so sorting it in any way to randomly select the several lines I need is a
> really bad choice. removing the sort step but keeping everything else cuts
> the execution time down to under a second.
>
> Hmm. How to select several lines at weighted random from among a couple
> thousand? I'll think on this and post a follow-up.
> ___
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
>
>
> ___
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your 

RE: Optimization can be tricky

2018-06-12 Thread Clarence Martin via use-livecode
I know that this might be a different use-case but I have a CA Lottery - 
Fantasy five lottery parser that collects lottery data and loads it into a 
matrix and provides the total number of  hits for each number. This also has 4 
optimization buttons to the show differences. This aupplication was optimized 
by our Pasadena LiveCode user's group. Drop me an email and I will send you a 
zipped copy of the Script with a sample lotto data file with approximately 8000 
lines of picks.
Email: chi...@themartinz.com

-Original Message-
From: use-livecode  On Behalf Of Geoff 
Canyon via use-livecode
Sent: Monday, June 11, 2018 5:22 PM
To: How to use LiveCode 
Cc: Geoff Canyon 
Subject: Optimization can be tricky

​I have a routine that takes about a minute to run for the test case I created, 
and who knows how long for the real use case. Given that I want to run it 
several thousand times for actual work, I need it to run (much) faster.

Roughly, the routine gathers together a bunch of data from arrays, sorts the 
data based on its relationship to other arrays, and then returns a subset of 
the result.

My first pass at the routine looked roughly like this:

   repeat for each line T in the keys of interestArray[uID]
  repeat for each line S in storyArray[T]
 if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \
   and userSeenArray[uID][item 1 of S] < 4
 then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \
   abs(item 2 of S - item 1 of interestArray[uID][T]) - \
   item 2 of interestArray[uID][T]),T,S & cr after candidateList
  end repeat
   end repeat
   sort lines of candidateList numeric by random(item 1 of each)

In simple terms: parse through the individual lines of all the entries that 
possibly work, calculating a relevant value for each and appending that value 
and the line to an interim list, which then gets sorted, randomly favoring 
lower values.

I assumed the problem was all the line-by-line parsing, and I thought of a 
clever way to accomplish the same thing. That led to this:

   put storyArray into R
   intersect R with interestArray[uID]
   combine R using cr and comma
   sort lines of R by (101 + userSeenArray[uID][item 2 of each] * 30 + 5 * \
 abs(item 3 of each - item 1 of interestArray[uID][item 1 of each]) \
 - item 2 of interestArray[uID][item 1 of each])

Much simpler, albeit that's a heck of a "sort by" -- more complex by far than 
any I had previously created, and a testament to the power and flexibility of 
"each". Alas, despite condensing my code and removing parsing and loops, that 
version took ten seconds more than the previous version, I'm guessing because 
the previous version has that "if" statement that weeds out undesirable entries 
before the sort has to deal with them.

(I'm writing this email as I parse through this figuring it out)

So it turns out that the crazy use of "each" is only part of the problem -- 
changing that line to:

   sort lines of R by random(1)

still takes over 20 seconds -- 3x as fast, but still far too slow. It turns out 
that the source data numbers anywhere from 1,000 to 2,000 lines, so sorting it 
in any way to randomly select the several lines I need is a really bad choice. 
removing the sort step but keeping everything else cuts the execution time down 
to under a second.

Hmm. How to select several lines at weighted random from among a couple 
thousand? I'll think on this and post a follow-up.
___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Re: Optimization can be tricky

2018-06-11 Thread Brian Milby via use-livecode
Do you need the entire list sorted randomly or are you just needing to select N 
random entries from the list (weighted)? How does the speed change if you do a 
simple numeric sort?

You could use a function on a random number to weight things - would need to 
work out the right math to get what you want. Something like random(n)^2/n^2 
possibly. Depending on how heavily you want to prefer one end, you could use 
log or ln.

Your initial code is calculating “item 1 of interestArray[uID][T]” once or 
twice for every line S. I’m guessing that if you evaluate that to a variable 
once per repeat it would provide a small gain.

Depending on how often the if turns out to be true, some additional variables 
could possibly reduce the number of duplicate calculations enough to offset the 
cost of the assignment (what Jerry mentioned).
On Jun 11, 2018, 8:30 PM -0500, Jerry Jensen via use-livecode , wrote:
> At first glance, it looks like you might save some time by grabbing 
> interestArray[uID][T] as soon as you have T, and then use what you grabbed in 
> the 3 later places instead of re-figuring interestArray[uID][T] each time.
>
> As in:
> repeat for each line T in the keys of interestArray[uID]
> put interestArray[uID][T] into AuT —- grab it here
> repeat for each line S in storyArray[T]
> if abs(item 2 of S - item 1 of AuT) < 20 \
> and userSeenArray[uID][item 1 of S] < 4
> then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \
> abs(item 2 of S - item 1 of AuT) - \
> item 2 of AuT),T,S & cr after candidateList
> end repeat
> end repeat
> sort lines of candidateList numeric by random(item 1 of each)
>
> It looks like you could do a similar thing with userSeenArray[uID][item 1 of 
> S] to avoid repeating the same calculation.
> Also with item 1 of AuT, and item 2 of S, with lesser gains but while you’re 
> at it…
> That all will make it easier (or harder) to read, depending on if you can 
> find descriptive variable names for the intermediate values.
>
> It won’t help the sort speed, but saves a lot of array un-hashing.
> I haven’t figured out what the algoraithm is doing, or the idea of the 
> randomness in the sort.
>
> All untested, of course!
> Cheers,
> Jerry
>
> > On Jun 11, 2018, at 5:21 PM, Geoff Canyon via use-livecode 
> >  wrote:
> >
> > My first pass at the routine looked roughly like this:
> >
> > repeat for each line T in the keys of interestArray[uID]
> > repeat for each line S in storyArray[T]
> > if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \
> > and userSeenArray[uID][item 1 of S] < 4
> > then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \
> > abs(item 2 of S - item 1 of interestArray[uID][T]) - \
> > item 2 of interestArray[uID][T]),T,S & cr after candidateList
> > end repeat
> > end repeat
> > sort lines of candidateList numeric by random(item 1 of each)
>
>
> ___
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription 
> preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Re: Optimization can be tricky

2018-06-11 Thread Jerry Jensen via use-livecode
At first glance, it looks like you might save some time by grabbing 
interestArray[uID][T] as soon as you have T, and then use what you grabbed in 
the 3 later places instead of re-figuring interestArray[uID][T] each time.

As in:
repeat for each line T in the keys of interestArray[uID]
 put interestArray[uID][T] into AuT —- grab it here
 repeat for each line S in storyArray[T]
if abs(item 2 of S - item 1 of AuT) < 20 \
  and userSeenArray[uID][item 1 of S] < 4
then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \
  abs(item 2 of S - item 1 of AuT) - \
  item 2 of AuT),T,S & cr after candidateList
 end repeat
  end repeat
  sort lines of candidateList numeric by random(item 1 of each)

It looks like you could do a similar thing with userSeenArray[uID][item 1 of S] 
to avoid repeating the same calculation.
Also with item 1 of AuT, and item 2 of S, with lesser gains but while you’re at 
it…
That all will make it easier (or harder) to read, depending on if you can find 
descriptive variable names for the intermediate values.

It won’t help the sort speed, but saves a lot of array un-hashing.
I haven’t figured out what the algoraithm is doing, or the idea of the 
randomness in the sort.

All untested, of course!
Cheers,
Jerry

> On Jun 11, 2018, at 5:21 PM, Geoff Canyon via use-livecode 
>  wrote:
> 
> My first pass at the routine looked roughly like this:
> 
>   repeat for each line T in the keys of interestArray[uID]
>  repeat for each line S in storyArray[T]
> if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \
>   and userSeenArray[uID][item 1 of S] < 4
> then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \
>   abs(item 2 of S - item 1 of interestArray[uID][T]) - \
>   item 2 of interestArray[uID][T]),T,S & cr after candidateList
>  end repeat
>   end repeat
>   sort lines of candidateList numeric by random(item 1 of each)


___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Re: Optimization can be tricky

2018-06-11 Thread Tom Glod via use-livecode
please do goeff this subject is very interesting to me.  i have some
problems where i need to optimize in a similar way.  which kind of repeat
look have u found works fastest? have u tried ? repeat for each key
this_key in array? is that slower?

i love saving milliseconds. :) makes a big diff at scale.

On Mon, Jun 11, 2018 at 8:21 PM, Geoff Canyon via use-livecode <
use-livecode@lists.runrev.com> wrote:

> ​I have a routine that takes about a minute to run for the test case I
> created, and who knows how long for the real use case. Given that I want to
> run it several thousand times for actual work, I need it to run (much)
> faster.
>
> Roughly, the routine gathers together a bunch of data from arrays, sorts
> the data based on its relationship to other arrays, and then returns a
> subset of the result.
>
> My first pass at the routine looked roughly like this:
>
>repeat for each line T in the keys of interestArray[uID]
>   repeat for each line S in storyArray[T]
>  if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \
>and userSeenArray[uID][item 1 of S] < 4
>  then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \
>abs(item 2 of S - item 1 of interestArray[uID][T]) - \
>item 2 of interestArray[uID][T]),T,S & cr after
> candidateList
>   end repeat
>end repeat
>sort lines of candidateList numeric by random(item 1 of each)
>
> In simple terms: parse through the individual lines of all the entries that
> possibly work, calculating a relevant value for each and appending that
> value and the line to an interim list, which then gets sorted, randomly
> favoring lower values.
>
> I assumed the problem was all the line-by-line parsing, and I thought of a
> clever way to accomplish the same thing. That led to this:
>
>put storyArray into R
>intersect R with interestArray[uID]
>combine R using cr and comma
>sort lines of R by (101 + userSeenArray[uID][item 2 of each] * 30 + 5 *
> \
>  abs(item 3 of each - item 1 of interestArray[uID][item 1 of each])
> \
>  - item 2 of interestArray[uID][item 1 of each])
>
> Much simpler, albeit that's a heck of a "sort by" -- more complex by far
> than any I had previously created, and a testament to the power and
> flexibility of "each". Alas, despite condensing my code and removing
> parsing and loops, that version took ten seconds more than the previous
> version, I'm guessing because the previous version has that "if" statement
> that weeds out undesirable entries before the sort has to deal with them.
>
> (I'm writing this email as I parse through this figuring it out)
>
> So it turns out that the crazy use of "each" is only part of the problem --
> changing that line to:
>
>sort lines of R by random(1)
>
> still takes over 20 seconds -- 3x as fast, but still far too slow. It turns
> out that the source data numbers anywhere from 1,000 to 2,000 lines, so
> sorting it in any way to randomly select the several lines I need is a
> really bad choice. removing the sort step but keeping everything else cuts
> the execution time down to under a second.
>
> Hmm. How to select several lines at weighted random from among a couple
> thousand? I'll think on this and post a follow-up.
> ___
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode