Re: Optimization can be tricky
Nicethe proof is in the pudding. Thats why the say a good programmer spends most of their time staring at the ceiling ... thinking. Who they are ...I do not know. On Tue, Jun 19, 2018 at 12:47 AM, Curry Kenworthy via use-livecode < use-livecode@lists.runrev.com> wrote: > > Geoff wrote: > > > This is certainly not identical to the original method, but it's > > close enough, and runs in a small fraction of a second. > > Good approach! Optimized code rules. > > Best wishes, > > Curry K. > > > > ___ > use-livecode mailing list > use-livecode@lists.runrev.com > Please visit this url to subscribe, unsubscribe and manage your > subscription preferences: > http://lists.runrev.com/mailman/listinfo/use-livecode > ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-livecode
Re: Optimization can be tricky
Geoff wrote: > This is certainly not identical to the original method, but it's > close enough, and runs in a small fraction of a second. Good approach! Optimized code rules. Best wishes, Curry K. ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-livecode
Re: Optimization can be tricky
Thanks for all the suggestions -- I tried multiple ideas without much improvement, then decided to rethink the problem. Roughly: I have a bunch of source data that I categorized by two numbers, the first from 1-N where N might be anywhere from 100-5,000, and the second from 1-100. For any given first number, there might be about fifty entries, with varying values for the second number. Then for a given query I was trying to: 1. Take a list of 20-100 values for the first number, and for each of those, a value for the second number, 2. Return a set of about 10-20 entries, where each entry matches one of the first numbers from the list, and then probabilistically selected based on the proximity of the second number from the list to the second number of the item. (The userSeen aspect was an additional wrinkle) Since there are about 50 entries for each of the first numbers, the entries that *might* be returned number [20-100] * 50, or about 1,000-5,000 entries. And everything I tried was too slow, since ideally I want to be able to do this anywhere from 10-100 times per second. So I reframed at the source. Instead of keeping all the candidates in a single array keyed by the first number, I kept them in sub-arrays keyed by the first digit of the second number. Then, instead of gathering all the possible values based on the first number matches, I rotate the list of first number matches each time I retrieve, and then parse through the resulting matches in order, gathering any that work, until I have enough to return. This is certainly not identical to the original method, but it's close enough, and runs in a small fraction of a second. thanks again, gc On Tue, Jun 12, 2018 at 6:04 PM, Tom Glod via use-livecode < use-livecode@lists.runrev.com> wrote: > Thanks for the tip Ralphlove the sound of that filer function. > > On Tue, Jun 12, 2018 at 7:00 PM, Curry Kenworthy via use-livecode < > use-livecode@lists.runrev.com> wrote: > > > > > Optimizing scripts in LC is not the same as running reports in other > > software suites. If you only need 10 results, you probably don't want to > > handle all items twice. > > > > I hate to loop through all items even once. But if I do, I may be done! > > I'm not making a full report to print out; I'm just getting my 10 items. > If > > I use sort or filter, likewise, I will do whatever necessary to keep it > > short and sweet, even if that means adjusting some of the starting > > assumptions. > > > > If the sort really is taking more time than the loop, something is > bogging > > it down. Look at the random() and arithmetic attached to the sort. That's > > potentially like running a whole lot of LCS. So if you sort, try a plain > > numeric sort with no fancy stuff added. Then go grab your 10 - the other > > requirements can be addressed before or after the sort. > > > > Best wishes, > > > > Curry Kenworthy > > > > Custom Software Development > > LiveCode Training and Consulting > > http://livecodeconsulting.com/ > > > > ___ > > use-livecode mailing list > > use-livecode@lists.runrev.com > > Please visit this url to subscribe, unsubscribe and manage your > > subscription preferences: > > http://lists.runrev.com/mailman/listinfo/use-livecode > > > ___ > use-livecode mailing list > use-livecode@lists.runrev.com > Please visit this url to subscribe, unsubscribe and manage your > subscription preferences: > http://lists.runrev.com/mailman/listinfo/use-livecode > ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-livecode
Re: Optimization can be tricky
Thanks for the tip Ralphlove the sound of that filer function. On Tue, Jun 12, 2018 at 7:00 PM, Curry Kenworthy via use-livecode < use-livecode@lists.runrev.com> wrote: > > Optimizing scripts in LC is not the same as running reports in other > software suites. If you only need 10 results, you probably don't want to > handle all items twice. > > I hate to loop through all items even once. But if I do, I may be done! > I'm not making a full report to print out; I'm just getting my 10 items. If > I use sort or filter, likewise, I will do whatever necessary to keep it > short and sweet, even if that means adjusting some of the starting > assumptions. > > If the sort really is taking more time than the loop, something is bogging > it down. Look at the random() and arithmetic attached to the sort. That's > potentially like running a whole lot of LCS. So if you sort, try a plain > numeric sort with no fancy stuff added. Then go grab your 10 - the other > requirements can be addressed before or after the sort. > > Best wishes, > > Curry Kenworthy > > Custom Software Development > LiveCode Training and Consulting > http://livecodeconsulting.com/ > > ___ > use-livecode mailing list > use-livecode@lists.runrev.com > Please visit this url to subscribe, unsubscribe and manage your > subscription preferences: > http://lists.runrev.com/mailman/listinfo/use-livecode > ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-livecode
Re: Optimization can be tricky
Optimizing scripts in LC is not the same as running reports in other software suites. If you only need 10 results, you probably don't want to handle all items twice. I hate to loop through all items even once. But if I do, I may be done! I'm not making a full report to print out; I'm just getting my 10 items. If I use sort or filter, likewise, I will do whatever necessary to keep it short and sweet, even if that means adjusting some of the starting assumptions. If the sort really is taking more time than the loop, something is bogging it down. Look at the random() and arithmetic attached to the sort. That's potentially like running a whole lot of LCS. So if you sort, try a plain numeric sort with no fancy stuff added. Then go grab your 10 - the other requirements can be addressed before or after the sort. Best wishes, Curry Kenworthy Custom Software Development LiveCode Training and Consulting http://livecodeconsulting.com/ ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-livecode
RE: Optimization can be tricky
Filter is lightning fast. I have the list of all 40,000+ cities in the US and have a predictive search field. I tried DBs, in memory DB and other methods until I stumbled on to the filter operator. Load a text file into a var and the start filtering. There is no perceptible delay when typing the first few letters of a city name. I even carry some meta data along for the ride. I don't know if it can be used here. I have not had the time to look at the code or the problem presented. Ralph DiMola IT Director Evergreen Information Services rdim...@evergreeninfo.net -Original Message- From: use-livecode [mailto:use-livecode-boun...@lists.runrev.com] On Behalf Of hh via use-livecode Sent: Tuesday, June 12, 2018 4:39 PM To: use-livecode@lists.runrev.com Cc: hh Subject: Re: Optimization can be tricky The scenario Geoff described is roughly to get the top ten (a handful) of 2000 records comparing a certain numeric value of each. To get that unknown value of each one has to go once through all the records *and then sort* for that ranking. (LiveCode is very fast with a simple numeric sort!) Any other method of ranking while going through the data will generously waste CPU time. > Curry wrote: > Put yourself in the computer's shoes, and also clarify what you need > to accomplish. You are asking it to sort the entire list of 2000 > records, but (if I understand) you only want a handful of those. > > And it has already gone through all the records once before the sort. > If you asked a human to do that, it would be a risky interaction. > There might be raised voices when they present the neatly arranged > file cabinet with files in a special order, and you deliver the punch > line that you only need to pull a few. If you only need to pull 5 > records, most people don't want to go through all 2000 files - twice. > > So generally, don't use a sort after doing a complete loop on a big > text list, use another method. Do unto your computer as you hope it > will do unto you when they run the world. :) ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-livecode ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-livecode
Re: Optimization can be tricky
Then there's something about Geoff's problem (or problem statement) that I don't understand. What on earth is in those records for sorting a mere 2000 of them to be noticable? (I had thought he had to be talking about magnitudes more lines than that). LC (9.0) sorts 20,000 lines of >1000 chars each in 65ms or so, on my aging MacBook Pro. So the delay can't be the sort - it must be in the extracting of the data from the arrays into his 'candidateList'. It would be good to get that clarified before we spend more time optimizing the wrong thing :-) Alex. On 12/06/2018 21:39, hh via use-livecode wrote: The scenario Geoff described is roughly to get the top ten (a handful) of 2000 records comparing a certain numeric value of each. To get that unknown value of each one has to go once through all the records *and then sort* for that ranking. (LiveCode is very fast with a simple numeric sort!) Any other method of ranking while going through the data will generously waste CPU time. Curry wrote: Put yourself in the computer's shoes, and also clarify what you need to accomplish. You are asking it to sort the entire list of 2000 records, but (if I understand) you only want a handful of those. And it has already gone through all the records once before the sort. If you asked a human to do that, it would be a risky interaction. There might be raised voices when they present the neatly arranged file cabinet with files in a special order, and you deliver the punch line that you only need to pull a few. If you only need to pull 5 records, most people don't want to go through all 2000 files - twice. So generally, don't use a sort after doing a complete loop on a big text list, use another method. Do unto your computer as you hope it will do unto you when they run the world. :) ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-livecode ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-livecode
Re: Optimization can be tricky
The scenario Geoff described is roughly to get the top ten (a handful) of 2000 records comparing a certain numeric value of each. To get that unknown value of each one has to go once through all the records *and then sort* for that ranking. (LiveCode is very fast with a simple numeric sort!) Any other method of ranking while going through the data will generously waste CPU time. > Curry wrote: > Put yourself in the computer's shoes, and also clarify what you need to > accomplish. You are asking it to sort the entire list of 2000 records, > but (if I understand) you only want a handful of those. > > And it has already gone through all the records once before the sort. If > you asked a human to do that, it would be a risky interaction. There > might be raised voices when they present the neatly arranged file > cabinet with files in a special order, and you deliver the punch line > that you only need to pull a few. If you only need to pull 5 records, > most people don't want to go through all 2000 files - twice. > > So generally, don't use a sort after doing a complete loop on a big text > list, use another method. Do unto your computer as you hope it will do > unto you when they run the world. :) ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-livecode
Re: Optimization can be tricky
Put yourself in the computer's shoes, and also clarify what you need to accomplish. You are asking it to sort the entire list of 2000 records, but (if I understand) you only want a handful of those. And it has already gone through all the records once before the sort. If you asked a human to do that, it would be a risky interaction. There might be raised voices when they present the neatly arranged file cabinet with files in a special order, and you deliver the punch line that you only need to pull a few. If you only need to pull 5 records, most people don't want to go through all 2000 files - twice. So generally, don't use a sort after doing a complete loop on a big text list, use another method. Do unto your computer as you hope it will do unto you when they run the world. :) (But I would be interested in the speed of your original code on LC 9 vs LC 6.7, just to see how engine optimization is faring. I'm surprised it took a minute.) Best wishes, Curry Kenworthy Custom Software Development LiveCode Training and Consulting http://livecodeconsulting.com/ ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-livecode
Re: Optimization can be tricky
I don't know if these are good ideas or BAD ideas and without some suitable data, not sure if I could find out - so I'll throw the ideas over and let you decide if they are worth trying :-) 1. Two ideas combined - avoid scanning for the "item 1" of each line - use the (hopefully optimised enough) numeric-index array lookups So, instead of put random(101 + s1 * 30 + 5 * s2 - i2),T,S & cr \ after candidateList try put T,S into candidateArray[count] put random(101 + s1 * 30 + 5 * s2 - i2) into candidateValue[count] add 1 to count And then later put the keys of candidateValue into candidatelist sort numeric the lines of candidatelist by candidateValue of each and the access via the candidateArray. 2. (even wilder) Assume you have N entries to consider, and that you only need M results The *IF* M << N (i.e. much less than - you only need a few results from a large input set). You might try something like this for the inner loop : (going back to using lines ...) repeat for each line S in storyArray[T] put userSeenArray[uID][item 1 of S] into s1 put abs(item 2 of S - i1) into s2 if s2 < 20 and s1 < 4 then put random(101 + s1 * 30 + 5 * s2 - i2),T,S & cr \ after candidateList add 1 to lineCount if lineCount > 4 * M then-- (why 4 ? - maybe 2, maybe 8 ??) sort candidateList numeric by item 1 of each delete line M+1 to -1 of candidateList put M into lineCount end if end if end repeat 3. even wilder still - create K different candidateLists (where K is some power of 2) - sort each one, - mergesort in pairs, stopping when you get to M ... put 8 into KK-- (why 8 ? - maybe 16, or 32 ...) repeat for each key T in interestArray[uID] put item 1 of interestArray[uID][T] into i1 put item 2 of interestArray[uID][T] into i2 repeat for each line S in storyArray[T] put userSeenArray[uID][item 1 of S] into s1 put abs(item 2 of S - i1) into s2 if s2 < 20 and s1 < 4 then add 1 to setCount put random(101 + s1 * 30 + 5 * s2 - i2),T,S & cr \ after candidateList[setCount] if setCount > KK then put 0 into setCount end if end if end repeat repeat with i = 1 to KK sort numeric lines of candidateList[i] by item 1 of each end repeat and then write / use a mergesort that takes two sorted lists, and returns the first N of the merged list. (or even better, returns the first N, with a limit value; then each mergesort after the first can use the max value from the N'th entry of earlier merges to allow for early exit...) That is left as an exercise for the reader :-) :-) As I say - could be totally crazy, could be worthwhile. I'd be happy to try out benchmarking it - if I had some representative data to play with. Alex. On 12/06/2018 12:03, hh via use-livecode wrote: You could try the following: repeat for each key T in interestArray[uID] put item 1 of interestArray[uID][T] into i1 put item 2 of interestArray[uID][T] into i2 repeat for each line S in storyArray[T] put userSeenArray[uID][item 1 of S] into s1 put abs(item 2 of S - i1) into s2 if s2 < 20 and s1 < 4 then put random(101 + s1 * 30 + 5 * s2 - i2),T,S & cr \ after candidateList end if end repeat end repeat sort candidateList numeric by item 1 of each Ali wrote: repeat for each key T in interestArray[uID] Geoff wrote: repeat for each line T in the keys of interestArray[uID] repeat for each line S in storyArray[T] if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \ and userSeenArray[uID][item 1 of S] < 4 then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \ abs(item 2 of S - item 1 of interestArray[uID][T]) - \ item 2 of interestArray[uID][T]),T,S & cr after candidateList end repeat end repeat sort lines of candidateList numeric by random(item 1 of each) ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-livecode ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-livecode
Re: Optimization can be tricky
Sorry, I forgot to mention Jerry who already proposed to pull out computations from the inner loop and also to use variables in the inner loop. My experience says to avoid getting items as often as possible. And to use the random() in the inner repeat loop instead of in the final sort may be worth a trial with a large data set. I wrote: > You could try the following. > > repeat for each key T in interestArray[uID] > put item 1 of interestArray[uID][T] into i1 > put item 2 of interestArray[uID][T] into i2 > repeat for each line S in storyArray[T] > put userSeenArray[uID][item 1 of S] into s1 > put abs(item 2 of S - i1) into s2 > if s2 < 20 and s1 < 4 then > put random(101 + s1 * 30 + 5 * s2 - i2),T,S & cr after candidateList > end if > end repeat > end repeat > sort candidateList numeric by item 1 of each > > > Ali wrote: > repeat for each key T in interestArray[uID] > > Geoff wrote: > repeat for each line T in the keys of interestArray[uID] > repeat for each line S in storyArray[T] >if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \ > and userSeenArray[uID][item 1 of S] < 4 >then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \ > abs(item 2 of S - item 1 of interestArray[uID][T]) - \ > item 2 of interestArray[uID][T]),T,S & cr after candidateList > end repeat > end repeat > sort lines of candidateList numeric by random(item 1 of each) ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-livecode
Re: Optimization can be tricky
You could try the following: repeat for each key T in interestArray[uID] put item 1 of interestArray[uID][T] into i1 put item 2 of interestArray[uID][T] into i2 repeat for each line S in storyArray[T] put userSeenArray[uID][item 1 of S] into s1 put abs(item 2 of S - i1) into s2 if s2 < 20 and s1 < 4 then put random(101 + s1 * 30 + 5 * s2 - i2),T,S & cr \ after candidateList end if end repeat end repeat sort candidateList numeric by item 1 of each > Ali wrote: > repeat for each key T in interestArray[uID] > >> Geoff wrote: >> repeat for each line T in the keys of interestArray[uID] >> repeat for each line S in storyArray[T] >> if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \ >> and userSeenArray[uID][item 1 of S] < 4 >> then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \ >> abs(item 2 of S - item 1 of interestArray[uID][T]) - \ >> item 2 of interestArray[uID][T]),T,S & cr after candidateList >> end repeat >> end repeat >> sort lines of candidateList numeric by random(item 1 of each) ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-livecode
Re: Optimization can be tricky
Hi Geoff, One thing to try in your original code, which should be significantly faster if the array is big, is using > repeat for each key T in interestArray[uID] instead of > repeat for each line T in the keys of interestArray[uID] The latter has to allocate memory for a string containing all the keys, AND iterate through the lines of that string, whereas repeating for each key directly accesses the keys of the array in hash order, and thus no extra allocation will occur and the iteration is faster than finding the next line delimiter each time. On Tue, Jun 12, 2018 at 8:35 AM Clarence Martin via use-livecode < use-livecode@lists.runrev.com> wrote: > I know that this might be a different use-case but I have a CA Lottery - > Fantasy five lottery parser that collects lottery data and loads it into a > matrix and provides the total number of hits for each number. This also > has 4 optimization buttons to the show differences. This aupplication was > optimized by our Pasadena LiveCode user's group. Drop me an email and I > will send you a zipped copy of the Script with a sample lotto data file > with approximately 8000 lines of picks. > Email: chi...@themartinz.com > > -Original Message- > From: use-livecode On Behalf Of > Geoff Canyon via use-livecode > Sent: Monday, June 11, 2018 5:22 PM > To: How to use LiveCode > Cc: Geoff Canyon > Subject: Optimization can be tricky > > I have a routine that takes about a minute to run for the test case I > created, and who knows how long for the real use case. Given that I want to > run it several thousand times for actual work, I need it to run (much) > faster. > > Roughly, the routine gathers together a bunch of data from arrays, sorts > the data based on its relationship to other arrays, and then returns a > subset of the result. > > My first pass at the routine looked roughly like this: > >repeat for each line T in the keys of interestArray[uID] > repeat for each line S in storyArray[T] > if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \ >and userSeenArray[uID][item 1 of S] < 4 > then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \ >abs(item 2 of S - item 1 of interestArray[uID][T]) - \ >item 2 of interestArray[uID][T]),T,S & cr after > candidateList > end repeat >end repeat >sort lines of candidateList numeric by random(item 1 of each) > > In simple terms: parse through the individual lines of all the entries > that possibly work, calculating a relevant value for each and appending > that value and the line to an interim list, which then gets sorted, > randomly favoring lower values. > > I assumed the problem was all the line-by-line parsing, and I thought of a > clever way to accomplish the same thing. That led to this: > >put storyArray into R >intersect R with interestArray[uID] >combine R using cr and comma >sort lines of R by (101 + userSeenArray[uID][item 2 of each] * 30 + 5 * > \ > abs(item 3 of each - item 1 of interestArray[uID][item 1 of > each]) \ > - item 2 of interestArray[uID][item 1 of each]) > > Much simpler, albeit that's a heck of a "sort by" -- more complex by far > than any I had previously created, and a testament to the power and > flexibility of "each". Alas, despite condensing my code and removing > parsing and loops, that version took ten seconds more than the previous > version, I'm guessing because the previous version has that "if" statement > that weeds out undesirable entries before the sort has to deal with them. > > (I'm writing this email as I parse through this figuring it out) > > So it turns out that the crazy use of "each" is only part of the problem > -- changing that line to: > >sort lines of R by random(1) > > still takes over 20 seconds -- 3x as fast, but still far too slow. It > turns out that the source data numbers anywhere from 1,000 to 2,000 lines, > so sorting it in any way to randomly select the several lines I need is a > really bad choice. removing the sort step but keeping everything else cuts > the execution time down to under a second. > > Hmm. How to select several lines at weighted random from among a couple > thousand? I'll think on this and post a follow-up. > ___ > use-livecode mailing list > use-livecode@lists.runrev.com > Please visit this url to subscribe, unsubscribe and manage your > subscription preferences: > http://lists.runrev.com/mailman/listinfo/use-livecode > > > ___ > use-livecode mailing list > use-livecode@lists.runrev.com > Please visit this url to subscribe, unsubscribe and manage your > subscription preferences: > http://lists.runrev.com/mailman/listinfo/use-livecode ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your
RE: Optimization can be tricky
I know that this might be a different use-case but I have a CA Lottery - Fantasy five lottery parser that collects lottery data and loads it into a matrix and provides the total number of hits for each number. This also has 4 optimization buttons to the show differences. This aupplication was optimized by our Pasadena LiveCode user's group. Drop me an email and I will send you a zipped copy of the Script with a sample lotto data file with approximately 8000 lines of picks. Email: chi...@themartinz.com -Original Message- From: use-livecode On Behalf Of Geoff Canyon via use-livecode Sent: Monday, June 11, 2018 5:22 PM To: How to use LiveCode Cc: Geoff Canyon Subject: Optimization can be tricky I have a routine that takes about a minute to run for the test case I created, and who knows how long for the real use case. Given that I want to run it several thousand times for actual work, I need it to run (much) faster. Roughly, the routine gathers together a bunch of data from arrays, sorts the data based on its relationship to other arrays, and then returns a subset of the result. My first pass at the routine looked roughly like this: repeat for each line T in the keys of interestArray[uID] repeat for each line S in storyArray[T] if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \ and userSeenArray[uID][item 1 of S] < 4 then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \ abs(item 2 of S - item 1 of interestArray[uID][T]) - \ item 2 of interestArray[uID][T]),T,S & cr after candidateList end repeat end repeat sort lines of candidateList numeric by random(item 1 of each) In simple terms: parse through the individual lines of all the entries that possibly work, calculating a relevant value for each and appending that value and the line to an interim list, which then gets sorted, randomly favoring lower values. I assumed the problem was all the line-by-line parsing, and I thought of a clever way to accomplish the same thing. That led to this: put storyArray into R intersect R with interestArray[uID] combine R using cr and comma sort lines of R by (101 + userSeenArray[uID][item 2 of each] * 30 + 5 * \ abs(item 3 of each - item 1 of interestArray[uID][item 1 of each]) \ - item 2 of interestArray[uID][item 1 of each]) Much simpler, albeit that's a heck of a "sort by" -- more complex by far than any I had previously created, and a testament to the power and flexibility of "each". Alas, despite condensing my code and removing parsing and loops, that version took ten seconds more than the previous version, I'm guessing because the previous version has that "if" statement that weeds out undesirable entries before the sort has to deal with them. (I'm writing this email as I parse through this figuring it out) So it turns out that the crazy use of "each" is only part of the problem -- changing that line to: sort lines of R by random(1) still takes over 20 seconds -- 3x as fast, but still far too slow. It turns out that the source data numbers anywhere from 1,000 to 2,000 lines, so sorting it in any way to randomly select the several lines I need is a really bad choice. removing the sort step but keeping everything else cuts the execution time down to under a second. Hmm. How to select several lines at weighted random from among a couple thousand? I'll think on this and post a follow-up. ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-livecode ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-livecode
Re: Optimization can be tricky
Do you need the entire list sorted randomly or are you just needing to select N random entries from the list (weighted)? How does the speed change if you do a simple numeric sort? You could use a function on a random number to weight things - would need to work out the right math to get what you want. Something like random(n)^2/n^2 possibly. Depending on how heavily you want to prefer one end, you could use log or ln. Your initial code is calculating “item 1 of interestArray[uID][T]” once or twice for every line S. I’m guessing that if you evaluate that to a variable once per repeat it would provide a small gain. Depending on how often the if turns out to be true, some additional variables could possibly reduce the number of duplicate calculations enough to offset the cost of the assignment (what Jerry mentioned). On Jun 11, 2018, 8:30 PM -0500, Jerry Jensen via use-livecode , wrote: > At first glance, it looks like you might save some time by grabbing > interestArray[uID][T] as soon as you have T, and then use what you grabbed in > the 3 later places instead of re-figuring interestArray[uID][T] each time. > > As in: > repeat for each line T in the keys of interestArray[uID] > put interestArray[uID][T] into AuT —- grab it here > repeat for each line S in storyArray[T] > if abs(item 2 of S - item 1 of AuT) < 20 \ > and userSeenArray[uID][item 1 of S] < 4 > then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \ > abs(item 2 of S - item 1 of AuT) - \ > item 2 of AuT),T,S & cr after candidateList > end repeat > end repeat > sort lines of candidateList numeric by random(item 1 of each) > > It looks like you could do a similar thing with userSeenArray[uID][item 1 of > S] to avoid repeating the same calculation. > Also with item 1 of AuT, and item 2 of S, with lesser gains but while you’re > at it… > That all will make it easier (or harder) to read, depending on if you can > find descriptive variable names for the intermediate values. > > It won’t help the sort speed, but saves a lot of array un-hashing. > I haven’t figured out what the algoraithm is doing, or the idea of the > randomness in the sort. > > All untested, of course! > Cheers, > Jerry > > > On Jun 11, 2018, at 5:21 PM, Geoff Canyon via use-livecode > > wrote: > > > > My first pass at the routine looked roughly like this: > > > > repeat for each line T in the keys of interestArray[uID] > > repeat for each line S in storyArray[T] > > if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \ > > and userSeenArray[uID][item 1 of S] < 4 > > then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \ > > abs(item 2 of S - item 1 of interestArray[uID][T]) - \ > > item 2 of interestArray[uID][T]),T,S & cr after candidateList > > end repeat > > end repeat > > sort lines of candidateList numeric by random(item 1 of each) > > > ___ > use-livecode mailing list > use-livecode@lists.runrev.com > Please visit this url to subscribe, unsubscribe and manage your subscription > preferences: > http://lists.runrev.com/mailman/listinfo/use-livecode ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-livecode
Re: Optimization can be tricky
At first glance, it looks like you might save some time by grabbing interestArray[uID][T] as soon as you have T, and then use what you grabbed in the 3 later places instead of re-figuring interestArray[uID][T] each time. As in: repeat for each line T in the keys of interestArray[uID] put interestArray[uID][T] into AuT —- grab it here repeat for each line S in storyArray[T] if abs(item 2 of S - item 1 of AuT) < 20 \ and userSeenArray[uID][item 1 of S] < 4 then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \ abs(item 2 of S - item 1 of AuT) - \ item 2 of AuT),T,S & cr after candidateList end repeat end repeat sort lines of candidateList numeric by random(item 1 of each) It looks like you could do a similar thing with userSeenArray[uID][item 1 of S] to avoid repeating the same calculation. Also with item 1 of AuT, and item 2 of S, with lesser gains but while you’re at it… That all will make it easier (or harder) to read, depending on if you can find descriptive variable names for the intermediate values. It won’t help the sort speed, but saves a lot of array un-hashing. I haven’t figured out what the algoraithm is doing, or the idea of the randomness in the sort. All untested, of course! Cheers, Jerry > On Jun 11, 2018, at 5:21 PM, Geoff Canyon via use-livecode > wrote: > > My first pass at the routine looked roughly like this: > > repeat for each line T in the keys of interestArray[uID] > repeat for each line S in storyArray[T] > if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \ > and userSeenArray[uID][item 1 of S] < 4 > then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \ > abs(item 2 of S - item 1 of interestArray[uID][T]) - \ > item 2 of interestArray[uID][T]),T,S & cr after candidateList > end repeat > end repeat > sort lines of candidateList numeric by random(item 1 of each) ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-livecode
Re: Optimization can be tricky
please do goeff this subject is very interesting to me. i have some problems where i need to optimize in a similar way. which kind of repeat look have u found works fastest? have u tried ? repeat for each key this_key in array? is that slower? i love saving milliseconds. :) makes a big diff at scale. On Mon, Jun 11, 2018 at 8:21 PM, Geoff Canyon via use-livecode < use-livecode@lists.runrev.com> wrote: > I have a routine that takes about a minute to run for the test case I > created, and who knows how long for the real use case. Given that I want to > run it several thousand times for actual work, I need it to run (much) > faster. > > Roughly, the routine gathers together a bunch of data from arrays, sorts > the data based on its relationship to other arrays, and then returns a > subset of the result. > > My first pass at the routine looked roughly like this: > >repeat for each line T in the keys of interestArray[uID] > repeat for each line S in storyArray[T] > if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \ >and userSeenArray[uID][item 1 of S] < 4 > then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \ >abs(item 2 of S - item 1 of interestArray[uID][T]) - \ >item 2 of interestArray[uID][T]),T,S & cr after > candidateList > end repeat >end repeat >sort lines of candidateList numeric by random(item 1 of each) > > In simple terms: parse through the individual lines of all the entries that > possibly work, calculating a relevant value for each and appending that > value and the line to an interim list, which then gets sorted, randomly > favoring lower values. > > I assumed the problem was all the line-by-line parsing, and I thought of a > clever way to accomplish the same thing. That led to this: > >put storyArray into R >intersect R with interestArray[uID] >combine R using cr and comma >sort lines of R by (101 + userSeenArray[uID][item 2 of each] * 30 + 5 * > \ > abs(item 3 of each - item 1 of interestArray[uID][item 1 of each]) > \ > - item 2 of interestArray[uID][item 1 of each]) > > Much simpler, albeit that's a heck of a "sort by" -- more complex by far > than any I had previously created, and a testament to the power and > flexibility of "each". Alas, despite condensing my code and removing > parsing and loops, that version took ten seconds more than the previous > version, I'm guessing because the previous version has that "if" statement > that weeds out undesirable entries before the sort has to deal with them. > > (I'm writing this email as I parse through this figuring it out) > > So it turns out that the crazy use of "each" is only part of the problem -- > changing that line to: > >sort lines of R by random(1) > > still takes over 20 seconds -- 3x as fast, but still far too slow. It turns > out that the source data numbers anywhere from 1,000 to 2,000 lines, so > sorting it in any way to randomly select the several lines I need is a > really bad choice. removing the sort step but keeping everything else cuts > the execution time down to under a second. > > Hmm. How to select several lines at weighted random from among a couple > thousand? I'll think on this and post a follow-up. > ___ > use-livecode mailing list > use-livecode@lists.runrev.com > Please visit this url to subscribe, unsubscribe and manage your > subscription preferences: > http://lists.runrev.com/mailman/listinfo/use-livecode ___ use-livecode mailing list use-livecode@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-livecode