On 06/10/2011 22:19, Pete wrote:
Thanks for the report back on the speed Alex.  I guess its academic if the
speed is down to 100msecs but I'm wondering if a binary search technique
would be better or worse (assuming the lists were sorted of course).

How did you create the two lists for your test?  I'd like to try the binary
search but stuck with an easy way to generate two large files like that!

Pete
Molly's Revenge<http://www.mollysrevenge.com>



Pretty basic, I have to admit.

Fill a variable with a large number of characters (all lower case letters).
Insert a CR every 100-300 characters within it.
Then (later) copy  out however many lines you want to use ....

(and to make different lines to test the 'difference' part of it, I inserted uppercase 'A's at random places :-)

(see code below)..

btw - binary search won't help (or at least, not very much). To use binary search, you need the lines in order. If you have the lines in order, then you don't need binary search, you can use a simple step-by-step comparison (being very careful about lines which are duplicaed in each file !!).

And in any case, to get the lines in order you need to sort them, so that will be O(n log n) at best. That makes it no better than, and perhaps not as good as, the array technique, since hash-table lookup is likely to be somewhere between constant and 'log n' in time, so the whole operation is somewhere between linear and O(n log n) anyway.


As an aside, in case you do ever want to use binary search (or the step-by-step comparison) remember you cannot do it efficiently using "line X of tData" because that requires scanning to count lines. You need to (rather painfully) use char offsets (since they can be accessed in constant time).

-- Alex.
global gData
on mouseUp
   local ka, tStartTime, t

   put empty into gData
   put the millisecs into tStartTime
   put chartonum("a")-1 into ka
   constant K=5000000, K1=10000
   repeat K times
      put numtochar(random(26) + ka) after gData
   end repeat
   put 1 into t
   repeat until t > K
      add (100+random(200)) to t
      put CR into char t of gData
   end repeat
put the number of lines in gData && the millisecs-tStartTime & CR into field "F"
   put gData after field "F"
end mouseUp


_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Reply via email to