[ You guys probably all know this already, and I had just missed it up till now ....]

Any time speed of looping through a data structure is mentioned, the first thing anyone says is

Use

   repeat for each line tLine of tData
       doSomeThingWith  tLine

because it is much faster than

  repeat with tIndex = 1 to the number of lines in tData
       doSomeThingWith  line tIndex of tData


I just realized tonight that you can do

   put the number of lines in tData into tNumLines
   split tData by CR
   repeat with tIndex = 1 to tNumLines
       doSomeThingWith  tData[tIndex]

(and later combine it again if needed)

While this is nowhere near as fast as the "repeat for", it is better (by one or two orders of magnitude) than the "line tIndex of tData" version - fast enough that algorithms which I had rejected because of the slowness of "line tIndex of tData" can be worth considering again.

In particular - one obvious approach to the "common vs non-common lines" problem would be to sort each list, and then step through them "together", using an index into each one. This isn't a good approach in Rev - because of the high cost of using "line tIndex of tData" - but using the split and tData[tIndex] makes it feasible. In fact, I found that my first idea of using the two arrays and 'union' is no faster than doing the following (note "ZZZZZ is assumed to be after all possible lines, and is added to ease the loop termination case)

    put fld "Field" & cr & "ZZZZZZZZZZ" into t1
    put fld "Field" & cr & "test line" & cr  & "ZZZZZZZZZZ" into t2
put the millisecs into tStart
    put 1 into i2
    put the number of lines in t2 into limit2
sort t1

    sort t2
    split t2 by CR
    put t2[1] into L2
repeat for each line L1 in t1
        repeat while L2 < L1
            add 1 to i2
            put t2[i2] into L2
        end repeat
        if L2 = L1 then
            -- put L1 & cr after tBoth
            add 1 to i2
            put t2[i2] into L2
        else
            -- put L1 & cr after t1only
        end if
    end repeat
    if i2 < limit2 then
        repeat with i = i2 to limit2-1
            put t2[i] & cr after t2only
        end repeat
    end if
put "loop" && the millisecs - tStart & cr after msg


P.S. I tried hard to break every one of Jerry's recommendation about variable naming as described in his excellent tutorial from the "Conference" session; if you haven't already downloaded and read that stack, you should. It *might* just stop you from writing such ugly code as I did above - but my old Fortran habits just keep coming back :-)


--
Alex Tweedly       http://www.tweedly.net



--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.362 / Virus Database: 267.12.8/161 - Release Date: 03/11/2005

_______________________________________________
use-revolution mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Reply via email to