"Jim Ault"  <JimAultWins at yahoo.com>  wrote:

One catch I can see is the "set the wholeMatches to true" step,
along with the false hits generated by your definition of a unique
line (lower case, substring, number format):
 "Mary had a little lamb" = line 6 of field 2
"Mary had a little lamb,  whose fleece was white" = line 8 of field 1
line 6 of fld 2 is in line 8 of fld 1 => lineoffset would be > 0

"234" & "2345" == offset match, lineOffset not
"234" & "2,345" == offset not, lineOffset not
"234" & "2345.00" == offset match, lineOffset not
"234" & "2345, 554, 234, 196" == lineOffset match twice
"snow"  & "snow shovel" & "snowbound" & "snow-bound"

Jim Ault
Las Vegas

Good catch :-)
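
To make the wholeMatches point concrete, here is a minimal sketch. The handler name and the sample lines are made up for illustration; only lineOffset and the wholeMatches property are built in.

```livecode
on demoWholeMatches
  -- hypothetical sample data, not taken from the original fields
  put "Mary had a little lamb, whose fleece was white" & cr & "2345" & cr & "234" into tList

  -- default behaviour: any line that contains the string counts as a hit
  set the wholeMatches to false
  put lineOffset("234", tList) into tLoose    -- 2, because "2345" contains "234"

  -- strict behaviour: the line must equal the string
  set the wholeMatches to true
  put lineOffset("234", tList) into tStrict   -- 3, the line that is exactly "234"

  put tLoose && tStrict
end demoWholeMatches
```

wholeMatches is a local property, so it goes back to false when the handler finishes and has to be set in every handler that relies on it.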

"Alex Tweedly"  <alex at tweedly.net >  wrote:

-snip-

>     put fld "Field" & cr & "ZZZZZZZZZZ" into t1
>     put fld "Field" & cr & "test line" & cr  & "ZZZZZZZZZZ" into t2
>
>     put the millisecs into tStart
>     put 1 into i2
>     put the number of lines in t2 into limit2
>
>     sort t1

>     sort t2
>     split t2 by CR
>     put t2[1] into L2
>
>     repeat for each line L1 in t1
>         repeat while L2 < L1
>             add 1 to i2
>             put t2[i2] into L2
>         end repeat
>         if L2 = L1 then
>             -- put L1 & cr after tBoth
>             add 1 to i2
>             put t2[i2] into L2
>         else
>             -- put L1 & cr after t1only
>         end if
>     end repeat
>     if i2 < limit2 then
>         repeat with i = i2 to limit2-1
>             put t2[i] & cr after t2only
>         end repeat
>     end if
>     put "loop" && the millisecs - tStart & cr after msg


P.S. I tried hard to break every one of Jerry's recommendations about
variable naming as described in his excellent tutorial from the
"Conference" session; if you haven't already downloaded and read that
stack, you should. It *might* just stop you from writing such ugly code
as I did above - but my old Fortran habits just keep coming back :-)


--
Alex Tweedly       http://www.tweedly.net

The handler above does not give correct results on numeric lists, word lists, or mixed lists.

Here is a function that combines and adapts techniques mentioned previously in this thread.

### adapt the handler name and the filter modes to your own taste

function intersectSpecial pList1,pList2,pMode
  ### tally each line: add 1 if it is in pList1, add 2 if it is in pList2
  ### (assumes no line is repeated within a single list)
  repeat for each line i in pList1
    add 1 to a[i]
  end repeat
  repeat for each line i in pList2
    add 2 to a[i]
  end repeat
  combine a with cr and tab
  ### elements only in pList1 --> 1
  ### elements only in pList2 --> 2
  ### elements in both lists  --> 3
  if pMode = "bothCommon" then put "*"&tab&"3" into tFilter
  else if pMode = "uniqueA" then put "*"&tab&"1" into tFilter
  else if pMode = "uniqueB" then put "*"&tab&"2" into tFilter
  else if pMode = "bothUnique" then put "*"&tab&"1,*"&tab&"2" into tFilter
  repeat for each item tFilterString in tFilter
    put a into b
    filter b with tFilterString
    ### strip the tab and tally so only the original line remains
    replace char 2 to -1 of tFilterString with "" in b
    put b & cr after tList
  end repeat
  return tList
end intersectSpecial

on mouseUp
  put the millisecs into zap
  put intersectSpecial(fld 1,fld 2,"bothUnique") into fld 3
  put the millisecs - zap
end mouseUp

Maybe not a real speed monster, but not bad either
(takes < 500 millisecs for 2 fields with > 25000 lines on an iMac G5 1.8 GHz)
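
For a quick check without fields, something like the following should work. It is only a sketch: the handler name and the sample lists are made up, and it assumes intersectSpecial is in the same script.

```livecode
on demoIntersectSpecial
  -- hypothetical sample lists, with no duplicates inside either list
  put "apple" & cr & "pear" & cr & "plum" into tListA
  put "pear" & cr & "plum" & cr & "fig" into tListB

  -- lines present in both lists: "pear" and "plum"
  put intersectSpecial(tListA, tListB, "bothCommon") into tCommon

  -- lines present in only one of the lists: "apple" and "fig"
  put intersectSpecial(tListA, tListB, "bothUnique") into tDifferent

  put "common:" & cr & tCommon & "different:" & cr & tDifferent
end demoIntersectSpecial
```

The lines come back in whatever order the array keys are combined, so sort the result if the order matters. Each group of results also ends with a trailing cr.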

Greetings,
Wouter