Hi Ian,
I'm sorry, I did not notice that my email went out unfinished:
the second part, about a solution using arrays, is missing, but it does
not matter because you already have Bill's answer.
Mine was almost the same :-)
Best regards from Paris,
Eric Chatonet.
On 9 Jan 08, at 12:27, Eric Chatonet wrote:
Hi Ian,
I just tried a simple repeat for each:
function Dups pList
  local tList2,tList3,tTimer,tStart
  -----
  ShowProgress 0,the number of lines of pList -- reset the bar and set its range
  put the milliseconds into tStart
  put 0 into tTimer
  repeat for each line tLine in pList
    if tTimer mod 100 = 0 then ShowProgress tTimer -- update every 100 lines
    add 1 to tTimer
    -- "is not in" is a substring test; it is safe here only because
    -- all checksums have the same length
    if tLine is not in tList2 then put tLine & cr after tList2
    else put tLine & cr after tList3
  end repeat
  ShowProgress 0 -- reset the bar
  -- elapsed time, number of references, number of duplicates, duplicates
  return the milliseconds - tStart && "ms" & cr & the number of lines of pList \
      & cr & the number of lines of tList3 & cr & tList3
end Dups
-------------------------------
on ShowProgress pPos,pEnd
  set the thumbpos of sb "Progress" to pPos
  if pEnd <> empty then set the endvalue of sb "Progress" to pEnd
end ShowProgress
This ran in about 5 seconds on my Vista machine using your list and
returned 686 duplicates among 8708 references.
The problem with such a method is that it slows down as the check
progresses, because tList2 keeps growing and every test has to search
more text :-(
I tried to imagine another solution using arrays
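A minimal sketch of what an array-based version could look like. The
original message was cut off at this point, so this is an assumption and
not necessarily the solution Eric or Bill had in mind; the handler name
DupsByArray and its variable names are hypothetical. Each checksum is used
as an array key, so testing whether it has been seen before does not depend
on how many lines have already been processed:

```livecode
function DupsByArray pList
  local tSeen,tDups
  repeat for each line tLine in pList
    if tLine is empty then next repeat -- skip blank lines
    if tSeen[tLine] is empty then
      put true into tSeen[tLine] -- first occurrence: remember it
    else
      put tLine & cr after tDups -- seen before: record it as a duplicate
    end if
  end repeat
  return tDups
end DupsByArray
```

Like Dups above, it would be called with the whole checksum list, for
example: put DupsByArray(tList) into tDuplicates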
Best regards from Paris,
Eric Chatonet.
On 9 Jan 08, at 06:44, Ian Wood wrote:
The problem: I'm trying to find duplicate files in a database (Apple
Aperture), and have found a checksum column for all the image files.
I've had a go at writing a handler to find the dupes and it does
OK, but wondered if the bright sparks on the list have any advice
on speeding it up...
The handler:
====================
put the milliseconds into tt
-- this returns the list of checksums, 10k in my sample DB,
-- over 40k in the 'real' DB
put ijwAPLIB_getAllChecksums() into tList
put number of lines of tList into tNumLines
sort tList
put 0 into x
repeat tNumLines times
  add 1 to x
  -- removing this speeds it up by roughly 10%
  if last char of x is 1 then set the cursor to busy
  put line x of tList into tCheck
  if tCheck is empty then next repeat
  put x + 1 into y
  repeat (tNumLines - x) times
    put line y of tList into tOther
    if tCheck is tOther then
      put x & tab & y & tab & tCheck & return after tRet
    else
      -- resume at line y: the outer loop adds 1 to x,
      -- so "put y into x" would skip line y entirely
      put y - 1 into x
      exit repeat
    end if
    add 1 to y
  end repeat
end repeat
put the milliseconds - tt & return & "number of files:" && \
    tNumLines & return & return & tRet
====================
Sample results:
9804
number of files: 8708
116 117 027351c1bed597af774536af8e982363
119 120 0292d175c04d790f50246a5ee043a599
162 163 03d6313ee21a91ed0b0343f339c583e4
185 186 046ddab379a8f44955f1d5605c294605
230 231 05a77db5e76eb02f8d439e13286d3620
245 246 065474aa9bba7e2f24c7435863f5f2ff
314 315 0884f4b24b5bd99ddefdb100fde58a31
333 334 0918ce2135933d6c8f0ee2860837b5f9
360 361 0a2525bef1a46a329b7e902981ef94e2
360 362 0a2525bef1a46a329b7e902981ef94e2
360 363 0a2525bef1a46a329b7e902981ef94e2
360 364 0a2525bef1a46a329b7e902981ef94e2
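One possible single-pass variant, offered only as a sketch: because the
list is sorted, identical checksums sit on adjacent lines, so each line
only needs to be compared with the start of the current run. Using
"repeat for each line" also avoids "line x of tList", which has to count
lines from the top of the list on every access. The handler name
AdjacentDups and its variable names are hypothetical, not from the thread;
the output keeps the same three tab-separated columns as above.

```livecode
function AdjacentDups pList
  local tPrev,tFirst,tLineNum,tRet
  sort lines of pList
  put 0 into tLineNum
  repeat for each line tLine in pList
    add 1 to tLineNum
    if tLine is not empty and tLine is tPrev then
      -- same checksum as the first line of the current run: report the pair
      put tFirst & tab & tLineNum & tab & tLine & return after tRet
    else
      -- a new run starts on this line
      put tLine into tPrev
      put tLineNum into tFirst
    end if
  end repeat
  return tRet
end AdjacentDups
```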
Ian
----------------------------------------------------------------
Plugins and tutorials for Revolution: http://www.sosmartsoftware.com/
----------------------------------------------------------------