Re: Finding duplicates in a list

Eric Chatonet Wed, 09 Jan 2008 04:19:03 -0800

Hi Ian,

I'm sorry, I did not see that my email went out without being finished:

The second part, about a solution using arrays, is missing, but it does not matter because you got Bill's answer.

Mine was almost the same :-)


Best regards from Paris,
Eric Chatonet.

On 9 Jan 2008, at 12:27, Eric Chatonet wrote:

Hi Ian,

I just tried a simple repeat for each:

function Dups pList
  local tList2,tList3,tTimer,tStart
  -----
  ShowProgress 0,the number of lines of pList --
  put the milliseconds into tStart
  put 0 into tTimer
  repeat for each line tLine in pList
    if tTimer mod 100 = 0 then ShowProgress tTimer --
    add 1 to tTimer
    if tLine is not in tList2 then put tLine & cr after tList2
    else put tLine & cr after tList3
  end repeat
  ShowProgress 0 --
  return the milliseconds - tStart && "ms" & cr & the number of lines of pList & cr & the number of lines of tList3 & cr & tList3
end Dups
-------------------------------
on ShowProgress pPos,pEnd
  set the thumbpos of sb "Progress" to pPos
  if pEnd <> empty then set the endvalue of sb "Progress" to pEnd
end ShowProgress

This ran in about 5 seconds on my Vista machine using your list and returned 686 duplicates among 8708 references.
The problem with such a method is that it slows down as the check progresses, because tList2 keeps growing :-(
I tried to imagine another solution using arrays
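
The array-based part of this message was never sent, so what follows is only a minimal sketch of how such an approach might look, not Eric's or Bill's actual code. The handler name DupsByArray and its variable names are hypothetical. The idea is to use each line as an array key, so that checking whether a line has been seen before no longer means searching a growing text list:

```livecode
-- Sketch only: DupsByArray, tSeen and tDups are hypothetical names.
function DupsByArray pList
  local tSeen, tDups
  repeat for each line tLine in pList
    if tLine is empty then next repeat
    if tSeen[tLine] is empty then
      put true into tSeen[tLine] -- first occurrence: remember it as a key
    else
      put tLine & cr after tDups -- key already present: record a duplicate
    end if
  end repeat
  return tDups
end DupsByArray
```

Called as "put DupsByArray(tList)", this would return one line per repeated occurrence, so a checksum that appears three times is listed twice.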

Best regards from Paris,
Eric Chatonet.

On 9 Jan 2008, at 06:44, Ian Wood wrote:
The problem: I'm trying to find duplicate files in a database (Apple Aperture), and have found a checksum column for all the image files.
I've had a go at writing a handler to find the dupes and it does OK, but wondered if the bright sparks on the list have any advice on speeding it up...
The handler:

====================

  put the milliseconds into tt
  -- this returns the list of checksums: 10k in my sample DB, over 40k in the 'real' DB
  put ijwAPLIB_getAllChecksums() into tList
  put number of lines of tList into tNumLines
  sort tlist
  put 0 into x
  repeat tNumLines times
    add 1 to x
    if last char of x is 1 then set the cursor to busy -- removing this speeds it up by roughly 10%
    put line x of tList into tCheck
    if tCheck is empty then next repeat
    put x + 1 into y
    repeat (tNumLines - x) times
      put line y of tList into tOther
      if tCheck is tOther then
        put x & tab & y & tab & tCheck & return after tRet
      else
        put y - 1 into x -- the outer loop adds 1, so the next check starts at line y instead of skipping it
        exit repeat
      end if
      add 1 to y
    end repeat
  end repeat
  put the milliseconds - tt & return & "number of files:" && tNumLines & return & return & tRet
====================
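
One possible way to speed up the sorted approach, given here only as a hedged sketch and not as code from the thread: "line x of tList" has to count lines from the start of the list on every access, whereas "repeat for each line" walks the list once. Since duplicates sit next to each other after sorting, it may be enough to compare each line with the previous one. The handler name DupsBySort and its variable names are hypothetical, and unlike the handler above this version reports only the duplicated checksums, not their line numbers:

```livecode
-- Sketch only: DupsBySort, tPrev and tDups are hypothetical names.
function DupsBySort pList
  local tPrev, tDups
  sort lines of pList -- equal checksums become adjacent
  put empty into tPrev
  repeat for each line tLine in pList
    if tLine is empty then next repeat
    if tLine is tPrev then
      put tLine & cr after tDups -- same as the previous line: a duplicate
    end if
    put tLine into tPrev
  end repeat
  return tDups
end DupsBySort
```

It could be called with the same input as above, for example "put DupsBySort(ijwAPLIB_getAllChecksums())".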

Sample results:

9804
number of files: 8708

116     117     027351c1bed597af774536af8e982363
119     120     0292d175c04d790f50246a5ee043a599
162     163     03d6313ee21a91ed0b0343f339c583e4
185     186     046ddab379a8f44955f1d5605c294605
230     231     05a77db5e76eb02f8d439e13286d3620
245     246     065474aa9bba7e2f24c7435863f5f2ff
314     315     0884f4b24b5bd99ddefdb100fde58a31
333     334     0918ce2135933d6c8f0ee2860837b5f9
360     361     0a2525bef1a46a329b7e902981ef94e2
360     362     0a2525bef1a46a329b7e902981ef94e2
360     363     0a2525bef1a46a329b7e902981ef94e2
360     364     0a2525bef1a46a329b7e902981ef94e2

Ian


Best regards from Paris,
Eric Chatonet.
----------------------------------------------------------------
Plugins and tutorials for Revolution: http://www.sosmartsoftware.com/
Email: [EMAIL PROTECTED]/
----------------------------------------------------------------

