In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] (John J. Lee) wrote:

> > If you read them in parallel, it's _at most_ m (m is the worst case
> > here), not 2(m-1). In my tests, it has always been significantly less
> > than m.
>
> Hmm, Patrick's right, David, isn't he?

Yes, I was only considering pairwise comparisons. As he says,
simultaneously comparing all files in a group would avoid repeated reads
without the CPU overhead of a strong hash. Assuming you use a system
that allows you to have enough files open at once...

> And I'm not sure what the trade off between disk seeks and disk reads
> does to the problem, in practice (with caching and realistic memory
> constraints).

Another interesting point.

-- 
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
-- 
http://mail.python.org/mailman/listinfo/python-list
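
For the record, here is a minimal sketch of the simultaneous comparison
described above.  It assumes the candidate files have already been grouped
by size and that the whole group stays under the OS open-file limit; the
names (identical_groups, BLOCK_SIZE) are placeholders, not anyone's actual
code.

from collections import defaultdict

BLOCK_SIZE = 64 * 1024   # illustrative block size; tune for the disks involved

def identical_groups(paths):
    """Split a group of same-sized candidate files into subgroups whose
    contents are byte-identical.  Each file is read at most once, block
    by block, with no hashing."""
    handles = {p: open(p, 'rb') for p in paths}
    try:
        pending = [list(paths)]   # groups still being compared
        identical = []            # groups whose files reached EOF together
        while pending:
            group = pending.pop()
            if len(group) < 2:    # a lone file has no possible duplicate
                continue
            # Read the next block of every file in the group and
            # partition the group by block contents.
            by_block = defaultdict(list)
            for p in group:
                by_block[handles[p].read(BLOCK_SIZE)].append(p)
            for block, subgroup in by_block.items():
                if len(subgroup) < 2:
                    continue                  # unique content: drop it
                if block == b'':              # all hit EOF together: duplicates
                    identical.append(subgroup)
                else:
                    pending.append(subgroup)  # keep comparing this subgroup
        return identical
    finally:
        for f in handles.values():
            f.close()

A file stops being read as soon as its subgroup shrinks to a single member,
so no file is ever read more than once; that is where the "at most m"
figure quoted above comes from, and there is no strong-hash CPU cost.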