Re: [perl-python] a program to delete duplicate files

2005-03-20 Thread Xah Lee
Sorry i've been busy... Here's the Perl code. I have yet to clean up the code and make it compatible with the cleaned spec above. The code as it is performs the same algorithm as the spec, just doesn't print the output as such. In a few days, i'll post a clean version, and also a Python version, a

Re: [perl-python] a program to delete duplicate files

2005-03-20 Thread Claudio Grondi
>> I'll post my version in a few days. Have I missed something? Where can I see your version? Claudio "Xah Lee" <[EMAIL PROTECTED]> schrieb im Newsbeitrag news:[EMAIL PROTECTED] > here's a large exercise that uses what we built before. > > suppose you have tens of thousands of files in various d

Re: [perl-python] a program to delete duplicate files

2005-03-11 Thread Bengt Richter
On Fri, 11 Mar 2005 14:06:27 -0800, David Eppstein <[EMAIL PROTECTED]> wrote: >In article <[EMAIL PROTECTED]>, > Patrick Useldinger <[EMAIL PROTECTED]> wrote: > >> > Well, but the spec didn't say efficiency was the primary criterion, it >> > said minimizing the number of comparisons was. >> >> T

Re: [perl-python] a program to delete duplicate files

2005-03-11 Thread TZOTZIOY
On Fri, 11 Mar 2005 11:07:02 -0800, rumours say that David Eppstein <[EMAIL PROTECTED]> might have written: >More seriously, the best I can think of that doesn't use a strong slow >hash would be to group files by (file size, cheap hash) then compare >each file in a group with a representative of
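The scheme Eppstein describes, group files by (file size, cheap hash) and then compare each member of a group byte-for-byte, could be sketched like this in Python. This is only an illustration of the idea from the thread; the choice of CRC32 over the first block as the "cheap hash", the block sizes, and the function names are my assumptions, not anything the posters specified.

```python
import os
import zlib
from collections import defaultdict

def cheap_hash(path, block=4096):
    """CRC32 of the first block: a fast, weak hash used only for grouping."""
    with open(path, "rb") as f:
        return zlib.crc32(f.read(block))

def group_candidates(paths):
    """Group files by (size, cheap hash); only files sharing both can be equal."""
    groups = defaultdict(list)
    for p in paths:
        groups[(os.path.getsize(p), cheap_hash(p))].append(p)
    return [g for g in groups.values() if len(g) > 1]

def same_content(a, b, block=65536):
    """Byte-by-byte comparison, done only within a candidate group."""
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ba, bb = fa.read(block), fb.read(block)
            if ba != bb:
                return False
            if not ba:
                return True
```

The weak hash cannot produce false negatives, so the full comparison is only ever run on files that the cheap grouping could not tell apart.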

Re: [perl-python] a program to delete duplicate files

2005-03-11 Thread David Eppstein
In article <[EMAIL PROTECTED]>, Patrick Useldinger <[EMAIL PROTECTED]> wrote: > > Well, but the spec didn't say efficiency was the primary criterion, it > > said minimizing the number of comparisons was. > > That's exactly what my program does. If you're doing any comparisons at all, you're no

Re: [perl-python] a program to delete duplicate files

2005-03-11 Thread Patrick Useldinger
David Eppstein wrote: Well, but the spec didn't say efficiency was the primary criterion, it said minimizing the number of comparisons was. That's exactly what my program does. More seriously, the best I can think of that doesn't use a strong slow hash would be to group files by (file size, cheap

Re: [perl-python] a program to delete duplicate files

2005-03-11 Thread Terry Hancock
On Thursday 10 March 2005 11:02 am, Christos "TZOTZIOY" Georgiou wrote: > On Wed, 9 Mar 2005 16:13:20 -0600, rumours say that Terry Hancock > <[EMAIL PROTECTED]> might have written: > > >For anyone interested in responding to the above, a starting > >place might be this maintenance script I wrote

Re: [perl-python] a program to delete duplicate files

2005-03-11 Thread David Eppstein
In article <[EMAIL PROTECTED]>, Patrick Useldinger <[EMAIL PROTECTED]> wrote: > > You need do no comparisons between files. Just use a sufficiently > > strong hash algorithm (SHA-256 maybe?) and compare the hashes. > > That's not very efficient. IMO, it only makes sense in network-based > ope

Re: [perl-python] a program to delete duplicate files

2005-03-11 Thread Patrick Useldinger
Christos TZOTZIOY Georgiou wrote: A minor nit-pick: `fdups.py -r .` does nothing (at least on Linux). Changed. -- http://mail.python.org/mailman/listinfo/python-list

Re: [perl-python] a program to delete duplicate files

2005-03-11 Thread Patrick Useldinger
David Eppstein wrote: You need do no comparisons between files. Just use a sufficiently strong hash algorithm (SHA-256 maybe?) and compare the hashes. That's not very efficient. IMO, it only makes sense in network-based operations such as rsync. -pu

Re: [perl-python] a program to delete duplicate files

2005-03-11 Thread Patrick Useldinger
Christos TZOTZIOY Georgiou wrote: The relevant parts from this last page: st_dev <-> dwVolumeSerialNumber st_ino <-> (nFileIndexHigh, nFileIndexLow) I see. But if I am not mistaken, that would mean that I (1) had to detect NTFS volumes (2) use non-standard libraries to find this information (like

Re: [perl-python] a program to delete duplicate files

2005-03-11 Thread TZOTZIOY
On Fri, 11 Mar 2005 01:12:14 +0100, rumours say that Patrick Useldinger <[EMAIL PROTECTED]> might have written: >> On POSIX filesystems, one has also to avoid comparing files having same >> (st_dev, >> st_inum), because you know that they are the same file. > >I then have a bug here - I consider

Re: [perl-python] a program to delete duplicate files

2005-03-11 Thread TZOTZIOY
On Fri, 11 Mar 2005 01:24:59 +0100, rumours say that Patrick Useldinger <[EMAIL PROTECTED]> might have written: >> Have you found any way to test if two files on NTFS are hard linked without >> opening them first to get a file handle? > >No. And even then, I wouldn't know how to find out. MSDN is

Re: [perl-python] a program to delete duplicate files

2005-03-10 Thread John Bokma
David Eppstein wrote: > In article <[EMAIL PROTECTED]>, > "Xah Lee" <[EMAIL PROTECTED]> wrote: > >> a absolute requirement in this problem is to minimize the number of >> comparison made between files. This is a part of the spec. > > You need do no comparisons between files. Just use a suffici

Re: [perl-python] a program to delete duplicate files

2005-03-10 Thread David Eppstein
In article <[EMAIL PROTECTED]>, "Xah Lee" <[EMAIL PROTECTED]> wrote: > a absolute requirement in this problem is to minimize the number of > comparison made between files. This is a part of the spec. You need do no comparisons between files. Just use a sufficiently strong hash algorithm (SHA-2
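The hash-only approach Eppstein proposes, hash every file with a strong algorithm and treat equal digests as equal files, might look like this in Python. A sketch only: the chunked-read strategy and the function names are mine, and whether a digest match is acceptable proof of equality is exactly what the rest of the thread argues about.

```python
import hashlib
from collections import defaultdict

def sha256_of(path, block=65536):
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(block), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(paths):
    """Bucket files by digest; no pairwise comparisons are ever made."""
    by_digest = defaultdict(list)
    for p in paths:
        by_digest[sha256_of(p)].append(p)
    return [g for g in by_digest.values() if len(g) > 1]
```

Note that this hashes every file in full, even files whose size is unique, which is the inefficiency Useldinger objects to downthread.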

Re: [perl-python] a program to delete duplicate files

2005-03-10 Thread Patrick Useldinger
Christos TZOTZIOY Georgiou wrote: That's fast and good. Nice to hear. A minor nit-pick: `fdups.py -r .` does nothing (at least on Linux). I'll look into that. Have you found any way to test if two files on NTFS are hard linked without opening them first to get a file handle? No. And even then, I wo

Re: [perl-python] a program to delete duplicate files

2005-03-10 Thread Patrick Useldinger
Christos TZOTZIOY Georgiou wrote: On POSIX filesystems, one has also to avoid comparing files having same (st_dev, st_inum), because you know that they are the same file. I then have a bug here - I consider all files with the same inode equal, but according to what you say I need to consider the
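The fix under discussion, treating paths with the same (st_dev, st_ino) as one and the same file and keeping only one of them for comparison, could be sketched like this on a POSIX filesystem (the helper name is hypothetical):

```python
import os

def unique_inodes(paths):
    """Keep one path per (st_dev, st_ino): hard links to the same inode
    are the same file and must not be compared against each other."""
    seen = {}
    for p in paths:
        st = os.stat(p)
        seen.setdefault((st.st_dev, st.st_ino), p)
    return list(seen.values())
```

Comparing st_ino alone is the bug Georgiou points out: two files on different devices can share an inode number, so the device id has to be part of the key.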

Re: [perl-python] a program to delete duplicate files

2005-03-10 Thread TZOTZIOY
On Thu, 10 Mar 2005 10:54:05 +0100, rumours say that Patrick Useldinger <[EMAIL PROTECTED]> might have written: >I wrote something similar, have a look at >http://www.homepages.lu/pu/fdups.html. That's fast and good. A minor nit-pick: `fdups.py -r .` does nothing (at least on Linux). Have you

Re: [perl-python] a program to delete duplicate files

2005-03-10 Thread P
I've written a python GUI wrapper around some shell scripts: http://www.pixelbeat.org/fslint/ the shell script logic is essentially: exclude hard linked files only include files where there are more than 1 with the same size print files with matching md5sum Pádraig.
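The three-step pipeline Pádraig describes could be transcribed into Python roughly as follows. This is a sketch of the logic as he states it, not of fslint itself; the function name is mine, and real fslint differs in detail.

```python
import hashlib
import os
from collections import defaultdict

def fslint_style_dups(paths):
    """Sketch of the described pipeline: drop extra hard links, keep only
    sizes that occur more than once, then bucket those files by MD5."""
    # 1. exclude hard-linked duplicates: one path per (st_dev, st_ino)
    seen, files = set(), []
    for p in paths:
        st = os.stat(p)
        key = (st.st_dev, st.st_ino)
        if key not in seen:
            seen.add(key)
            files.append((p, st.st_size))
    # 2. only sizes occurring more than once can hide duplicates
    by_size = defaultdict(list)
    for p, size in files:
        by_size[size].append(p)
    # 3. within each size group, report files with matching MD5
    dups = []
    for group in by_size.values():
        if len(group) < 2:
            continue
        by_md5 = defaultdict(list)
        for p in group:
            with open(p, "rb") as f:
                by_md5[hashlib.md5(f.read()).hexdigest()].append(p)
        dups.extend(g for g in by_md5.values() if len(g) > 1)
    return dups
```

The size filter is what keeps this cheap: a file with a unique size is never opened, let alone hashed.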

Re: [perl-python] a program to delete duplicate files

2005-03-10 Thread TZOTZIOY
On Wed, 9 Mar 2005 16:13:20 -0600, rumours say that Terry Hancock <[EMAIL PROTECTED]> might have written: >For anyone interested in responding to the above, a starting >place might be this maintenance script I wrote for my own use. I don't >think it exactly matches the spec, but it addresses the

Re: [perl-python] a program to delete duplicate files

2005-03-10 Thread Patrick Useldinger
I wrote something similar, have a look at http://www.homepages.lu/pu/fdups.html.

Re: [perl-python] a program to delete duplicate files

2005-03-09 Thread Terry Hancock
On Wednesday 09 March 2005 06:56 am, Xah Lee wrote: > here's a large exercise that uses what we built before. > > suppose you have tens of thousands of files in various directories. > Some of these files are identical, but you don't know which ones are > identical with which. Write a program that
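The first step any solution to the exercise needs, gathering the candidate files from the given directory trees, might look like this in Python. A sketch under my own assumptions: the spec says nothing about symlinks, so skipping them here is my choice, made to avoid visiting the same file twice.

```python
import os

def all_files(roots):
    """Recursively collect regular files under the given directories,
    skipping symlinks so the same file is not visited twice."""
    paths = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                p = os.path.join(dirpath, name)
                if os.path.isfile(p) and not os.path.islink(p):
                    paths.append(p)
    return paths
```

The list this returns is what the grouping and hashing strategies debated in the rest of the thread would then operate on.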

Re: [perl-python] a program to delete duplicate files

2005-03-09 Thread TZOTZIOY
On 9 Mar 2005 04:56:13 -0800, rumours say that "Xah Lee" <[EMAIL PROTECTED]> might have written: >Write a Perl or Python version of the program. > >a absolute requirement in this problem is to minimize the number of >comparison made between files. This is a part of the spec. http://groups-beta.g