>> I'll post my version in a few days.

Have I missed something?
Where can I see your version?
Claudio

"Xah Lee" <[EMAIL PROTECTED]> wrote in news message
news:[EMAIL PROTECTED]
> Here's a large exercise that uses what we built before.
>
> Suppose you have tens of thousands of files in various directories.
> Some of these files are identical, but you don't know which ones are
> identical with which. Write a program that prints out which files are
> redundant copies.
>
> Here's the spec.
> --------------------------
> The program is to be used on the command line. Its arguments are one or
> more full paths of directories.
>
> perl del_dup.pl dir1
>
> prints the full paths of all files in dir1 that are duplicates
> (including files in sub-directories). More specifically, if file A has
> duplicates, A's full path will be printed on a line, immediately
> followed by the full paths of all other files that are copies of A.
> These duplicates' full paths will be prefixed with the string "rm ".
> An empty line follows each group of duplicates.
>
> Here's a sample output.
>
> inPath/a.jpg
> rm inPath/b.jpg
> rm inPath/3/a.jpg
> rm inPath/hh/eu.jpg
>
> inPath/ou.jpg
> rm inPath/23/a.jpg
> rm inPath/hh33/eu.jpg
>
> Order does not matter (i.e. which file does not get the "rm " prefix
> does not matter).
>
> ------------------------
>
> perl del_dup.pl dir1 dir2
>
> will do the same as above, except that duplicates within dir1 or dir2
> themselves are not considered. That is, all files in dir1 are compared
> to all files in dir2 (including subdirectories), and only files in dir2
> will have the "rm " prefix.
>
> One way to understand this is to imagine lots of image files in both
> directories. One is certain that there are no duplicates within each
> directory itself (imagine that del_dup.pl has been run on each already).
> Files in dir1 have already been categorized into sub-directories by a
> human, so when there are duplicates between dir1 and dir2, one wants
> the version in dir2 to be deleted, leaving the organization in dir1
> intact.
>
> perl del_dup.pl dir1 dir2 dir3 ...
>
> does the same as above, except that files in later directories get the
> "rm " prefix first. So, if there are these identical files:
>
> dir2/a
> dir2/b
> dir4/c
> dir4/d
>
> then c and d will both have the "rm " prefix for sure (which one has
> "rm " in dir2 does not matter). Note: although files within dir2 are
> not compared with each other, duplicates may still be found implicitly
> by indirect comparison, i.e. a==c, b==c, therefore a==b, even though
> a and b are never compared.
>
>
> --------------------------
>
> Write a Perl or Python version of the program.
>
> An absolute requirement in this problem is to minimize the number of
> comparisons made between files. This is a part of the spec.
>
> Feel free to write it however you want. I'll post my version in a few
> days.
>
> http://www.xahlee.org/perl-python/python.html
>
> Xah
> [EMAIL PROTECTED]
> http://xahlee.org/PageTwo_dir/more.html
>
-- 
http://mail.python.org/mailman/listinfo/python-list
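
For what it's worth, here is one way the quoted spec might be approached in Python. It is only a rough sketch, not the version Xah promised. To keep the number of file comparisons low it never compares two files directly: files are first bucketed by size, and only files that share a size are read and bucketed by SHA-256 digest. That substitutes hashing for byte-by-byte comparison and assumes digest collisions can be ignored. The names walk_files, file_digest, find_duplicates and format_report are made up for this sketch. It also assumes the directories given do not overlap, and it reads "duplicates within a directory are not considered" as: when more than one directory is given, a group whose files all sit under the same directory argument is not reported.

```python
import hashlib
import os
import sys
from collections import defaultdict


def walk_files(root):
    """Yield the path of every regular file under root, recursively."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so the same content is not counted twice.
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(dirs):
    """Return groups of identical files as lists of paths.

    The first path of each group is the file to keep; it comes from the
    earliest directory on the command line (assumed reading of the spec).
    """
    # Step 1: bucket by size. A file with a unique size is never read.
    by_size = defaultdict(list)
    for rank, root in enumerate(dirs):
        for path in walk_files(root):
            by_size[os.path.getsize(path)].append((rank, path))

    groups = []
    for entries in by_size.values():
        if len(entries) < 2:
            continue
        # Step 2: bucket same-sized files by digest (one read per file).
        by_digest = defaultdict(list)
        for rank, path in entries:
            by_digest[file_digest(path)].append((rank, path))
        for same in by_digest.values():
            if len(same) < 2:
                continue
            ranks = {rank for rank, _path in same}
            # Assumption: with several directories, a group confined to
            # a single directory argument is not reported.
            if len(dirs) > 1 and len(ranks) < 2:
                continue
            same.sort()  # earliest directory first
            groups.append([path for _rank, path in same])
    return groups


def format_report(groups):
    """Format groups as in the spec: keeper, "rm " lines, blank line."""
    lines = []
    for group in groups:
        lines.append(group[0])
        lines.extend("rm " + path for path in group[1:])
        lines.append("")
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    sys.stdout.write(format_report(find_duplicates(sys.argv[1:])))
```

Saved under a hypothetical name such as del_dup.py, it would be run as "python del_dup.py dir1 dir2 ...". Each file that shares its size with another file is read exactly once, and the a==c, b==c, therefore a==b case from the spec falls out of the digest bucketing without a and b ever being compared.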