Oops, you are correct, the uniq command should have been -w 34 list.of.files, not -w list.of.files. The -w option takes a number: how many leading characters to compare, and 34 covers the 32-character MD5 sum plus the two separator spaces. Sorry! (here's what I'd typed at my terminal, and what I should have cut/pasted into the mail:

root@rusty-MS-7851:/backups1/backup_system_v2# uniq -c -d -w 34 sorted.new_filesA.md5|less ; wc -l sorted.new_filesA.md5
42279 sorted.new_filesA.md5
root@rusty-MS-7851:/backups1/backup_system_v2# uniq -c -d -w 34 sorted.new_filesA.md5

sorry again!)
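
One more note while I'm correcting things: uniq -c -d prints only one representative line per group of duplicates. If you want to see EVERY file in each group, and your uniq is the GNU one, the -D (--all-repeated) option should do it. A quick sketch (note that -c and -D don't combine, so you trade the counts for the full lists):

uniq -w 34 -D sorted.new_filesA.md5
uniq -w 34 --all-repeated=separate sorted.new_filesA.md5   # blank line between groups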


Also, if you want to get a list of files and their MD5 sums from higher up in the directory tree, just change the starting directory in your find command to that higher-up location. However, you might need to run the entire find and md5sum sequence as root, if the directories (and files) you care about don't have read permission for you. So, to find ALL files everywhere on your computer, change the ~ to /. You'll certainly get lots of permission-denied errors if you do that as yourself and not root, and starting at / will traverse ALL directories on your computer, including /dev and others you probably don't care about. There are some useful options to find (like -xdev, which tells it not to cross into a different filesystem); see the man page for find to find them ;-)
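
For instance, something like this (a rough sketch; -xdev keeps find on one filesystem, and the 2>/dev/null just hides the permission-denied noise):

# from a root shell (e.g. sudo -i), so both find and md5sum can read everything:
find / -xdev -type f -print0 | xargs -0 md5sum | sort > list.of.files

# or as yourself, simply discarding whatever you aren't allowed to read:
find / -xdev -type f -print0 2>/dev/null | xargs -0 md5sum 2>/dev/null | sort > list.of.files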

On 9/30/24 07:05, Michael via PLUG-discuss wrote:
thank you so much! After running it I find it only finds the duplicates in
~. I need to find the duplicates across all the directories under home.
After looking at the man page and searching for "recu", it seems find
recurses by default, unless I am reading it wrong.
I tried the uniq command but:

  uniq -c -d -w list.of.files
  uniq: list.of.files: invalid number of bytes to compare

Isn't uniq used to find the differences between two files? I have a very
rudimentary understanding of Linux, so I'm sure I'm wrong.

All the files in list.of.files are invisible files (prefaced with a
period).

And isn't there a way to sort things depending on their column (column 1 =
md5sum, column 2 = file name)?
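
(On that last question: yes, sort keys on whitespace-separated columns with -k. A quick sketch, given md5sum's hash-then-filename output:

sort -k1,1 list.of.files    # sort by column 1, the md5sum
sort -k2 list.of.files      # sort from column 2 on, i.e. the file name

The pipeline below already ends up sorted by md5sum, since the hash comes first on each line.)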

On Mon, Sep 30, 2024 at 2:56 AM Rusty Carruth via PLUG-discuss <
[email protected]> wrote:

On 9/28/24 21:06, Michael via PLUG-discuss wrote:
About a year ago I messed up by accidentally copying a folder, with other
folders in it, into another folder. I'm running out of room and need to
find that directory tree and get rid of it. All I know for certain is that
it is somewhere in my home directory. I THINK it is my pictures directory
with ARW files.
chatgpt told me to use fdupes, but it told me to use an exclude option
(which I found out it doesn't have) to avoid config files (and I was
planning on adding to that as I discovered other stuff I didn't want).
Then it told me to use find, but I got an error, which leads me to believe
it doesn't know what it's talking about!
Could someone help me out?

First, someone said you need to run updatedb before running find. No,
sorry, updatedb is for locate, not find. find actively walks the directory
tree; locate searches a (text, I think) database built by updatedb, so its
answers are only as fresh as the last updatedb run.
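
A quick illustration of the difference (made-up file name, just for flavor):

locate myphoto.ARW       # fast: searches the database updatedb built, which may be stale
find ~ -name '*.ARW'     # slower: walks the tree right now, so always current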


Ok, now to answer the question. I've got a similar situation, but in
spades: every time I did a backup, I did an entire copy of everything, so
I've got... oh, 10, 20, 30 copies of many things. I'm working on scripts
to help reduce that, but for now, doing it somewhat manually, I suggest
the following command:


cd <the directory of interest, possibly your home dir>
find . -type f -print0 | xargs -0 md5sum | sort > list.of.files

this will create a list of files, sorted by their md5sum (the -print0 and
-0 pair keeps filenames with spaces or odd characters intact). If you want
to be lazy and not search that file for duplicate md5sums by eye, consider
uniq. Like this:

uniq -c -d -w list.of.files


This will print one line, with a count in front, for each group of
duplicate files. For example, out of a list of 42,279 files in a certain
directory on my computer, here's the result:

      2 73d249df037f6e63022e5cfa8d0c959b  _files/from_ebay_pc/pics_and_such_from_work/phone_backup/try2_nonptp_or_whatever/Pictures/Screenshots/Screenshot_20160321-223138.png
      5 9b162ac35214691461cc0f0104fb91ce  _files/melissa/Documents/EPHESUS/Office Stuff/SPD/SPD SUMMER 2016 (1).pdf
      3 b396af67f2cd75658397efd878a01fb8  _files/dads_zipdisks/2003-1/CLASS at VBC Sp-03/CLASS BKUP - Music Reading & Sight Singing Class/C  & D Major & Minor Scales & Chords.mct
      2 cd83094e0c4aeb9128806b5168444578  _files/from_ebay_pc/pics_and_such_from_work/phone_backup/try2_nonptp_or_whatever/Pictures/Screenshots/Screenshot_20160318-222051.png
      2 d1a5a1bec046cc85a3a3fd53a8d5be86  _files/from_ebay_pc/pics_and_such_from_work/phone_backup/try2_nonptp_or_whatever/Pictures/Screenshots/Screenshot_20160410-145331.png
      2 fa681c54a2bd7cfa590ddb8cf6ca1cea  _files/from_ebay_pc/pics_and_such_from_work/phone_backup/try2_nonptp_or_whatever/Pictures/Screenshots/Screenshot_20160312-113340.png

Originally the _files directory had MANY duplicates; now I've managed to
get that down to the above list...
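
If you get to the point of wanting to act on the list, here's an untested
sketch that prints every line whose MD5 sum has already appeared once,
i.e. everything after the first copy in each group (it only prints
candidates; reviewing and deleting is still up to you):

# print each line whose leading 32-character hash has been seen before
awk 'seen[substr($0, 1, 32)]++' list.of.files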

Anyway, there you go.  Happy scripting.

---------------------------------------------------
PLUG-discuss mailing list: [email protected]
To subscribe, unsubscribe, or to change your mail settings:
https://lists.phxlinux.org/mailman/listinfo/plug-discuss
