Oops, you are correct: the uniq command should be "uniq -c -d -w 34
list.of.files", not "-w list.of.files" — the -w option takes a number of
characters to compare. Sorry! (Here's what I'd typed versus what I should
have cut/pasted:
root@rusty-MS-7851:/backups1/backup_system_v2# uniq -c -d -w 34
sorted.new_filesA.md5|less ; wc -l sorted.new_filesA.md5
42279 sorted.new_filesA.md5
root@rusty-MS-7851:/backups1/backup_system_v2# uniq -c -d -w 34
sorted.new_filesA.md5
sorry again!)
Also, if you want a list of files and their MD5 sums from higher up in
the directory tree, just change the starting directory in your find
command to that higher-up location. However, you might need to run the
entire find-and-md5sum sequence as root if the directories (and files)
you care about aren't readable by your user. So, to find ALL files
everywhere on your computer, change the ~ to /. You'll certainly get
lots of "permission denied" errors if you do that as yourself rather
than as root. But starting at / will traverse ALL directories on your
computer, including /dev and others you probably don't care about.
There are some useful options to find (like -xdev, which keeps it from
crossing into a different filesystem) you might want to use; see the
man page for find to find them ;-)
On 9/30/24 07:05, Michael via PLUG-discuss wrote:
Thank you so much! After running it, I find it only finds the duplicates in
~. I need to find the duplicates across all the directories under home.
After looking at the man page and searching for "recu", it seems it recurses
by default, unless I am reading it wrong.
I tried the uniq command but:
uniq -c -d -w list.of.files
uniq: list.of.files: invalid number of bytes to compare
Isn't uniq used to find the differences between two files? I have a very
rudimentary understanding of Linux, so I'm sure I'm wrong.
All the files in list.of.files are invisible files (prefaced with a
period).
And isn't there a way to sort things depending on their column (column 1:
md5sum, column 2: file name)?
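(For what it's worth, sort can indeed key on a column. A quick sketch against a list in the same "md5sum  path" shape as the one built earlier in the thread; the sample contents and output file names here are mine, for illustration:)

```shell
# Tiny stand-in for the real list (fake sums, real two-column shape):
printf 'bbbbbbbb  zebra.txt\naaaaaaaa  apple.txt\n' > list.of.files

# sort keys on whitespace-separated fields:
sort -k1,1 list.of.files > by.checksum   # column 1: the checksum
sort -k2   list.of.files > by.filename   # column 2 onward: the path
```

With real md5sum output the two orderings often agree on the duplicate groups either way, since identical files have identical sums in column 1.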
On Mon, Sep 30, 2024 at 2:56 AM Rusty Carruth via PLUG-discuss <
[email protected]> wrote:
On 9/28/24 21:06, Michael via PLUG-discuss wrote:
About a year ago I messed up by accidentally copying a folder, with other
folders inside it, into another folder. I'm running out of room and need
to find that directory tree and get rid of it. All I know for certain is
that it is somewhere in my home directory. I THINK it is my pictures
directory with ARW files.
ChatGPT told me to use fdupes, but it told me to use an exclude option
(which I found out it doesn't have) to avoid config files (and I was
planning on adding to that as I discovered other stuff I didn't want).
Then it told me to use find, but I got an error, which leads me to
believe it doesn't know what it's talking about!
Could someone help me out?
First, someone said you need to run updatedb before running find. No,
sorry: updatedb is for locate, not find. find actively walks the
directory tree; locate searches the (text, I think) database built by
updatedb.
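The difference can be shown in a couple of lines. The find_live wrapper below is just my illustrative name for "walk the tree right now"; locate's behavior is sketched in comments since it needs a database that may not exist on a given machine:

```shell
# find walks the directory tree live: slower, but always current.
# (Wrapper name find_live is mine, for illustration only.)
find_live() { find "$1" -name "$2"; }

# locate, by contrast, only consults the database from the last updatedb
# run — fast, but stale until updatedb runs again:
#   updatedb                      # (as root) rebuild the database
#   locate Screenshot_20160321    # answers from the database, not the disk
```

A file created a minute ago shows up in find_live immediately, but locate won't see it until the next updatedb.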
Ok, now to answer the question. I've got a similar situation, but in
spades. Every time I did a backup, I did an entire copy of everything,
so I've got ... oh, 10, 20, 30 copies of many things. I'm working on
scripts to help reduce that, but for now doing it somewhat manually, I
suggest the following command:
cd (the directory of interest, possibly your home dir) ; find . -type f
-print0 | xargs -0 md5sum | sort > list.of.files
This will create a list of files sorted by their md5sum. If you want to
be lazy and not scan that file for duplicate md5sums by eye, consider
uniq. Like this:
uniq -c -d -w list.of.files
This will print the list of files which are duplicates. For example,
out of a list of 42,279 files in a certain directory on my computer,
here's the result:
2 73d249df037f6e63022e5cfa8d0c959b
_files/from_ebay_pc/pics_and_such_from_work/phone_backup/try2_nonptp_or_whatever/Pictures/Screenshots/Screenshot_20160321-223138.png
5 9b162ac35214691461cc0f0104fb91ce
_files/melissa/Documents/EPHESUS/Office Stuff/SPD/SPD SUMMER 2016 (1).pdf
3 b396af67f2cd75658397efd878a01fb8
_files/dads_zipdisks/2003-1/CLASS at VBC Sp-03/CLASS BKUP - Music
Reading & Sight Singing Class/C & D Major & Minor Scales & Chords.mct
2 cd83094e0c4aeb9128806b5168444578
_files/from_ebay_pc/pics_and_such_from_work/phone_backup/try2_nonptp_or_whatever/Pictures/Screenshots/Screenshot_20160318-222051.png
2 d1a5a1bec046cc85a3a3fd53a8d5be86
_files/from_ebay_pc/pics_and_such_from_work/phone_backup/try2_nonptp_or_whatever/Pictures/Screenshots/Screenshot_20160410-145331.png
2 fa681c54a2bd7cfa590ddb8cf6ca1cea
_files/from_ebay_pc/pics_and_such_from_work/phone_backup/try2_nonptp_or_whatever/Pictures/Screenshots/Screenshot_20160312-113340.png
Originally the _files directory had MANY duplicates; by now I've managed
to get it down to the above list...
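(A side note: -c -d prints one count line per duplicate group. If you want to see every member of each group — handy when deciding which copies to delete — GNU uniq's -D flag does that. The sample lines below are fake stand-ins I made up in the same 32-hex-chars-plus-two-spaces shape as real md5sum output:)

```shell
# Stand-in for sorted.new_filesA.md5 (fake sums, real md5sum layout):
printf '%s  f1\n%s  f2\n%s  f3\n' \
    00000000000000000000000000000000 \
    00000000000000000000000000000000 \
    11111111111111111111111111111111 > sorted.new_filesA.md5

# -w 34 compares the md5 (32 hex chars) plus the two separator spaces;
# -D prints EVERY line belonging to a duplicate group, not a count line.
uniq -D -w 34 sorted.new_filesA.md5
```

On the sample above this prints the f1 and f2 lines (same sum) and skips f3.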
Anyway, there you go. Happy scripting.
---------------------------------------------------
PLUG-discuss mailing list: [email protected]
To subscribe, unsubscribe, or to change your mail settings:
https://lists.phxlinux.org/mailman/listinfo/plug-discuss