Re: any program that search for same files?

2018-10-16 Thread David Christensen

On 10/14/18 3:06 PM, Long Wind wrote:
> given two directories, the program can print files that are in both
> directories
>
> to make it easy, if file name and size are same, then they are same
>
> i've to admit my memory is poor, if good, who need such program?
>
> i'm about to write it in java, it can be completed in a few hours but
> i think there might be simple solution

On 10/14/18 3:19 PM, The Wanderer wrote:
> Sounds like a perfect case for rdfind.

On 10/14/18 3:39 PM, Håkon Alstadheim wrote:
> fdupes or jdupes fit the bill.

On 10/14/18 6:11 PM, Erik Christiansen wrote:
> What I use with full satisfaction is just "diff -qr dir1 dir2"...

On 10/15/18 8:14 AM, Andrew McGlashan wrote:
> diff -s

On 10/16/18 12:53 AM, Long Wind wrote:
> it seems all those packages compare contents of files but my
> definition of same file is same name and size and should run much
> fast
>
> and they seem to search all directories for same files and this is
> different from my requirement
>
> my ideal program should take two directories as argument and print
> files that are in both directories
>
> i don't care same files within one directory
>
> my ideal program can also take one directory as argument and print
> same file in it
>
> if more than two directories are given, then first directory is
> searched, all files that are also in other directories are printed
>
> it seems i have to write it in java
>
> is there users' interest for such program?



It sounds like you want a script to meet your specific needs.


It is my belief that writing a general-purpose file system metadata
search utility is a non-trivial task.  Take a look at find(1) to get an
idea of the scope of the most obvious related utility:

https://www.gnu.org/software/findutils/


Here is a Perl 5 script I wrote years ago to compare the modification 
times of same-named files in two directory trees:


https://sourceforge.net/projects/dirdiff/

It could be modified to meet your requirements.


I suggest that you start at the small end -- writing scripts to meet 
specific needs:


1.  Pick a scripting language.

2.  Define the requirements of the script.

3.  Design, construct, test, debug, and document the script.

4.  Integrate the script into your system administration processes.


Bonuses include:

1.  Use a version control system.

2.  Use an issue tracking system.


While you could use Java, note that a Java runtime environment must be
installed for Java programs to run.  sh, bash, perl, and/or python are
available out of the box on many OS distributions, which makes it
easier to run your scripts in fresh installs, rescue systems, live
systems, and other restricted environments.
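
A minimal Python sketch of the kind of small script described above,
assuming "same file" means same base name and same size in bytes, and
assuming directories are searched recursively (the requirements do not
say either way).  The names index_tree and common_files are made up
for illustration:

```python
import os


def index_tree(root):
    """Map (file name, size in bytes) -> list of paths found under root."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file: skip it
            index.setdefault((name, size), []).append(path)
    return index


def common_files(first, others):
    """Return sorted (path_in_first, path_elsewhere) pairs.

    Two files are paired when their base name and size are equal.
    Only files under `first` are reported; duplicates that exist
    solely inside one of the other directories are ignored.
    """
    first_index = index_tree(first)
    matches = []
    for other in others:
        for key, paths in index_tree(other).items():
            for mine in first_index.get(key, []):
                for theirs in paths:
                    matches.append((mine, theirs))
    return sorted(matches)
```

For example, common_files("/mnt/money/doc", ["/mnt/play"]) would return
one pair per match between those two trees.  File contents are never
read, so this is fast, but it will also report same-named, same-sized
files whose bytes differ.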



David



Re: any program that search for same files?

2018-10-15 Thread Andrew McGlashan
Hi,

On 15/10/18 09:06, Long Wind wrote:
> given two directories, the program can print files that are in both
> directories
> 
> to make it easy, if file name and size are same, then they are same
> 
> i've to admit my memory is poor, if good, who need such program?
> 
> i'm about to write it in java, it can be completed in a few hours
> but i think there might be simple solution

diff -s

Cheers
A



Re: (solved) Re: any program that search for same files?

2018-10-15 Thread Andy Smith
Hello,

On Sun, Oct 14, 2018 at 10:37:10PM +, Long Wind wrote:
>  Thank The Wanderer!
> i've just installed rdfind, and i'll try it.

Also rmlint:

https://rmlint.readthedocs.io/en/latest/index.html

> > i'm about to write it in java, it can be completed in a few hours but
> > i think there might be simple solution

Depending on exactly what you want to achieve and what "duplicate"
really means to you, it can be more complex than you at first think:

https://rmlint.readthedocs.io/en/latest/cautions.html

Also, if you are using btrfs or XFS with a recent kernel, you may want
to consider duplicate finders that can make use of the FIDEDUPERANGE
ioctl:

http://man7.org/linux/man-pages/man2/ioctl_fideduperange.2.html

I know rmlint and duperemove can do this:

https://github.com/markfasheh/duperemove

Maybe others.

Cheers,
Andy

-- 
https://bitfolk.com/ -- No-nonsense VPS hosting



Re: any program that search for same files?

2018-10-14 Thread Erik Christiansen
On 14.10.18 22:06, Long Wind wrote:
> given two directories, the program can print files that are in both 
> directories
> 
> to make it easy, if file name and size are same, then they are same
> 
> i've to admit my memory is poor, if good, who need such program?
> 
> i'm about to write it in java, it can be completed in a few hours
> but i think there might be simple solution
>  

What I use with full satisfaction is just "diff -qr dir1 dir2", though
that does actually diff the files. So it suits well for double checking
a backup to flash stick. N.B. My paranoia began after one stick showed
corrupted bytes _without_ any change in file size. I.e. size is no guide.
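
To illustrate why size alone can mislead, here is a rough Python sketch
that reports files present at the same relative path in both trees
whose bytes differ, even when their sizes match.  It is not a
replacement for "diff -qr" (it ignores files that exist on one side
only), and the function name differing_files is made up for this
example:

```python
import filecmp
import os


def differing_files(dir_a, dir_b):
    """Return sorted relative paths present in both trees whose bytes differ."""
    differing = []
    for dirpath, _dirnames, filenames in os.walk(dir_a):
        for name in filenames:
            path_a = os.path.join(dirpath, name)
            rel = os.path.relpath(path_a, dir_a)
            path_b = os.path.join(dir_b, rel)
            if not os.path.isfile(path_b):
                continue  # only on one side: out of scope for this sketch
            # shallow=False forces a byte-by-byte comparison instead of
            # trusting the os.stat() signature (type, size, mtime).
            if not filecmp.cmp(path_a, path_b, shallow=False):
                differing.append(rel)
    return sorted(differing)
```

A file whose size is unchanged but which has a few flipped bytes passes
a size check and is still listed by differing_files.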

My favourite was once the dircmp script, from AT&T, back in 1988. Google
shows others have sought it since, but I've not seen it on Linux.

Erik



Re: any program that search for same files?

2018-10-14 Thread The Wanderer
On 2018-10-14 at 18:56, Long Wind wrote:

> according to rdfind manual:
> 
> EXAMPLES   Search for duplicate files in home directory and a backup 
> directory:
>   rdfind ~ /mnt/backup
> 
> i run "rdfind /mnt/money/doc /mnt/play"
> to my alarm, it delete files:

Where do you get the idea that it was deleting files from?

> Now scanning "/mnt/money/doc", found 925 files.
> Now scanning "/mnt/play", found 30527 files.
> Now have 31452 files in total.
> Removed 0 files due to nonunique device and inode.
> Now removing files with zero size from list...removed 1 files

This means that it found a file with zero size, so it dropped that file
from the list. It doesn't mean that it deleted the file from the disk.

> Total size is 30245034596 bytes or 28 Gib
> Now sorting on size:removed 18727 files due to unique sizes from list.12724 
> files left.

This means that it dropped files from the list because there's no other
file in the list with the exact same file size. It doesn't mean that it
deleted any files from the disk.

> Now eliminating candidates based on first bytes:^C

This means that it's dropping files from the list because their first
few bytes are different. It doesn't mean that it's deleting files from
the disk.


I have used rdfind myself. It does not delete files unless you pass an
option telling it to.
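
To make the "dropped from the list" idea concrete, here is a small
Python sketch of the same kind of staged filtering.  It is not rdfind's
code; it only mirrors the stages visible in the log above (zero size,
unique size, first bytes), and FIRST_BYTES and the function names are
arbitrary choices for illustration.  Every step shrinks an in-memory
list; nothing on disk is touched:

```python
import os
from collections import defaultdict

FIRST_BYTES = 64  # arbitrary prefix length, chosen for illustration


def group_by(paths, key):
    """Group paths by key(path); keep only groups with more than one member."""
    groups = defaultdict(list)
    for path in paths:
        groups[key(path)].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def read_prefix(path):
    """Return the first FIRST_BYTES bytes of the file."""
    with open(path, "rb") as handle:
        return handle.read(FIRST_BYTES)


def candidate_duplicates(roots):
    """Return groups of paths sharing size and first bytes (read-only)."""
    paths = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path):
                    paths.append(path)
    # Stage 1: drop zero-size files from the list.
    paths = [p for p in paths if os.path.getsize(p) > 0]
    # Stage 2: drop files whose size is unique in the list.
    size_groups = group_by(paths, os.path.getsize)
    # Stage 3: within each size group, drop files with unique first bytes.
    result = []
    for group in size_groups:
        result.extend(group_by(group, read_prefix))
    return sorted(result)
```

The groups that remain are only candidates; a real duplicate finder
would go on to apply stronger checks before calling them duplicates.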

-- 
   The Wanderer

The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore all
progress depends on the unreasonable man. -- George Bernard Shaw





Re: any program that search for same files?

2018-10-14 Thread Håkon Alstadheim



On 15 Oct 2018 00:19, The Wanderer wrote:
> On 2018-10-14 at 18:06, Long Wind wrote:
>
>> given two directories, the program can print files that are in both
>> directories
>>
>> to make it easy, if file name and size are same, then they are same
>>
>> i've to admit my memory is poor, if good, who need such program?
>>
>> i'm about to write it in java, it can be completed in a few hours but
>> i think there might be simple solution
> Sounds like a perfect case for rdfind.
>
fdupes or jdupes fit the bill.



Re: any program that search for same files?

2018-10-14 Thread The Wanderer
On 2018-10-14 at 18:06, Long Wind wrote:

> given two directories, the program can print files that are in both
> directories
> 
> to make it easy, if file name and size are same, then they are same
> 
> i've to admit my memory is poor, if good, who need such program?
> 
> i'm about to write it in java, it can be completed in a few hours but
> i think there might be simple solution

Sounds like a perfect case for rdfind.

-- 
   The Wanderer

The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore all
progress depends on the unreasonable man. -- George Bernard Shaw


