Re: Searching a drive and copying files

2006-07-24 Thread Joshua Lewis
Many of the images will have the same name. I somehow managed to copy
the same files several times while doing backups and restores.


At the moment they are going from my Apple laptop to my FreeBSD
server. I am going to start looking for an inexpensive tape drive to
back up my data.


I have been using iPhoto to manage my images.


Sincerely,
Joshua Lewis
[EMAIL PROTECTED]



On Jul 23, 2006, at 3:41 AM, Michael Hughes wrote:


Joshua,
  On the dups, will the names of the files be the same or different?

  Do you have plans for how you will store the images after you
get rid of the dups?  Have the images been edited, and if so, did
you edit them with an EXIF-aware program?

  I use a program called epinfo to rename my images; it is part of the
photopc utility.  I have just a little over 10,000 digital images and
store them by year, month, day, and time.  epinfo uses the EXIF data to
rename the files and set the time stamps for the files.  I have written
some PHP programs to let me display the images through a web browser.
It uses a MySQL database to organize the images into categories.  I
also store a checksum of each picture in the database so I can check
whether the images have become damaged; I have a script I wrote that
compares the checksum in the database against the image.  I do backups
whenever I add new images to the hard drive.  This is still a work in
progress.
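
  A minimal sketch of that kind of integrity check (assuming the
checksums live in a flat file of "digest  path" lines rather than in
MySQL; the name checksums.md5 is only an example):

  #!/usr/bin/perl
  #  Recompute each image's MD5 and compare it to the stored digest.
  #  Assumed input: one "<hex digest>  <path>" pair per line.
  use warnings; use strict;
  use Digest::MD5;

  my $list = shift or die "usage: $0 checksums.md5\n";
  open my $fh, '<', $list or die "Cannot open $list: $!";
  while ( my $line = <$fh> )
  {
    chomp $line;
    my ( $stored, $path ) = split ' ', $line, 2;
    open my $img, '<', $path or do { warn "missing: $path\n"; next; };
    binmode $img;
    print "DAMAGED: $path\n"
      if Digest::MD5->new->addfile( $img )->hexdigest ne $stored;
  }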

  If you can send me a little more data on your files and how you want
to store the images, I could help you with your task.


On Sat, 22 Jul 2006 10:47:13 -0400
Joshua Lewis [EMAIL PROTECTED] wrote:


Hello List,

I have a two part question for anyone who may be able to help.

I need to search my drive for all pictures on my system and copy
them to a networked system using sftp or ssh or whatnot. There will
be duplicate names on the drive, so I was hoping to have dups placed
in a separate folder. Due to my, for lack of a better term, stupidity
when I first got my camera, I will probably have instances when there
will be three or four duplicates. If anyone can help me out with that
it would be great.

Second, is there a resource online I can use to learn how to do my
own shell scripting?

My goal is to find all my pictures and compare them, then delete the
dups that don't look that good. A daunting task, as I have 20 GB of
data. I bet 10 GB are dups.

Thanks for any help.

Sincerely,
Joshua Lewis
[EMAIL PROTECTED]



--
Michael Hughes  Log Home living is the best
[EMAIL PROTECTED]

Temperatures:
Outside: 60.6 House: 70.9 Computer room: 69.5





Re: Searching a drive and copying files

2006-07-23 Thread Parv
in message [EMAIL PROTECTED],
wrote Joshua Lewis thusly...

 I need to search my drive for all pictures on my system and copy
 them to a networked system using sftp or ssh or whatnot. There
 will be duplicate names on the drive, so I was hoping to have dups
 placed in a separate folder.

Unison, the net/unison port, should be able to handle the duplicates
based on file checksums.  (I personally have not used it much, so I
cannot answer any other queries about it; refer to its fine man
page.)


 Due to my, for lack of a better term, stupidity when I first got
 my camera, I will probably have instances when there will be three
 or four duplicates. If anyone can help me out with that it would
 be great.
...
 My goal is to find all my pictures and compare them, then delete
 the dups that don't look that good. A daunting task, as I have 20
 GB of data. I bet 10 GB are dups.

Checksum-based management of duplicates will help with files whose
contents are identical, but not with files that differ by even a bit.
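
For the identical-contents side, a minimal sketch (assuming only
Digest::MD5 from the base distribution; feed it file names, e.g. from
find(1)):

  #!/usr/bin/perl
  #  Group the files named on the command line by MD5 digest and print
  #  every group with more than one member -- those are exact duplicates.
  use warnings; use strict;
  use Digest::MD5;

  my %by_digest;
  for my $file ( @ARGV )
  {
    next unless -f $file;
    open my $fh, '<', $file or do { warn "Cannot open $file: $!\n"; next; };
    binmode $fh;
    push @{ $by_digest{ Digest::MD5->new->addfile( $fh )->hexdigest } }, $file;
  }

  for my $digest ( sort keys %by_digest )
  {
    my @same = @{ $by_digest{ $digest } };
    print join( "\n  ", "$digest:", @same ), "\n" if @same > 1;
  }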

The Perl program below -- a modified version of Randal Schwartz's[0] --
uses md5(1) to identify exact duplicates (identical files) and, failing
that, Image::Magick comparison based on a fuzz factor.  When it finds
duplicates, it asks you to enter the item number, from the file list it
shows, of the image to delete.

  [0] Article "Finding similar images",
      http://www.stonehenge.com/merlyn/LinuxMag/col50.html


To be able to run, it needs Image::Magick (graphics/ImageMagick
port), Cache::FileCache (devel/p5-Cache-Cache), List::Util
(lang/p5-Scalar-List-Utils), File::Copy & File::Path.

Mind that it, or rather Image::Magick, may consume all of your memory
and/or temporary fs if you run it on all the files at once.

If you are good with Perl, you could modify the program to move the
duplicates into a directory (instead of deleting them), and possibly
not to ask before taking the action (since, as you say, you would have
a boatload of duplicates); a sketch of the former follows the program.

Without further interruptions, program follows ...

  #!perl

  #  This is a modified version of Randal Schwartz's ...
  #
  #    http://www.stonehenge.com/merlyn/LinuxMag/col50.html
  #
  #  ... as it uses a checksum (MD5 for now) to detect identical files
  #  and, failing that, uses Image::Magick.

  use warnings; use strict;

  $|++;

  use Image::Magick;
  use Cache::FileCache;
  use File::Copy qw( move );
  use File::Path qw( mkpath );
  use List::Util qw( reduce );

  use Carp qw( carp );

  use Getopt::Long qw( :config gnu_compat no_ignore_case no_debug );

  #  User option; permitted average deviation in the vector elements.
  my $fuzz = 15;

  #  User option; if defined, rename corrupt images into this dir.
  my $corrupt_dir = 'CORRUPT';
  {
    my $usage;
    GetOptions
    (
      'h|usage|help' => \$usage
    , 'f|fuzz=i'     => \$fuzz
    , 'c|corrupt=s'  => \$corrupt_dir
    , 'nc|nocorrupt' => sub { undef $corrupt_dir; }
    )
    or usage( 1 );

    usage( 0 ) if $usage;

    #  Check if any arguments remain; those will be the file names.
    usage( 1, 'No file(s) or directory(ies) given.' ) unless scalar @ARGV;
  }

  sub warnif;

  my $cache = Cache::FileCache->new
    ( {
        namespace  => 'image.cache'
      , cache_root => ( glob( '~/log/misc' ) )[ 0 ]
      }
    );

  my @buckets;

  FILE: while ( @ARGV )
  {
    my $file = shift;
    next FILE if -l $file;
    if ( -d $file )
    {
      opendir DIR, $file or next FILE;
      unshift @ARGV, map { m/^\./ ? () : "$file/$_"; } sort readdir DIR;
      next FILE;
    }

    next FILE unless -f _ or -d _;

    my ( @stat ) = stat _ or die "should not happen: $!";

    #  Cache key: dev/ino/mtime, so a replaced file is rescanned.
    my $key = "@stat[ 0, 1, 9 ]";

    my @vector;

    #print "$file ";
    if ( my $data = $cache->get( $key ) )
    {
      #print "... is cached\n";
      @vector = @$data;
    }
    else
    {
      my $image = Image::Magick->new;
      if ( my $x = $image->Read( $file ) )
      {
        if ( defined $corrupt_dir and $x =~ m/corrupt|unexpected end-of-file/i )
        {
          print "$file ";
          print "... renaming into $corrupt_dir\n";

          -d $corrupt_dir
            or mkpath $corrupt_dir, 0, 0700
            or die "Cannot mkpath $corrupt_dir: $!";

          move $file, $corrupt_dir or warn "Cannot rename: $!";
        }
        else
        {
          print "$file ";
          print "... skipping ( $x )\n";
        }
        next FILE;
      }

      #print "is ", join( 'x', $image->Get( 'width', 'height' ) ), "\n";
      warnif $image->Normalize();
      #  Reduce to a 4x4 thumbnail; its raw RGB bytes form the comparison vector.
      warnif $image->Resize( geometry => '4x4!' );
      warnif $image->Set( magick => 'rgb' );
      @vector = unpack 'C*', $image->ImageToBlob();
      $cache->set( $key, [ @vector ] );
    }
    BUCKET: for my $bucket ( @buckets )
    {
      my $error = 0;
      INDEX: for my $index ( 0 .. $#vector )
      {
        $error += abs( $bucket->[ 0 ][ $index ] - $vector[ $index ] );
        #  Too far from this bucket's reference vector; try the next one.
        next BUCKET if $error > $fuzz * @vector;
      }
      #  Close enough: the file joins this bucket of similar images.
      push @$bucket, $file;

      #print "linked ", join( ', ', @$bucket[ 1 .. $#$bucket ] ), "\n";
      next FILE;
    }
    #  No matching bucket, so start a new one led by this file's vector.
    push @buckets, [ \@vector, $file ];
  }

  #  NB: the archived message is truncated at the "push @buckets" line; the
  #  completion above and the minimal subs below are inferred from the
  #  surrounding code and Randal's column, not from the original message.
  sub usage
  {
    my ( $exit, @msg ) = @_;
    print "@msg\n" if @msg;
    print "usage: $0 [-f fuzz] [-c dir | -nc] file|dir ...\n";
    exit $exit;
  }

  sub warnif { my ( $err ) = @_; carp $err if $err; }
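
A starting point for the move-instead-of-delete modification suggested
above, as a separate minimal sketch (the DUPS directory name is only an
example):

  #!/usr/bin/perl
  #  Move the files named on the command line into ./DUPS instead of
  #  deleting them.
  use warnings; use strict;
  use File::Copy qw( move );
  use File::Path qw( mkpath );

  my $dup_dir = 'DUPS';
  -d $dup_dir or mkpath $dup_dir, 0, 0700;

  for my $file ( @ARGV )
  {
    move( $file, $dup_dir ) or warn "Cannot move $file: $!\n";
  }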

Searching a drive and copying files

2006-07-22 Thread Joshua Lewis

Hello List,

I have a two part question for anyone who may be able to help.

I need to search my drive for all pictures on my system and copy them
to a networked system using sftp or ssh or whatnot. There will be
duplicate names on the drive, so I was hoping to have dups placed in a
separate folder. Due to my, for lack of a better term, stupidity when I
first got my camera, I will probably have instances when there will be
three or four duplicates. If anyone can help me out with that it
would be great.


Second, is there a resource online I can use to learn how to do my own
shell scripting?


My goal is to find all my pictures and compare them, then delete the
dups that don't look that good. A daunting task, as I have 20 GB of
data. I bet 10 GB are dups.


Thanks for any help.

Sincerely,
Joshua Lewis
[EMAIL PROTECTED]





Re: Searching a drive and copying files

2006-07-22 Thread Mike Jeays
On Sat, 2006-07-22 at 10:47 -0400, Joshua Lewis wrote:
 Hello List,
 
 I have a two part question for anyone who may be able to help.
 
 I need to search my drive for all pictures on my system and copy them
 to a networked system using sftp or ssh or whatnot. There will be
 duplicate names on the drive, so I was hoping to have dups placed in a
 separate folder. Due to my, for lack of a better term, stupidity when I
 first got my camera, I will probably have instances when there will be
 three or four duplicates. If anyone can help me out with that it
 would be great.
 
 Second, is there a resource online I can use to learn how to do my own
 shell scripting?
 
 My goal is to find all my pictures and compare them, then delete the
 dups that don't look that good. A daunting task, as I have 20 GB of
 data. I bet 10 GB are dups.
 
 Thanks for any help.
 
 Sincerely,
 Joshua Lewis
 [EMAIL PROTECTED]
 
 
 

I have a Perl script that does part of this, using MD5 hashes to
identify duplicates. I posted it at

http://ca.geocities.com/[EMAIL PROTECTED]/treeprune.pl

Use at your own risk!


