Re: checking which files on a CD are not in a git-annex repo

2012-03-27 Thread Thomas Koch
Joey Hess:
> Thomas Koch wrote:
> > It'd be of course wonderful if I could tell git-annex directly to import
> > all files of the disc. Duplicate files should symlink to the same file
> > in the git-annex backend, shouldn't they?
> 
> Yes. If you don't mind the overhead of copying all the files, simply
> copying the whole CD to a subdirectory and running git annex add will do
> the trick. Any duplicate files will coalesce when added.

Certainly not perfect but good enough:

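CDDIR=$1

# Look for each file on the CD by name anywhere in the current repository.
find "$CDDIR" -type f -print | while IFS= read -r F
do
  # echo "searching $F"
  FILENAME=$(basename "$F")
  # find prints paths as ./..., so the pattern has to be ./.git to prune it
  FOUND=$(find . -path ./.git -prune -o -name "$FILENAME" -print | head -n 1)
  if [ -r "$FOUND" ]
  then
    echo "found $FOUND"
  else
    echo "not found: $F"
    # Recreate the file's directory below the current directory and copy it
    DIRNAME=$(dirname "$F")
    mkdir -p ./"$DIRNAME"
    cp -v "$F" ./"$DIRNAME"
  fi
done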

Still, a solution integrated in git-annex would be wonderful!

Thomas Koch, http://www.koch.ro
___
vcs-home mailing list
vcs-home@lists.madduck.net
http://lists.madduck.net/listinfo/vcs-home


[Slightly OT] Re: checking which files on a CD are not in a git-annex repo

2012-03-27 Thread Olaf TNSB
> While that is true, a way to directly "diff" a random non-annex
> directory and an annex would be _very_ handy, though.

I've run into a similar (but different) problem too.  So, +1 on a diff
utility - it would be useful.

My use case is a directory that I don't control that I want to mirror. I
don't like the structure & naming of the original; it is big(ish) so I
don't want to have multiple copies on my machine; and finally I want to
make sure any new files are included in my git-annex in a *proper*
dir/name scheme.

So my (probably too complicated) solution (which I'm going to change
slightly after writing all of this) is to use rsync with "don't delete" &
"ignore existing" (I don't have rsync man page in front of me for the exact
flags) to copy new files into my local directory. I'm not worried about
content change which makes the process a little easier.
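
For reference, a minimal sketch of that rsync step. rsync only removes
files on the receiving side when --delete is given, so "don't delete" is
simply the default, and --ignore-existing skips anything that already
exists on the receiver. The function name mirror_new and the paths in the
example are placeholders, not my real setup.

```shell
#!/bin/sh
# mirror_new SRC DEST: copy files from SRC that do not exist yet in DEST.
# Both arguments are placeholders for the remote and the local directory.
mirror_new() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # -a                 recurse and preserve times, permissions and symlinks
    # --ignore-existing  skip anything that already exists in DEST
    # No --delete option is given, so nothing is ever removed from DEST.
    rsync -a --ignore-existing "$src"/ "$dest"/
}

# Example: mirror_new /path/to/remote-dir "$HOME/mirror"
```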

The next step is to copy any non-symlinked files from my local directory to
a staging area (i.e. a folder called 'To-Do' ;-) ) in my annex and then
git-annex add them.
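
Roughly, the staging step looks like the sketch below. It relies on find's
-type f not matching symlinks, so anything already symlinked away is
skipped. stage_new and the example paths are made-up names, and file names
containing newlines are not handled.

```shell
#!/bin/sh
# stage_new LOCAL STAGING: copy every regular file (symlinks are skipped)
# from LOCAL into STAGING, keeping the relative directory layout.
stage_new() {
    local_dir=$1
    staging=$2
    # List paths relative to LOCAL so they can be recreated under STAGING.
    ( cd "$local_dir" && find . -type f -print ) | while IFS= read -r f
    do
        mkdir -p "$staging/$(dirname "$f")" &&
        cp -p "$local_dir/$f" "$staging/$f"
    done
}

# Example (placeholder paths), run from the top of the annex:
#   stage_new "$HOME/mirror" ./To-Do && git annex add To-Do
```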

Finally, I go through my local "copy" of the remote directory, and symlink
any files to /dev/null.

Now...after writing all that down, I think I'm going to change this by
following Joey's suggestion and have the local copy in the annex, prob
under $ANNEX/mirrors so I don't need to worry about maintaining integrity
of that folder too...it also reduces the need to do the last symlink to
/dev/null.

HTH & happy to send a copy of my script if anyone wants it.

Cheers,

Olaf


Re: checking which files on a CD are not in a git-annex repo

2012-03-27 Thread Richard Hartmann
On Tue, Mar 27, 2012 at 20:02, Joey Hess wrote:

> Yes. If you don't mind the overhead of copying all the files, simply
> copying the whole CD to a subdirectory and running git annex add will do
> the trick. Any duplicate files will coalesce when added.

While that is true, a way to directly "diff" a random non-annex
directory and an annex would be _very_ handy, though.


Richard


Re: git-annex diagnostics

2012-03-27 Thread Thomas Koch
Certainly not perfect but good enough:

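CDDIR=$1

# Look for each file on the CD by name anywhere in the current repository.
find "$CDDIR" -type f -print | while IFS= read -r F
do
  # echo "searching $F"
  FILENAME=$(basename "$F")
  # find prints paths as ./..., so the pattern has to be ./.git to prune it
  FOUND=$(find . -path ./.git -prune -o -name "$FILENAME" -print | head -n 1)
  if [ -r "$FOUND" ]
  then
    echo "found $FOUND"
  else
    echo "not found: $F"
    # Recreate the file's directory below the current directory and copy it
    DIRNAME=$(dirname "$F")
    mkdir -p ./"$DIRNAME"
    cp -v "$F" ./"$DIRNAME"
  fi
done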

Still, a solution baked into git-annex would be wonderful!

Thomas Koch, http://www.koch.ro


Re: checking which files on a CD are not in a git-annex repo

2012-03-27 Thread Joey Hess
Thomas Koch wrote:
> It'd be of course wonderful if I could tell git-annex directly to import all
> files of the disc. Duplicate files should symlink to the same file in the
> git-annex backend, shouldn't they?

Yes. If you don't mind the overhead of copying all the files, simply
copying the whole CD to a subdirectory and running git annex add will do
the trick. Any duplicate files will coalesce when added.
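
As a rough sketch, assuming the CD is mounted at /media/cdrom and the
subdirectory is called cd-import (both names, and the function import_cd,
are only examples), run from inside an initialized git-annex repository:

```shell
#!/bin/sh
# import_cd CD_DIR SUBDIR: copy CD_DIR into SUBDIR and annex the copy.
# Must be run from inside an initialized git-annex repository.
import_cd() {
    cd_dir=$1
    subdir=$2
    mkdir -p "$subdir" || return 1
    # "$cd_dir"/. copies the contents of the CD, not the directory itself.
    cp -R "$cd_dir"/. "$subdir"/ || return 1
    git annex add "$subdir"
}

# Example: import_cd /media/cdrom cd-import
#          git commit -m "Import CD"
```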

-- 
see shy jo



checking which files on a CD are not in a git-annex repo

2012-03-27 Thread Thomas Koch
Hi,

How could I check which files on a CD are not yet in a specific git-annex repo?

I presume it'd be possible to calculate the checksum of the files on the CD and 
check those against the annex'ed files.
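
Something along these lines is what I have in mind. It is only a sketch: it
hashes every file in the repository again instead of reusing the checksums
that git-annex records in its keys, it assumes GNU sha256sum is installed,
and it assumes paths without newlines or backslashes. The name not_in_repo
is made up.

```shell
#!/bin/sh
# not_in_repo CD_DIR REPO_DIR: print the files under CD_DIR whose SHA-256
# checksum matches no file under REPO_DIR (the .git directory is skipped).
not_in_repo() {
    cd_dir=$1
    repo_dir=$2
    sums=$(mktemp) || return 1
    # -L follows the annex symlinks so the annexed content gets hashed;
    # symlinks whose content is not present locally are not regular files
    # and are therefore ignored.
    find -L "$repo_dir" -name .git -prune -o -type f -exec sha256sum {} + \
        2>/dev/null | cut -d ' ' -f 1 | sort -u > "$sums"
    # Hash the CD and print every path whose checksum is not in the list.
    find "$cd_dir" -type f -exec sha256sum {} + |
    while read -r sum path
    do
        grep -q -x -F "$sum" "$sums" || printf '%s\n' "$path"
    done
    rm -f "$sums"
}

# Example: not_in_repo /media/cdrom . > missing.txt
```

Unlike a search by file name, this also recognises files that were renamed
in the repository.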

Of course, afterwards I want to feed the list into a corresponding cp command.
(Which will lead to the next problem if I want to preserve the directory 
structure... another time in my life to re-learn cpio...?)
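
A portable sketch of that copy step with plain mkdir and cp, assuming the
list holds one path per line relative to the CD root (copy_list and the
example paths are made-up names). GNU cp's --parents option or cpio's
pass-through mode (-p) would be alternatives.

```shell
#!/bin/sh
# copy_list SRC_ROOT DEST_ROOT: read paths relative to SRC_ROOT from
# standard input, one per line, and copy each file to the same relative
# location under DEST_ROOT.
copy_list() {
    src_root=$1
    dest_root=$2
    while IFS= read -r rel
    do
        # Create the target directory first, then copy the file into it.
        mkdir -p "$dest_root/$(dirname "$rel")" &&
        cp -p "$src_root/$rel" "$dest_root/$rel" || return 1
    done
}

# Example: printf '%s\n' photos/a.jpg docs/b.txt | copy_list /media/cdrom .
```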

It'd be of course wonderful if I could tell git-annex directly to import all
files of the disc. Duplicate files should symlink to the same file in the
git-annex backend, shouldn't they?

Regards,

Thomas Koch, http://www.koch.ro