Re: checking which files on a CD are not in a git-annex repo
Joey Hess:
> Thomas Koch wrote:
> > It'd be of course wonderful if I could tell git-annex directly to import
> > all files of the disc. Duplicate files should symlink to the same file
> > in the git-annex backend, shouldn't they?
>
> Yes. If you don't mind the overhead of copying all the files, simply
> copying the whole CD to a subdirectory and running git annex add will do
> the trick. Any duplicate files will coalesce when added.

Certainly not perfect but good enough:

    CDDIR=$1
    # Walk every regular file on the CD; IFS= and -r keep odd file names intact
    find "$CDDIR" -type f -print | while IFS= read -r F
    do
        # echo searching $F
        FILENAME=$(basename "$F")
        # Look for a file of the same name in the repo, skipping ./.git
        FOUND=$(find . -path ./.git -prune -o -name "$FILENAME" -print | head -n 1)
        if [ -r "$FOUND" ]
        then
            echo "found $FOUND"
        else
            echo "not found: $F"
            DIRNAME=$(dirname "$F")
            mkdir -p ./"$DIRNAME"
            cp -v "$F" ./"$DIRNAME"
        fi
    done

Note that this matches by file name only, not by content.

Still, a solution integrated in git-annex would be wonderful!

Thomas Koch, http://www.koch.ro
___
vcs-home mailing list
vcs-home@lists.madduck.net
http://lists.madduck.net/listinfo/vcs-home
[Slightly OT] Re: checking which files on a CD are not in a git-annex repo
> While that is true, a way to directly "diff" a random non-annex
> directory and an annex would be _very_ handy, though.

I've run into a similar (but different) problem too. So, +1 on a diff utility; it would be useful.

My use case is a directory that I don't control that I want to mirror. I don't like the structure & naming of the original; it is big(ish), so I don't want to have multiple copies on my machine; and finally I want to make sure any new files are included in my git-annex in a *proper* dir/name scheme.

So my (probably too complicated) solution (which I'm going to change slightly after writing all of this) is to use rsync with "don't delete" & "ignore existing" (I don't have the rsync man page in front of me for the exact flags) to copy new files into my local directory. I'm not worried about content changes, which makes the process a little easier.

The next step is to copy any non-symlinked files from my local directory to a staging area (i.e. a folder called 'To-Do' ;-) ) in my annex and then git-annex add them. Finally, I go through my local "copy" of the remote directory and symlink every file to /dev/null.

Now... after writing all that down, I think I'm going to change this by following Joey's suggestion and keep the local copy in the annex, probably under $ANNEX/mirrors, so I don't need to worry about maintaining the integrity of that folder too. It also removes the need for the last step of symlinking to /dev/null.

HTH & happy to send a copy of my script if anyone wants it.

Cheers,
Olaf
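The workflow described above could be sketched roughly as follows. This is not Olaf's actual script (which was not posted); the function name `stage_new_files` and its three arguments are made up for illustration. It assumes rsync is installed, that file names contain no newlines, and that rsync treats the /dev/null placeholders as existing files, as the description implies.

```shell
# Hypothetical helper: pull new files from an upstream directory into a local
# mirror, copy them into a staging directory, then replace each mirrored
# file with a placeholder symlink to /dev/null.
stage_new_files() {
    upstream=$1   # directory that is mirrored (not under our control)
    mirror=$2     # local copy, mostly made of placeholder symlinks
    staging=$3    # staging area inside the annex, e.g. "$ANNEX/To-Do"

    # --ignore-existing skips files already present in the mirror;
    # since --delete is not given, nothing is ever removed.
    rsync -a --ignore-existing "$upstream"/ "$mirror"/ || return 1

    # Without -L, "-type f" matches only real files, i.e. the new arrivals;
    # placeholder symlinks are left alone.
    (cd "$mirror" && find . -type f -print) | while IFS= read -r rel
    do
        mkdir -p "$staging/$(dirname "$rel")"
        cp -p "$mirror/$rel" "$staging/$rel" &&
            ln -sf /dev/null "$mirror/$rel"
    done
}
```

After a run, the staged files would still have to be added by hand, for example with `git annex add To-Do` from inside the annex.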
Re: checking which files on a CD are not in a git-annex repo
On Tue, Mar 27, 2012 at 20:02, Joey Hess wrote:
> Yes. If you don't mind the overhead of copying all the files, simply
> copying the whole CD to a subdirectory and running git annex add will do
> the trick. Any duplicate files will coalesce when added.

While that is true, a way to directly "diff" a random non-annex directory and an annex would be _very_ handy, though.

Richard
Re: git-annex diagnostics
Certainly not perfect but good enough:

    CDDIR=$1
    # Walk every regular file on the CD; IFS= and -r keep odd file names intact
    find "$CDDIR" -type f -print | while IFS= read -r F
    do
        # echo searching $F
        FILENAME=$(basename "$F")
        # Look for a file of the same name in the repo, skipping ./.git
        FOUND=$(find . -path ./.git -prune -o -name "$FILENAME" -print | head -n 1)
        if [ -r "$FOUND" ]
        then
            echo "found $FOUND"
        else
            echo "not found: $F"
            DIRNAME=$(dirname "$F")
            mkdir -p ./"$DIRNAME"
            cp -v "$F" ./"$DIRNAME"
        fi
    done

Still, a solution baked into git-annex would be wonderful!

Thomas Koch, http://www.koch.ro
Re: checking which files on a CD are not in a git-annex repo
Thomas Koch wrote:
> It'd be of course wonderful if I could tell git-annex directly to import all
> files of the disc. Duplicate files should symlink to the same file in the
> git-annex backend, shouldn't they?

Yes. If you don't mind the overhead of copying all the files, simply copying the whole CD to a subdirectory and running git annex add will do the trick. Any duplicate files will coalesce when added.

--
see shy jo
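As a minimal sketch of that suggestion: the helper name `import_disc`, the mount point and the target directory below are assumptions for illustration, and the function is meant to be run from inside an already initialised git-annex repository.

```shell
# Hypothetical helper: copy a mounted disc into a subdirectory of the
# current annex and add everything. Files with identical content end up
# as symlinks to the same annexed object.
import_disc() {
    disc=$1      # mount point of the CD, e.g. /media/cdrom (assumed path)
    target=$2    # subdirectory inside the annex working tree

    mkdir -p "$target" &&
        cp -R "$disc"/. "$target"/ &&
        git annex add "$target"
}
```

A call might look like `import_disc /media/cdrom cd-import`, followed by a normal `git commit`.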
checking which files on a CD are not in a git-annex repo
Hi,

how could I check which files on a CD are not yet in a specific git-annex repo? I presume it'd be possible to calculate the checksums of the files on the CD and check those against the annexed files.

Of course, afterwards I want to feed the list to a corresponding cp command. (Which will lead to the next problem if I want to preserve the directory structure... another time in my life to re-learn cpio...?)

It'd be of course wonderful if I could tell git-annex directly to import all files of the disc. Duplicate files should symlink to the same file in the git-annex backend, shouldn't they?

Regards,

Thomas Koch, http://www.koch.ro
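The checksum idea from this message can be sketched without any git-annex specific commands. The helper `list_missing_by_checksum` below is hypothetical; it assumes GNU `sha256sum` is available, that the annexed content is present locally (so the symlinks resolve), and that file names contain no newlines or backslashes.

```shell
# Hypothetical helper: print every file under $1 whose SHA-256 digest does
# not occur among the files under $2.
list_missing_by_checksum() {
    src_dir=$1    # e.g. the mounted CD
    repo_dir=$2   # working tree of the annex
    known=$(mktemp) || return 1

    # -L follows the annex symlinks so the real content is hashed;
    # .git is pruned so the object store is not walked a second time.
    find -L "$repo_dir" -name .git -prune -o -type f -exec sha256sum {} + \
        | cut -c1-64 | sort -u > "$known"

    # sha256sum prints "<digest>  <path>"; split on the first two spaces.
    find "$src_dir" -type f -exec sha256sum {} + \
        | while IFS= read -r line
        do
            sum=${line%% *}
            path=${line#*  }
            grep -qxF "$sum" "$known" || printf '%s\n' "$path"
        done

    rm -f "$known"
}
```

The output is one path per line, so it could be fed to a copy step that preserves the directory structure, for instance `cp --parents` on GNU systems. Unlike the name-based script earlier in the thread, renamed copies of a file are recognised as already present.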