On Wed, Mar 07, 2007 at 09:22:08PM -0800, Sriram Ramkrishna wrote:

Hi there,
For some reason, I sent this mail before I was fully subscribed, and I
have missed out on the replies. If I don't answer all the responses,
this is why.

> The following command pipeline can give you a list which you could
> isolate to being just the first occurrence of each file that is
> sharing the same inode:
>
>   find . ! -type d -printf '%10i %P\n' |
>     awk '{n=substr($0,12);if(a[$1]==1){print "other",n;}else{a[$1]=1;print "first",n;}}'

Yes, I think I have something similar that someone else has used to do
the same thing. Thank you, this is most useful.

> One approach in the situation you have, if the filesystem is not
> corrupt (which it might be, because files don't create cycles), is to
> create a list of files based on their inode number, and hardlink each
> file to one named by its inode number. Just rsync the directory full
> of inode numbers. Then re-expand on the destination based on that
> list.

I think I probably have hard links to directories. I have observed cpio
going through a loop continuously. Since I was doing this on an AIX JFS
filesystem (on an AIX fileserver), it might not have the same
protections that I believe Linux has when hitting a circular loop.

> You should not be following symlinks in a file tree recursion. Rsync,
> find, cpio, and others, know not to.
>
> But I suspect some kind of filesystem corruption, or at least some
> hard links being applied to directories. The latter can create cycles
> if not done carefully (and there is virtually no case to ever do that
> at all by intent).

I think this is exactly what's happening. I think I have a number of
cycles that are causing the data to go loopy (pardon the pun). If
that's the case, how does one find self-referential hard/soft links?

> I do not consider it bad organization to have lots of files be
> hardlinked. In fact, I have a program that actually seeks out
> identical files and makes them be hardlinked to save space (not safe
> in all cases, but safe in most).

Sure, but on a large filesystem it's been very painful to copy this
data when rsync is taking days instead of hours.

> The command "find . -type l" will only find symlinks. You can find
> files that have hard links with "find . ! -type d -links +1 -print".
> Note that all file types can have hard links, even symlinks. Do
> exclude directories, as those will have many links for other reasons
> (e.g. 1 for self-reference, 1 for being inside a directory, and 1 for
> each subdirectory within).

Can I also use find to create a list of files that are not hard-linked,
and then use --include-from and --exclude='*'? I had thought that might
be an alternative way. If I use this rule, does rsync still stat
through the filesystem?

sri
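
P.S. Just to be sure I understand the inode-list idea quoted above,
here is roughly how I picture it. This is completely untested, the
paths and the "dest" host are made up, GNU findutils syntax is assumed,
and the .byinode shadow directory has to sit on the same filesystem as
the tree or ln will refuse:

    cd /export/tree                      # hypothetical source tree

    # Map every non-directory path to its inode number.
    find . ! -type d -printf '%i %P\n' > /var/tmp/inode-list

    # Hard-link one copy of each inode into a shadow directory.
    mkdir .byinode
    awk '!seen[$1]++' /var/tmp/inode-list | while read -r ino path; do
        ln "$path" ".byinode/$ino"
    done

    # Ship one copy per inode, plus the map.
    rsync -a .byinode/ /var/tmp/inode-list dest:/export/staging/

    # Then on the destination: every listed path becomes a hard link
    # to its inode-named copy, and the staging names go away.
    cd /export/staging
    while read -r ino path; do
        mkdir -p "$(dirname "$path")"
        ln "$ino" "$path"
    done < inode-list
    cut -d' ' -f1 inode-list | sort -u | xargs rm
    rm inode-list

Symlinks, and paths containing newlines, would need more care than
this.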
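
P.P.S. Partly answering my own question above: a hard link to a
directory means the same directory inode appears under two different
names, so I suppose something like this (again untested, GNU findutils
assumed) would flag candidates:

    find . -type d -printf '%i %p\n' | awk '
        { path = substr($0, index($0, " ") + 1) }
        $1 in first { print "inode " $1 " seen twice: " first[$1] " and " path; next }
        { first[$1] = path }'

GNU find also prints "File system loop detected" and stops descending
when it walks into a cycle, which might be the quickest smoke test; I
have no idea whether AIX's find is as careful.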
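
P.P.P.S. For my last question, this is the sort of thing I had in
mind, in case it clarifies what I'm asking (untested, names made up):

    # Everything with exactly one link, i.e. not hard-linked anywhere.
    find . ! -type d -links 1 -printf '%P\n' > /var/tmp/single-link

    # Copy only those files; implied directories are created as needed.
    rsync -a --files-from=/var/tmp/single-link . dest:/export/tree/

As I understand it, rsync would still lstat() every path named in the
list, but it would no longer walk the rest of the tree.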