Re: Skipping hardlinks in a copy

2007-03-09 Thread Phil Howard
On Thu, Mar 08, 2007 at 01:34:03PM -0800, Sriram Ramkrishna wrote:

| OK, I wasn't aware that you couldn't hardlink a directory to another
| directory.

You're not supposed to be able to.  But some systems allow it in a
restricted manner just for very special issues such as recovery of
an otherwise corrupt filesystem.  Some others might allow it due to
a bug.

Hard linking directories can produce loops, and this is a major
reason for disallowing them in the normal case.  Another is that an
empty directory with 2 or more links will not appear empty in the
normal manner.  All these things could be worked around, but having
to do things in such ways is detrimental to the system.

Sequent Dynix around 1990 or so allowed doing rename(2) simultaneously
on different processors, each naming a different target name, without
proper locking, and it would result in hard links.  I encountered this
with regular files then tested it with directories and it allowed that,
too.  Then the directories could not be removed, even though empty.  I
used the same bug to reverse the process that created them by renaming
the two directories back to one name to get out of the mess.


| OK.  Looks like I just have to deal with each cycle I encounter and
| break it.  Joy. :-)

What does "find" for that system do?  If they allow hard links, then
their tools should know how to work around them.

I hope your application doesn't depend on these hard linked directories.


| I'm at a loss then at what I'm looking at.  Maybe it's following
| symlinks and I have not checked the arguments properly.  It might
| be that symbolic links are causing the issue, but in that case it
| doesn't seem to explain why a level 0 copy is taking days.
| I leaped on the link tree issue, because strace on an rsync was
| showing it going through the same progression of directories.
| I'm going to have to go back and run it again and see if I can
| catch it.

Normally, symlinks to directories are not followed.  I've never seen
rsync or find do that.  I've written a tree recursion function in a
library and it doesn't follow symlinks.
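That default is easy to verify in a throwaway directory (the names here are illustrative, not from any message in this thread):

```shell
# Demonstrate that find does not follow symlinks to directories by
# default; -L changes that.  Run in a disposable temp directory.
cd "$(mktemp -d)"
mkdir real
touch real/file
ln -s real alias           # symlink pointing at the directory

find . -name file          # prints only ./real/file
find -L . -name file       # follows the symlink: also matches ./alias/file
```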
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Skipping hardlinks in a copy

2007-03-08 Thread Sriram Ramkrishna
On Thu, Mar 08, 2007 at 10:14:39AM -0800, Wayne Davison wrote:
> On Wed, Mar 07, 2007 at 09:22:08PM -0800, Sriram Ramkrishna wrote:
> > Is there a way to have it skip hard links when doing an rsync?
> 
> If you mean you want to skip any file that has more than one link, you
> could do this:
> 
> find . -type f -links +1 >/path/exclude.txt
> 
> Then, you'd use the exclude.txt file via the --exclude-from option.
> 
> However, you mentioned loops, and that makes me think that the problem
> is not with file loops, but dirs looping back in the hierarchy.  The
> above won't help you if that is the case (since find will loop too).

Indeed.  This is from watching rsync through strace since I wasn't sure
why it was taking so long to do an rsync.  It seems I need to collect
more data to see what's going on.

sri


Re: Skipping hardlinks in a copy

2007-03-08 Thread Sriram Ramkrishna
On Thu, Mar 08, 2007 at 10:15:01PM +0100, Paul Slootman wrote:
> On Thu 08 Mar 2007, Sri Ramkrishna wrote:
> > 
> > I think I probably have hard links to directories.  I have observed cpio
> > going through a loop continuously.  Since I was doing this on an AIX
> > JFS filesystem (on an AIX fileserver) it might not have the same
> > protections that I believe Linux has when hitting a circular loop.
> 
> If there are hard links to directories (apart from the . and .. links)
> then the filesystem is corrupt; it should be impossible to hardlink a
> directory to create another entry that points to it.

OK, I wasn't aware that you couldn't hardlink a directory to another
directory.

> Linux has no protection against such things beyond ensuring the
> filesystem stays sane... just like AIX most certainly should also have.
> It's entirely possible that AIX offers some way of hardlinking
> directories (it's been over 10 years since I last touched AIX :-) but if
> so, there's no real sane way of handling such situations.

OK.  Looks like I just have to deal with each cycle I encounter and
break it.  Joy. :-)

> > I think this is exactly what's happening.  I think I have a number of
> > cycles that are causing the data to go loopy. (pardon the pun)  If
> > that's the case, how does one find self referential hard/softlinks?
> 
> It sounds like you're confusing hard links with soft (or symbolic)
> links, by the way you mention them above.
> Cpio, find, rsync, whatever will not by default follow symbolic links
> (unless instructed to do so, in which case any problems arising from
> that are the user's fault).  As I mentioned above, hardlinks to
> directories shouldn't exist. Without hardlinks to directories, you won't
> have loops.

I'm at a loss then at what I'm looking at.  Maybe it's following
symlinks and I have not checked the arguments properly.  It might
be that symbolic links are causing the issue, but in that case it
doesn't seem to explain why a level 0 copy is taking days.
I leaped on the link tree issue, because strace on an rsync was
showing it going through the same progression of directories.
I'm going to have to go back and run it again and see if I can
catch it.

> > > The command "find . -type l" will only find symlinks.  You can find
> > > files that have hard links with "find . ! -type d -links +1 -print".  
> 
> > Can I also use find to create a list of files that are not hardlinked
> 
> A file that's not hardlinked will have a link count of 1 (which is very
> logical if you think about what that link count means...)

Yep, indeed it is.  Thank you for taking the time to answer my question,
much obliged.

sri


Re: Skipping hardlinks in a copy

2007-03-08 Thread Paul Slootman
On Thu 08 Mar 2007, Sri Ramkrishna wrote:
> 
> I think I probably have hard links to directories.  I have observed cpio
> going through a loop continuously.  Since I was doing this on an AIX
> JFS filesystem (on an AIX fileserver) it might not have the same
> protections that I believe Linux has when hitting a circular loop.

If there are hard links to directories (apart from the . and .. links)
then the filesystem is corrupt; it should be impossible to hardlink a
directory to create another entry that points to it.
Linux has no protection against such things beyond ensuring the
filesystem stays sane... just like AIX most certainly should also have.
It's entirely possible that AIX offers some way of hardlinking
directories (it's been over 10 years since I last touched AIX :-) but if
so, there's no real sane way of handling such situations.


> I think this is exactly what's happening.  I think I have a number of
> cycles that are causing the data to go loopy. (pardon the pun)  If
> that's the case, how does one find self referential hard/softlinks?

It sounds like you're confusing hard links with soft (or symbolic)
links, by the way you mention them above.
Cpio, find, rsync, whatever will not by default follow symbolic links
(unless instructed to do so, in which case any problems arising from
that are the user's fault).  As I mentioned above, hardlinks to
directories shouldn't exist. Without hardlinks to directories, you won't
have loops.

> > The command "find . -type l" will only find symlinks.  You can find
> > files that have hard links with "find . ! -type d -links +1 -print".  

> Can I also use find to create a list of files that are not hardlinked

A file that's not hardlinked will have a link count of 1 (which is very
logical if you think about what that link count means...)
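A small illustration of that link count, in a scratch directory (file names are throwaway examples):

```shell
# A file's link count starts at 1 and goes up with each hard link, so
# "-links 1" selects exactly the files that are not hardlinked anywhere.
cd "$(mktemp -d)"
echo data > a
ln a b                     # a and b now share one inode (link count 2)
find . -type f -links 1    # prints nothing: both names are hardlinked
rm b
find . -type f -links 1    # prints ./a: the count dropped back to 1
```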


Paul Slootman


Re: Skipping hardlinks in a copy

2007-03-08 Thread Sri Ramkrishna
On Wed, Mar 07, 2007 at 09:22:08PM -0800, Sriram Ramkrishna wrote:

Hi there,

For some reason, I sent this mail before I was fully subscribed and I
have missed out on the replies.  If I don't answer all the responses this
is why.


> The following command pipeline can give you a list which you could
> isolate to being just the first occurrence of each file that is sharing the 
> same inode:

> find . ! -type d -printf '%10i %P\n' | awk '{n=substr($0,12);if(a[$1]==1){print "other",n;}else{a[$1]=1;print "first",n;}}'

Yes, I think I have something similar that someone else has used to do
the same thing.  Thank you, this is most useful.

> One approach in the situation you have, if the filesystem is not corrupt
> (which it might be, because files don't create cycles), is to create a

I think I probably have hard links to directories.  I have observed cpio
going through a loop continuously.  Since I was doing this on an AIX
JFS filesystem (on an AIX fileserver) it might not have the same
protections that I believe Linux has when hitting a circular loop.

> list of files based on their inode number, and hardlink each file to one
> named by its inode number.  Just rsync the directory full of inode
> numbers.  Then re-expand on the destination based on that list.

> You should not be following symlinks in a file tree recursion.  Rsync,
> find, cpio, and others, know not to.

> But I suspect some kind of filesystem corruption, or at least some hard
> links being applied to directories.  The latter can create cycles if not
> done carefully (and there is virtually no case to ever do that at all by
> intent).

I think this is exactly what's happening.  I think I have a number of
cycles that are causing the data to go loopy. (pardon the pun)  If
that's the case, how does one find self referential hard/softlinks?

> I do not consider it bad organization to have lots of files be
> hardlinked.  In fact, I have a program that actually seeks out
> indentical files and makes them be hardlinked to save space (not
> safe in all cases, but safe in most).

Sure, but in a large filesystem, it's been very painful to copy this
data when rsync is taking days instead of hours.

> The command "find . -type l" will only find symlinks.  You can find
> files that have hard links with "find . ! -type d -links +1 -print".  
> Note that all file types can have hard links, even symlinks.  Do 
> exclude directories as those will have many links for other reasons 
> (e.g. 1 for self reference, 1 for being inside a directory and 1 each 
> for each subdirectory within).

Can I also use find to create a list of files that are not hardlinked
and then use --include-file and --exclude=*?  I had thought that might
be an alternative way.  If I use this rule, does rsync still stat
through the filesystem?

sri


Re: Skipping hardlinks in a copy

2007-03-08 Thread Wayne Davison
On Wed, Mar 07, 2007 at 09:22:08PM -0800, Sriram Ramkrishna wrote:
> Is there a way to have it skip hard links when doing an rsync?

If you mean you want to skip any file that has more than one link, you
could do this:

find . -type f -links +1 >/path/exclude.txt

Then, you'd use the exclude.txt file via the --exclude-from option.

However, you mentioned loops, and that makes me think that the problem
is not with file loops, but dirs looping back in the hierarchy.  The
above won't help you if that is the case (since find will loop too).
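Putting the two steps together might look like this; the sed and the destination are illustrative assumptions, not part of the commands above (the sed rewrites find's "./path" output as "/path" so each exclude pattern anchors at the root of the transfer):

```shell
# Build an exclude list of every regular file with more than one link,
# then copy the tree without them.  "remote:/backup/" is a placeholder.
find . -type f -links +1 | sed 's|^\.||' > /tmp/exclude.txt
rsync -av --exclude-from=/tmp/exclude.txt . remote:/backup/
```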

..wayne..


Re: Skipping hardlinks in a copy

2007-03-08 Thread Phil Howard
On Wed, Mar 07, 2007 at 09:22:08PM -0800, Sriram Ramkrishna wrote:

| Hi folks, I've been googling around for awhile but I can't seem to find
| an answer to my question. 
| 
| I have a number of filesystems that contain thousands of hard links due
| to some bad organization of data.  Rsync, cpio and various other
| utilities fail to copy this data because I think there might be some
| cycles in it.  (you know you have troubles if cpio can't copy it!)
| 
| What I thought I would do instead is to copy the data but skip any files
| that are hard links.  Then after the copy is finished, I will use some
| kind of find . -type l type command that finds the hard links and then
| make a script to recreate it.  This saves me a lot of trouble with not
| having to stat the files and not having the receive side balloon up.
| 
| Is there a way to have it skip hard links when doing an rsync?
| Or is there some other mystic incantation that I can use that might
| accomplish the same thing.

The following command pipeline can give you a list which you could isolate
to being just the first occurrence of each file that is sharing the same
inode:

find . ! -type d -printf '%10i %P\n' | awk '{n=substr($0,12);if(a[$1]==1){print "other",n;}else{a[$1]=1;print "first",n;}}'

Note the above is 123 characters long.  You may have issues with mail
programs that truncate or wrap it, so be careful.  The fixed-width
formatting of the inode number in the find output makes it easy to
extract the name, or the name plus the symlink target, in the awk
command using substr().

One approach in the situation you have, if the filesystem is not corrupt
(which it might be, because files don't create cycles), is to create a
list of files based on their inode number, and hardlink each file to one
named by its inode number.  Just rsync the directory full of inode numbers.
Then re-expand on the destination based on that list.
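A rough sketch of that idea (assuming GNU find's -printf, and that farm/ sits on the same filesystem as the source, since hard links cannot cross filesystems; the src, farm, and list names are placeholders):

```shell
# Hardlink each distinct inode once into farm/, named by inode number.
# Rsyncing farm/ then transfers each file body exactly once, and
# inode-list.txt records every name for re-expansion on the destination.
mkdir -p farm
find src ! -type d -printf '%i %p\n' > inode-list.txt
while read -r ino path; do
    [ -e "farm/$ino" ] || ln "$path" "farm/$ino"
done < inode-list.txt
```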

You should not be following symlinks in a file tree recursion.  Rsync,
find, cpio, and others, know not to.

But I suspect some kind of filesystem corruption, or at least some hard
links being applied to directories.  The latter can create cycles if not
done carefully (and there is virtually no case to ever do that at all by
intent).

I do not consider it bad organization to have lots of files be hardlinked.
In fact, I have a program that actually seeks out identical files and makes
them be hardlinked to save space (not safe in all cases, but safe in most).

The command "find . -type l" will only find symlinks.  You can find files
that have hard links with "find . ! -type d -links +1 -print".  Note that
all file types can have hard links, even symlinks.  Do exclude directories
as those will have many links for other reasons (e.g. 1 for self reference,
1 for being inside a directory and 1 each for each subdirectory within).

-- 
| Phil Howard KA9WGN (ka9wgn.ham.org)  /  Do not send to the address below |
| first name lower case at ipal.net   /  [EMAIL PROTECTED] |


Re: Skipping hardlinks in a copy

2007-03-08 Thread Paul Slootman
On Wed 07 Mar 2007, Sriram Ramkrishna wrote:

> that are hard links.  Then after the copy is finished, I will use some
> kind of find . -type l type command that finds the hard links and then

find -type l will find symbolic links, *not* hard links.


Paul Slootman


Re: Skipping hardlinks in a copy

2007-03-08 Thread Eur Ing Chris Green
On Wed, Mar 07, 2007 at 09:22:08PM -0800, Sriram Ramkrishna wrote:
> Hi folks, I've been googling around for awhile but I can't seem to find
> an answer to my question. 
> 
> I have a number of filesystems that contain thousands of hard links due
> to some bad organization of data.  Rsync, cpio and various other
> utilities fail to copy this data because I think there might be some
> cycles in it.  (you know you have troubles if cpio can't copy it!)
> 
> What I thought I would do instead is to copy the data but skip any files
> that are hard links.  Then after the copy is finished, I will use some
> kind of find . -type l type command that finds the hard links and then
> make a script to recreate it.  This saves me a lot of trouble with not
> having to stat the files and not having the receive side balloon up.
> 
> Is there a way to have it skip hard links when doing an rsync?
> Or is there some other mystic incantation that I can use that might
> accomplish the same thing.
> 
Surely a hard link is just 'a file'; that's what a file is, so it's
impossible to skip them without skipping everything (except symbolic
links, FIFOs, etc.).  The only clue that something has more than one
link to it is the 'number of links', but then how do you decide which
link is the 'right' one to copy, as they're all equally the file?
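That point is easy to see in a scratch directory (throwaway names): after ln, neither name is any more "original" than the other.

```shell
# Both directory entries point at the same inode; the data survives
# until the last name is removed.
cd "$(mktemp -d)"
echo hello > one
ln one two
ls -li one two             # same inode number, link count 2 on both
rm one
cat two                    # prints "hello": the data outlived the first name
```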

-- 
Chris Green


Skipping hardlinks in a copy

2007-03-07 Thread Sriram Ramkrishna
Hi folks, I've been googling around for awhile but I can't seem to find
an answer to my question. 

I have a number of filesystems that contain thousands of hard links due
to some bad organization of data.  Rsync, cpio and various other
utilities fail to copy this data because I think there might be some
cycles in it.  (you know you have troubles if cpio can't copy it!)

What I thought I would do instead is to copy the data but skip any files
that are hard links.  Then after the copy is finished, I will use some
kind of find . -type l type command that finds the hard links and then
make a script to recreate it.  This saves me a lot of trouble with not
having to stat the files and not having the receive side balloon up.

Is there a way to have it skip hard links when doing an rsync?
Or is there some other mystic incantation that I can use that might
accomplish the same thing.

Thanks, 
sri