Re: panic: vfs_busy: unexpected lock failure

1999-03-18 Thread Peter Wemm
Matthew Dillon wrote:
 :On Tue, Mar 16, 1999 at 12:52:32PM -0800, Matthew Dillon wrote:
 : A..  And if you make those AMD mounts normal nfs mounts it doesn't
 : fry?  If so, then we have a bug in AMD somewhere.
 :
 :I tried the cp several times again on a regular NFS mount, to make
 :sure, and no, it doesn't seem to panic. So yes, that seems to be
 :AMD-related.  Can't it be in the vfs layer though?
 :-- 
 :Pierre Beyssac   p...@enst.fr
 
 It's probably AMD.  I'm not really up on how AMD works... hasn't someone
 done some work on it recently to fix other breakages?  Maybe they could
 look at this panic.

AMD is easy to upset, and that's bad because it's holding a mountpoint in /
(i.e. /host) which often gets hit by every single getcwd() call when it 
gets an lstat(/host...) or whatever.  I think this is the single largest 
source of load on the amd process.

The other problem is that amd is an rpc client, it depends on the libc rpc 
code for robustness, and that's not the first word that springs to mind 
when I think of it...  When amd hangs on a dns lookup, there are all sorts 
of VFS locking cascades and NFS wedges while the kernel is retrying all 
those retransmitted packets to amd's pseudo-nfs server port.  It's been 
found to be the primary cause of the 'nfsrcv' hangs - processes wedged in 
getcwd() style situations trying to stat /host.

IMHO, /host needs to move down a level to get it out of the way of 
getcwd().  NFS mounts should probably move away from / as well, as they 
cause traffic on each getcwd().
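The traffic described here follows from the way the traditional userland getcwd() algorithm rebuilds the path. The sketch below is a hypothetical, simplified version of that algorithm (`sketch_getcwd` and `find_entry` are illustrative names, not libc functions, and a POSIX system is assumed). It walks up through `..` and, at each level, looks in the parent for the entry whose (st_dev, st_ino) matches the current directory. Inside one filesystem the entry's d_ino narrows this to one lstat(). When the current directory is the root of a mounted filesystem, d_ino names the covered directory instead, so every entry in the parent gets an lstat(); for anything mounted directly under /, that scan includes /host.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Search directory `up` for the entry whose lstat() matches `want`.
 * With use_dino set, only entries whose d_ino equals want->st_ino are
 * lstat()ed; otherwise every entry is, which is the expensive case. */
static int find_entry(const char *up, const struct stat *want, int use_dino,
                      char *name, size_t namelen, unsigned long *nlstat)
{
    DIR *d = opendir(up);
    struct dirent *de;
    int found = 0;

    if (d == NULL)
        return 0;
    while (!found && (de = readdir(d)) != NULL) {
        char path[PATH_MAX];
        struct stat st;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (use_dino && de->d_ino != want->st_ino)
            continue;
        if (snprintf(path, sizeof path, "%s/%s", up, de->d_name) >= (int)sizeof path)
            continue;
        (*nlstat)++;   /* each of these may land on an amd mount point */
        if (lstat(path, &st) == 0 && st.st_dev == want->st_dev &&
            st.st_ino == want->st_ino) {
            snprintf(name, namelen, "%s", de->d_name);
            found = 1;
        }
    }
    closedir(d);
    return found;
}

/* Hypothetical helper: rebuild the cwd by walking "..", counting lstat()s. */
static int sketch_getcwd(char *out, size_t outlen, unsigned long *nlstat)
{
    char up[PATH_MAX] = ".";     /* relative path to the level being named */
    char result[PATH_MAX] = "";  /* path assembled right to left */
    struct stat cur, root, parent;

    assert(out != NULL && nlstat != NULL);
    *nlstat = 0;
    if (lstat("/", &root) != 0 || lstat(".", &cur) != 0)
        return -1;
    while (cur.st_dev != root.st_dev || cur.st_ino != root.st_ino) {
        char name[PATH_MAX], tmp[PATH_MAX];
        int found = 0;

        if (strlen(up) + 4 > sizeof up)
            return -1;
        if (strcmp(up, ".") == 0)
            strcpy(up, "..");          /* first step up */
        else
            strcat(up, "/..");         /* "../..", "../../..", ... */
        if (lstat(up, &parent) != 0)
            return -1;
        /* Same filesystem: d_ino identifies the child, usually one lstat(). */
        if (parent.st_dev == cur.st_dev)
            found = find_entry(up, &cur, 1, name, sizeof name, nlstat);
        /* Mount point (d_ino is the covered directory) or d_ino unreliable:
         * lstat() every entry in the parent until one matches. */
        if (!found)
            found = find_entry(up, &cur, 0, name, sizeof name, nlstat);
        if (!found)
            return -1;
        if (snprintf(tmp, sizeof tmp, "/%s%s", name, result) >= (int)sizeof tmp)
            return -1;
        strcpy(result, tmp);
        cur = parent;
    }
    if (result[0] == '\0')
        strcpy(result, "/");
    if (strlen(result) >= outlen)
        return -1;
    strcpy(out, result);
    return 0;
}
```

The `nlstat` counter exists only to make the cost visible: run from a directory under a separately mounted /usr or /home, it should count roughly one lstat() per entry in / for that step of the walk.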

I think the default settings should look something like this..

/net                  - amd and nfs related stuff
/net/sysname/mount1   - nfs mount created by amd
/net/sysname/mount2   - nfs mount created by amd
/net/host             - /host lives here instead.

and a symlink:
/host -> /net/host

I think that'll stop amd from being hammered by all those lstat()'s in 
getcwd and friends in the root directory.

And instead of mounting NFS things as:  /a,  mount them as /net/a instead 
and use a symlink.
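Part of why the symlink helps: the getcwd() scan of a directory uses lstat(), and lstat() on a symlink describes the link itself without resolving its target, so a scan of / would stop at /host instead of walking into amd's mount. A minimal sketch of that difference, assuming a POSIX system; `demo_symlink_shield` is a hypothetical name, and a dangling link to a missing `net/host` stands in for an unresponsive amd mount.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: create <dir>/host -> <dir>/net/host (target absent),
 * then compare lstat() and stat() on the link. Returns 0 when lstat()
 * succeeds on the link alone while stat() has to resolve the target. */
static int demo_symlink_shield(const char *dir)
{
    char link_path[1024], target[1024];
    struct stat st;
    int lstat_saw_link, stat_errno;

    assert(dir != NULL);
    snprintf(target, sizeof target, "%s/net/host", dir);
    snprintf(link_path, sizeof link_path, "%s/host", dir);
    if (symlink(target, link_path) != 0)
        return -1;

    /* lstat(): answered from the directory holding the link itself. */
    lstat_saw_link = (lstat(link_path, &st) == 0 && S_ISLNK(st.st_mode));

    /* stat(): follows the link, so it must look up the target; with the
     * proposed layout, that lookup is the one that would reach amd. */
    stat_errno = (stat(link_path, &st) == 0) ? 0 : errno;

    unlink(link_path);
    return (lstat_saw_link && stat_errno == ENOENT) ? 0 : -1;
}
```

Only processes that actually chdir into or open paths under /host would still go through amd; a plain scan of / would not.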

This isn't a fix, it's just trying to move a particularly weak link out
of the direct line of fire.  A real solution would be a proper userfs
interface that could cope with kernel-user_process protocol timeouts,
process deaths, etc.  Of course, then there's always an in-kernel autofs
etc.

Cheers,
-Peter




To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-current in the body of the message



Re: panic: vfs_busy: unexpected lock failure

1999-03-18 Thread Garrett Wollman
On Thu, 18 Mar 1999 22:49:10 +0800, Peter Wemm pe...@netplex.com.au said:

 AMD is easy to upset, and that's bad because it's holding a mountpoint in /
 (i.e. /host) which often gets hit by every single getcwd() call when it 
 gets an lstat(/host...) or whatever.  I think this is the single largest 
 source of load on the amd process.

 IMHO, /host needs to move down a level to get it out of the way of 
 getcwd().  NFS mounts should probably move away from / as well, as they 
 cause traffic on each getcwd().

`/host' is non-standard.  In the Standard Configuration, `/net' is the
directory simulated by amd and `/a/${hostname}/root' is where amd
mounts the directory tree.  This is done specifically to avoid getcwd
wedgitude.  The example we ship would sorely puzzle anyone who is
experienced in running a Standard Configuration amd.

My machine has, throughout its entire history, had `/home' simulated
by amd.  I have literally *never* had amd hose my configuration (and I
would know it fast since both mail and Web service would break).

-GAWollman

--
Garrett A. Wollman   | O Siem / We are all family / O Siem / We're all the same
woll...@lcs.mit.edu  | O Siem / The fires of freedom 
Opinions not those of| Dance in the burning flame
MIT, LCS, CRS, or NSA| - Susan Aglukark and Chad Irschick





Re: panic: vfs_busy: unexpected lock failure

1999-03-17 Thread Matthew Dillon
:On Tue, Mar 16, 1999 at 12:52:32PM -0800, Matthew Dillon wrote:
: A..  And if you make those AMD mounts normal nfs mounts it doesn't 
: fry?  If so, then we have a bug in AMD somewhere.
:
:I tried the cp several times again on a regular NFS mount, to make
:sure, and no, it doesn't seem to panic. So yes, that seems to be
:AMD-related.  Can't it be in the vfs layer though?
:-- 
:Pierre Beyssac p...@enst.fr

It's probably AMD.  I'm not really up on how AMD works... hasn't someone
done some work on it recently to fix other breakages?  Maybe they could
look at this panic.

-Matt
Matthew Dillon 
dil...@backplane.com






Re: panic: vfs_busy: unexpected lock failure

1999-03-16 Thread Pierre Beyssac
On Mon, Mar 15, 1999 at 01:24:46PM -0800, Matthew Dillon wrote:
 Compile up a kernel with 'options DDB' and get a backtrace when
 it panics next ( 'trace' command from DDB prompt ).

Ok, here goes. The kernel is compiled without -g for the moment,
but I've provided the function offsets in case they help.

vfs_busy()  at vfs_busy+0x6d
lookup()    +0x3b9
namei()     +0x180
stat()      +0x44
syscall()   +0x187

I also get what seems to be spurious EPROTONOSUPPORT errors that
show up in cp while copying files...
-- 
Pierre Beyssac  p...@enst.fr





Re: panic: vfs_busy: unexpected lock failure

1999-03-16 Thread Matthew Dillon
:On Mon, Mar 15, 1999 at 01:24:46PM -0800, Matthew Dillon wrote:
: Compile up a kernel with 'options DDB' and get a backtrace when
: it panics next ( 'trace' command from DDB prompt ).
:
:Ok, here goes. The kernel is compiled without -g for the moment,
:but I've provided the function offsets in case they help.
:
:vfs_busy()  at vfs_busy+0x6d
:lookup()    +0x3b9
:namei()     +0x180
:stat()      +0x44
:syscall()   +0x187
:
:I also get what seems to be spurious EPROTONOSUPPORT errors that
:show up in cp while copying files...
:-- 
:Pierre Beyssac p...@enst.fr

The code in lookup() that calls vfs_busy() is:

    while (dp->v_type == VDIR && (mp = dp->v_mountedhere) &&
           (cnp->cn_flags & NOCROSSMOUNT) == 0) {
        if (vfs_busy(mp, 0, 0, p))
            continue;
        error = VFS_ROOT(mp, &tdp);
        vfs_unbusy(mp, p);
        if (error)
            goto bad2;
        vput(dp);
        ndp->ni_vp = dp = tdp;
    }
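To make the control flow of that loop concrete, here is a toy userland model in C. Everything in it is hypothetical: `toy_vfs_busy`, `cross_mounts` and the two structs are simplified stand-ins, not kernel types or functions. It assumes, as the `continue` suggests, that a failed vfs_busy() means an unmount was in progress and has since cleared v_mountedhere, so the retry falls out of the loop. The panic in this thread is raised inside vfs_busy() itself, which this model does not try to reproduce.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for struct vnode and struct mount; not the kernel types. */
struct mount;

struct vnode {
    const char *name;
    int is_dir;
    struct mount *mountedhere;  /* like v_mountedhere: NULL if nothing mounted */
};

struct mount {
    struct vnode *root;         /* what VFS_ROOT() would return */
    struct vnode *covered;      /* vnode this filesystem is mounted on */
    int unmounting;             /* stands in for an unmount in progress */
    int busy;                   /* count of vfs_busy() holders */
};

/* Toy vfs_busy(): if an unmount is in progress, pretend we slept until it
 * finished (clearing the covered vnode's mountedhere) and fail with ENOENT. */
static int toy_vfs_busy(struct mount *mp)
{
    if (mp->unmounting) {
        mp->covered->mountedhere = NULL;
        return ENOENT;
    }
    mp->busy++;
    return 0;
}

static void toy_vfs_unbusy(struct mount *mp)
{
    assert(mp->busy > 0);
    mp->busy--;
}

/* Same shape as the lookup() loop above: keep stepping onto the root of
 * whatever is mounted on dp until reaching a directory with no mount.
 * NOCROSSMOUNT and vnode reference counting (vput) are left out. */
static struct vnode *cross_mounts(struct vnode *dp)
{
    struct mount *mp;

    while (dp->is_dir && (mp = dp->mountedhere) != NULL) {
        struct vnode *tdp;

        if (toy_vfs_busy(mp))
            continue;           /* re-read dp->mountedhere and try again */
        tdp = mp->root;         /* VFS_ROOT(mp, &tdp) */
        toy_vfs_unbusy(mp);
        dp = tdp;
    }
    return dp;
}
```

Walking from a vnode with two filesystems stacked on it takes two passes through the loop; marking the upper one as unmounting makes the walk stop one level earlier instead of spinning.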

You shouldn't be crossing a mount point.  Are you by chance doing a
recursive copy onto itself?

e.g. cp -rp src dest, where dest is mounted under src somewhere?

Of course, it is still a serious kernel bug.  I would like to try 
to reproduce it in order to track it down.  How are things mounted on
your system ( df ) and what are the *exact* arguments you are using with
cp?

-Matt
Matthew Dillon 
dil...@backplane.com






Re: panic: vfs_busy: unexpected lock failure

1999-03-16 Thread Pierre Beyssac
On Tue, Mar 16, 1999 at 11:11:44AM -0800, Matthew Dillon wrote:
(cnp->cn_flags & NOCROSSMOUNT) == 0) {
 if (vfs_busy(mp, 0, 0, p))
 continue;
...
 You shouldn't be crossing a mount point.  Are you by chance doing a
 recursive copy onto itself?
 e.g. cp -rp src dest, where dest is mounted under src somewhere?

No. At first it was from an NFS-mounted volume to another NFS-mounted
volume. I then found that it panicked the same way when I copied from
a local FFS volume to the same NFS volume.

The NFS volumes are automounted by amd under /a. That may well have
something to do with the panic: that's a recent change in my
configuration; I previously used NFS mounts in /etc/fstab which
didn't cause me any trouble.

 Of course, it is still a serious kernel bug.  I would like to try 
 to reproduce it in order to track it down.  How are things mounted on
 your system ( df ) and what are the *exact* arguments you are using with
 cp?

Here's the df (I removed some of the amd dummy mount points).

$ df
Filesystem            1K-blocks     Used    Avail Capacity  Mounted on
/dev/wd0s1a               49583    34595    11022    76%    /
/dev/wd1s1e             5975845  3556146  1941632    65%    /home
/dev/wd0s1f              148823     1290   135628     1%    /tmp
/dev/wd0s1g             5380597  1615221  3334929    33%    /usr
/dev/wd0s1e              396895    38127   327017    10%    /var
procfs                        4        4        0   100%    /proc
[ ten pid...@bofh:/xyz lines removed ]
pid...@bofh:/cal              0        0        0   100%    /cal
huuh:/home/huuh         1217519  1064153   141191    88%    /a/huuh/home/huuh

The failing cp is:

$ cp -rp /home/beyssac/src/sendmail-8.9.3/cf/ /home/beyssac/nfs/junk/

In the above, /home/beyssac/nfs is a symbolic link to
/cal/huuh/cal/beyssac which is automounted by amd (last line in
the above df).
-- 
Pierre Beyssac  p...@enst.fr





Re: panic: vfs_busy: unexpected lock failure

1999-03-16 Thread Matthew Dillon

:
:On Tue, Mar 16, 1999 at 11:11:44AM -0800, Matthew Dillon wrote:
:(cnp-cn_flags  NOCROSSMOUNT) == 0) { 
: if (vfs_busy(mp, 0, 0, p))
: continue;
:...
: You shouldn't be crossing a mount point.  Are you by chance doing a
: recursive copy onto itself?
: e.g. cp -rp src dest where dest is mounted under src somewhere ?
:
:No. At first it was from an NFS-mounted volume to another NFS-mounted
:volume. I then found that it panicked the same way when I copied from
:a local FFS volume to the same NFS volume.
:
:The NFS volumes are automounted by amd under /a. That may well have
:something to do with the panic: that's a recent change in my
:configuration; I previously used NFS mounts in /etc/fstab which
:didn't cause me any trouble.
:
: Of course, it is still a serious kernel bug.  I would like to try 
: to reproduce it in order to track it down.  How are things mounted on
: your system ( df ) and what are the *exact* arguments you are using with
: cp?
:
:Here's the df (I removed some of the amd dummy mount points).
:
:$ df
:Filesystem            1K-blocks     Used    Avail Capacity  Mounted on
:/dev/wd0s1a               49583    34595    11022    76%    /
:/dev/wd1s1e             5975845  3556146  1941632    65%    /home
:/dev/wd0s1f              148823     1290   135628     1%    /tmp
:/dev/wd0s1g             5380597  1615221  3334929    33%    /usr
:/dev/wd0s1e              396895    38127   327017    10%    /var
:procfs                        4        4        0   100%    /proc
:[ ten pid...@bofh:/xyz lines removed ]
:pid...@bofh:/cal              0        0        0   100%    /cal
:huuh:/home/huuh         1217519  1064153   141191    88%    /a/huuh/home/huuh
:
:The failing cp is:
:
:$ cp -rp /home/beyssac/src/sendmail-8.9.3/cf/ /home/beyssac/nfs/junk/
:
:In the above, /home/beyssac/nfs is a symbolic link to
:/cal/huuh/cal/beyssac which is automounted by amd (last line in
:the above df).
:-- 
:Pierre Beyssac p...@enst.fr

A..  And if you make those AMD mounts normal nfs mounts it doesn't 
fry?  If so, then we have a bug in AMD somewhere.

-Matt
Matthew Dillon 
dil...@backplane.com






Re: panic: vfs_busy: unexpected lock failure

1999-03-16 Thread Pierre Beyssac
On Tue, Mar 16, 1999 at 12:52:32PM -0800, Matthew Dillon wrote:
 A..  And if you make those AMD mounts normal nfs mounts it doesn't 
 fry?  If so, then we have a bug in AMD somewhere.

I tried the cp several times again on a regular NFS mount, to make
sure, and no, it doesn't seem to panic. So yes, that seems to be
AMD-related.  Can't it be in the vfs layer though?
-- 
Pierre Beyssac  p...@enst.fr





panic: vfs_busy: unexpected lock failure

1999-03-15 Thread Pierre Beyssac
Hello,

My FreeBSD box keeps panicking when I try to do a simple cp
-rp from a local disk to an NFS-mounted disk. The NFS server is a
Solaris 2.5 box; the NFS partition is mounted through amd.

The files I try to copy are just sendmail's cf directory (lots of
small files) and the panic happens every time I try (with cp -rp;
not with piped tars).

The kernel is today's, with NFS compiled-in (it's not a module).

I'm getting the following message:
panic: vfs_busy: unexpected lock failure
-- 
Pierre Beyssac  p...@enst.fr





Re: panic: vfs_busy: unexpected lock failure

1999-03-15 Thread Matthew Dillon
:Hello,
:
:My FreeBSD box keeps panicking when I try to do a simple cp
:-rp from a local disk to an NFS-mounted disk. The NFS server is a
:Solaris 2.5 box; the NFS partition is mounted through amd.
:
:The files I try to copy are just sendmail's cf directory (lots of
:small files) and the panic happens every time I try (with cp -rp;
:not with piped tars).
:
:The kernel is today's, with NFS compiled-in (it's not a module).
:
:I'm getting the following message:
:   panic: vfs_busy: unexpected lock failure
:-- 
:Pierre Beyssac p...@enst.fr

Compile up a kernel with 'options DDB' and get a backtrace when
it panics next ( 'trace' command from DDB prompt ).

-Matt
Matthew Dillon 
dil...@backplane.com

