Re: odd NFS behaviour with DU 4.F client

1999-10-29 Thread Andrew Gallatin


Matthew Dillon writes:
  
  Well, there was a bug in nfsrv_create() which caused the server to
  not reply to an NFS packet.  This led to a general revamping of the
  server side code which may have fixed other rpc's at the same time.
  Whether fixing that bug solves the problem you are having or not is 
  unknown.
  
  :I would guess that the DU4CLIENT has the filehandle cached somewhere,
  :even though it has unmounted the filesystem.  
  :
  :My question: Whose fault is this?  Should the FreeBSD server be
  :ignoring requests to a valid filehandle if the client has not mounted
  :the FS?  Should it be returning some sort of error?
  :
  :Thanks,
  :
  :Drew
   
  There should be a response to the rpc either way so my guess is that
  it is a server-side bug.

It turns out that the user was in 17 groups (DU supports up to 32).
After I removed him from 2 groups and got his group count down to 15,
all was well.

After I upgrade the NFS server to a more recent -current, I'll test
this again with a user in 17 groups.

Thanks again,

Drew



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: odd NFS behaviour with DU 4.F client

1999-10-29 Thread Matthew Dillon


:  :
:  :Thanks,
:  :
:  :Drew
:   
:  There should be a response to the rpc either way so my guess is that
:  it is a server-side bug.
:
:It turns out that the user was in 17 groups (DU supports up to 32).
:After I removed him from 2 groups and got his group count down to 15,
:all was well.
:
:After I upgrade the NFS server to a more recent -current, I'll test
:this again with a user in 17 groups.
:
:Thanks again,
:
:Drew

Ahhh... I'm glad you found it.  I was beginning to scratch my head.

NGROUPS_MAX is set to 16 (/usr/src/sys/sys/syslimits.h).  You may
be able to patch the kernel to up the number of groups by upping
the value in that define and recompiling the kernel.  I've never
tried this myself but it should work.
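
Before patching anything, it may help to check how many groups a process
actually carries against the limit mentioned here.  The following is only a
minimal sketch, not anything from the FreeBSD tree: the names
THREAD_NGROUPS_MAX, count_process_groups() and fits_group_limit() are made up
for illustration, and the value 16 is simply the figure quoted in this thread.
It relies on the POSIX getgroups() call, which returns the number of
supplementary groups when passed a size of 0.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Assumption: 16 is the NGROUPS_MAX value quoted in this thread. */
#define THREAD_NGROUPS_MAX 16

/*
 * Number of supplementary groups of the calling process, or -1 on error.
 * With a size of 0, getgroups() only reports the count and does not
 * touch the (NULL) list argument.
 */
int count_process_groups(void)
{
    return getgroups(0, NULL);
}

/* Returns 1 if a group count fits within the given limit, 0 otherwise. */
int fits_group_limit(int ngroups, int limit)
{
    assert(limit >= 0);
    return ngroups >= 0 && ngroups <= limit;
}
```

Calling fits_group_limit(count_process_groups(), THREAD_NGROUPS_MAX) from a
small main() run as the affected user would show whether that user is over
the limit: 17 groups would not fit, 15 would.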

-Matt
Matthew Dillon 
[EMAIL PROTECTED]





Re: odd NFS behaviour with DU 4.F client

1999-10-29 Thread Andrew Gallatin


Matthew Dillon writes:
  
  Ahhh... I'm glad you found it.  I was beginning to scratch my head.
  
  NGROUPS_MAX is set to 16 (/usr/src/sys/sys/syslimits.h).  You may
  be able to patch the kernel to up the number of groups by upping
  the value in that define and recompiling the kernel.  I've never
  tried this myself but it should work.

Will a recent -current behave the same way, or will it return
something to the DU box?

E.g., will I need to worry about upping NGROUPS_MAX when I upgrade the
box to a more recent kernel?

Thanks,

Drew
--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590







Re: odd NFS behaviour with DU 4.F client

1999-10-29 Thread Matthew Dillon


:Matthew Dillon writes:
:  
:  Ahhh... I'm glad you found it.  I was beginning to scratch my head.
:  
:  NGROUPS_MAX is set to 16 (/usr/src/sys/sys/syslimits.h).  You may
:  be able to patch the kernel to up the number of groups by upping
:  the value in that define and recompiling the kernel.  I've never
:  tried this myself but it should work.
:
:Will a recent -current behave the same way, or will it return
:something to the DU box?
:
:E.g., will I need to worry about upping NGROUPS_MAX when I upgrade the
:box to a more recent kernel?
:
:Thanks,
:
:Drew
:--
:Andrew Gallatin, Sr Systems Programmer http://www.cs.duke.edu/~gallatin
:Duke University Email: [EMAIL PROTECTED]
:Department of Computer Science Phone: (919) 660-6590

I don't know, I don't have a non-FreeBSD box to test with.   I would
be interested in knowing the answer!

-Matt
Matthew Dillon 
[EMAIL PROTECTED]





odd NFS behaviour with DU 4.F client

1999-10-28 Thread Andrew Gallatin


We have an NFS server setup running an older FreeBSD-current (Wed Jun 30).
This server exports a filesystem to a number of heterogeneous clients.
On most clients, this filesystem is automounted.

Occasionally, some random Digital UNIX box running 4.0F will partially
wedge because its automounter is blocked accessing the FreeBSD
server's filesystem.  Any access to automounted directories will then
cause a process to hang.

I've noticed that if I do a tcpdump on the FreeBSD NFS server, I see:

17:48:16.397101 DU4CLIENT.1946927300 > FREEBSDSERVER.nfs: 188 lookup fh 
234,2/163298400 "chase" (ttl 30, id 4256)
17:48:36.397144 DU4CLIENT.1946927300 > FREEBSDSERVER.nfs: 188 lookup fh 
234,2/163298400 "chase" (ttl 30, id 4310)
17:48:56.397212 DU4CLIENT.1946927300 > FREEBSDSERVER.nfs: 188 lookup fh 
234,2/163298400 "chase" (ttl 30, id 4384)
17:49:16.397123 DU4CLIENT.1946927300 > FREEBSDSERVER.nfs: 188 lookup fh 
234,2/163298400 "chase" (ttl 30, id 4453)

These requests go on seemingly forever with no reply from the FreeBSD
NFS server.  "chase" is a user's directory in the top level of this
filesystem, and nfsfilesystem/chase/bin is a component of the user
chase's path.

The truly interesting thing is that if I type 'mount' on DU4CLIENT, I
DO NOT see the filesystem in question in the mount table!

If I kill all of chase's processes on DU4CLIENT, the automounter
unsticks and all is well.  If I then try to access the chase directory 
in this filesystem, the DU4CLIENT mounts it and I see this transaction:

18:00:27.725678 DU4CLIENT.1435222468 > FREEBSDSERVER.nfs: 168 lookup fh 
234,2/163298400 "chase" (ttl 30, id 9546)
18:00:27.725763 FREEBSDSERVER.nfs > DU4CLIENT.1435222468: reply ok 236 lookup fh 
234,2/163298400 DIR 755 ids 1449/107 sz 1024 nlink 21 rdev 134/57475167 fsid 
86036d005f nodeid 36d005f a/m/ctime 941102013.00 940944335.00 
940944335.00  post dattr: DIR 775 ids 0/107 sz 512 nlink 25 rdev 4/40 fsid 
40028 nodeid 28 a/m/ctime 941134393.00 940612206.00 
940612206.00  (ttl 64, id 16062)


I would guess that the DU4CLIENT has the filehandle cached somewhere,
even though it has unmounted the filesystem.  

My question: Whose fault is this?  Should the FreeBSD server be
ignoring requests to a valid filehandle if the client has not mounted
the FS?  Should it be returning some sort of error?

Thanks,

Drew
--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590







Re: odd NFS behaviour with DU 4.F client

1999-10-28 Thread Matthew Dillon

:We have an NFS server setup running an older FreeBSD-current (Wed Jun 30).
:This server exports a filesystem to a number of heterogeneous clients.
:On most clients, this filesystem is automounted.
:
:Occasionally, some random Digital UNIX box running 4.0F will partially
:wedge because its automounter is blocked accessing the FreeBSD

Lots of bugs have been fixed since then.  I recommend upgrading the
server (despite the hassle) and seeing if the problem still occurs.

:server's filesystem.  Any access to automounted directories will then
:cause a process to hang.
:
:I've noticed that if I do a tcpdump on the FreeBSD NFS server, I see:
:
:17:48:16.397101 DU4CLIENT.1946927300 > FREEBSDSERVER.nfs: 188 lookup fh 
:234,2/163298400 "chase" (ttl 30, id 4256)
:17:48:36.397144 DU4CLIENT.1946927300 > FREEBSDSERVER.nfs: 188 lookup fh 
:234,2/163298400 "chase" (ttl 30, id 4310)
:17:48:56.397212 DU4CLIENT.1946927300 > FREEBSDSERVER.nfs: 188 lookup fh 
:234,2/163298400 "chase" (ttl 30, id 4384)
:17:49:16.397123 DU4CLIENT.1946927300 > FREEBSDSERVER.nfs: 188 lookup fh 
:234,2/163298400 "chase" (ttl 30, id 4453)
:
:These requests go on seemingly forever with no reply from the FreeBSD
:NFS server.  "chase" is a user's directory in the top level of this
:filesystem, and nfsfilesystem/chase/bin is a component of the user
:chase's path.
:...
:The truly interesting thing is that if I type 'mount' on DU4CLIENT, I
:DO NOT see the filesystem in question in the mount table!
:
:If I kill all of chase's processes on DU4CLIENT, the automounter
:unsticks and all is well.  If I then try to access the chase directory 

Well, there was a bug in nfsrv_create() which caused the server to
not reply to an NFS packet.  This led to a general revamping of the
server side code which may have fixed other rpc's at the same time.
Whether fixing that bug solves the problem you are having or not is 
unknown.

:I would guess that the DU4CLIENT has the filehandle cached somewhere,
:even though it has unmounted the filesystem.  
:
:My question: Whose fault is this?  Should the FreeBSD server be
:ignoring requests to a valid filehandle if the client has not mounted
:the FS?  Should it be returning some sort of error?
:
:Thanks,
:
:Drew
 
There should be a response to the rpc either way so my guess is that
it is a server-side bug.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]





Re: odd NFS behaviour with DU 4.F client

1999-10-28 Thread Andrew Gallatin


Matthew Dillon writes:
  :We have an NFS server setup running an older FreeBSD-current (Wed Jun 30).
  :This server exports a filesystem to a number of heterogeneous clients.
  :On most clients, this filesystem is automounted.
  :
  :Occasionally, some random Digital UNIX box running 4.0F will partially
  :wedge because its automounter is blocked accessing the FreeBSD
  
  Lots of bugs have been fixed since then.  I recommend upgrading the
  server (despite the hassle) and seeing if the problem still occurs.

..

OK, will do.   I'm mainly waiting for the next rev of the ata driver.
The volume this box serves up is a ccd stripe of four 18GB IDE disks
attached to multiple Promise controllers.

  There should be a response to the rpc either way so my guess is that
  it is a server-side bug.

OK, thanks.   Good to know.

Speaking of NFS changes, there was talk at one time about turning the
nfsm macros into functions.  Is this going to happen?

I ask because I've seen occasional unaligned access panics on
FreeBSD/alpha in the client side code.  I've only seen them on a
really lossy link (basically a misconfigured duplex on a 100Mb link).
They tend to be in nfs_request (nfs/nfs_socket.c:110) or nfs_readrpc
(nfs/nfs_vnops.c:1093).  These are both calls to nfs macros that would
be a lot easier to debug if they weren't macros ;-)

Thanks,

Drew
--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590







Re: odd NFS behaviour with DU 4.F client

1999-10-28 Thread Matthew Dillon

:Speaking of NFS changes, there was talk at one time about turning the
:nfsm macros into functions.  Is this going to happen?

No.  The nfsm macros have goto's all over the place that jump outside
the macro, and also use local variables declared outside the macro. 
Short of rewriting a fairly large hunk of the NFS code entirely it
ain't gonna happen.  Nobody is contemplating rewriting the code.
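
To make the obstacle concrete, here is a small sketch of the pattern.  It is
NOT the real nfsm code: struct cursor, SKETCH_GET_U32, get_u32() and the two
parse_pair_*() functions are all hypothetical names invented for illustration.
The macro touches the caller's locals (cur, error) and jumps to the caller's
label (out), so it cannot simply be pasted into a function body; the function
version has to take the state as arguments and leave the error check to every
caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parse state; a stand-in for the bookkeeping real code does. */
struct cursor {
    const unsigned char *pos;   /* next unread byte */
    size_t left;                /* bytes remaining */
};

/*
 * Hypothetical macro in the style described in the thread: it reads and
 * writes the caller's locals `cur` and `error`, and on failure jumps to
 * the caller's label `out`.
 */
#define SKETCH_GET_U32(dst)                                \
    do {                                                   \
        if (cur->left < 4) {                               \
            error = -1;                                    \
            goto out;                                      \
        }                                                  \
        (dst) = ((uint32_t)cur->pos[0] << 24) |            \
                ((uint32_t)cur->pos[1] << 16) |            \
                ((uint32_t)cur->pos[2] << 8)  |            \
                 (uint32_t)cur->pos[3];                    \
        cur->pos += 4;                                     \
        cur->left -= 4;                                    \
    } while (0)

/* Macro style: control flow leaves the macro through `goto out`. */
int parse_pair_macro(struct cursor *cur, uint32_t *a, uint32_t *b)
{
    int error = 0;

    SKETCH_GET_U32(*a);
    SKETCH_GET_U32(*b);
out:
    return error;
}

/* Function style: state is passed in and the result is returned. */
static int get_u32(struct cursor *cur, uint32_t *dst)
{
    assert(cur != NULL && dst != NULL);
    if (cur->left < 4)
        return -1;
    *dst = ((uint32_t)cur->pos[0] << 24) | ((uint32_t)cur->pos[1] << 16) |
           ((uint32_t)cur->pos[2] << 8)  |  (uint32_t)cur->pos[3];
    cur->pos += 4;
    cur->left -= 4;
    return 0;
}

/* Every call site now needs its own explicit error check. */
int parse_pair_func(struct cursor *cur, uint32_t *a, uint32_t *b)
{
    int error;

    if ((error = get_u32(cur, a)) != 0)
        return error;
    return get_u32(cur, b);
}
```

Both versions behave the same on this toy input, but converting real code
means rewriting every call site in the second style, which is the "fairly
large hunk" of work referred to here.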

:I ask because I've seen occasional unaligned access panics on
:FreeBSD/alpha in the client side code.  I've only seen them on a
:really lossy link (basically a misconfigured duplex on a 100Mb link).
:They tend to be in nfs_request (nfs/nfs_socket.c:110) or nfs_readrpc
:(nfs/nfs_vnops.c:1093).  These are both calls to nfs macros that would
:be a lot easier to debug if they weren't macros ;-)
:
:Thanks,
:
:Drew
:--
:Andrew Gallatin, Sr Systems Programmer http://www.cs.duke.edu/~gallatin

You can use gdb to disassemble the code to locate the exact point where
the panic occurred.  It is definitely more difficult, but there isn't
much we can do about it.  The rpc design tends to keep things aligned
and NFS packet elements tend to be sized such that alignment remains 
intact, so if these panics can be tracked down the fixes should be 
relatively easy to make.  Unfortunately, we just don't see these sorts
of panics on Intel boxes all that much because IA32 allows misaligned
accesses.  This means there are almost certainly alignment bugs in the
code.
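
For illustration, the usual shape of such a fix is sketched here.  This is a
generic C idiom, not a patch against the NFS sources, and the function names
load_be32(), load_u32_host() and is_aligned_u32() are invented for the
example.  Casting an arbitrary byte pointer to a 32-bit pointer and
dereferencing it may trap on a strict-alignment machine when the address is
not a multiple of 4; assembling the value byte by byte, or copying it into an
aligned local with memcpy(), avoids that.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Read a big-endian (network order) 32-bit value from any address.
 * Only single bytes are loaded, so alignment of `p` does not matter.
 */
uint32_t load_be32(const void *p)
{
    const unsigned char *b = p;

    assert(p != NULL);
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/*
 * Alternative: copy the bytes into an aligned local.  The result is in
 * host byte order, so a byte swap may still be needed afterwards.
 */
uint32_t load_u32_host(const void *p)
{
    uint32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/*
 * Returns 1 if `p` is a multiple of sizeof(uint32_t).  A direct
 * *(const uint32_t *)p load is the risky pattern when this returns 0.
 */
int is_aligned_u32(const void *p)
{
    return ((uintptr_t)p % sizeof(uint32_t)) == 0;
}
```

A check like is_aligned_u32() could also be dropped in temporarily as a
debugging aid to catch the offending pointer before the faulting load.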

-Matt






Re: odd NFS behaviour with DU 4.F client

1999-10-28 Thread Andrew Gallatin


Matthew Dillon writes:
  :Speaking of NFS changes, there was talk at one time about turning the
  :nfsm macros into functions.  Is this going to happen?
  
  No.  The nfsm macros have goto's all over the place that jump outside
  the macro, and also use local variables declared outside the macro. 
  Short of rewriting a fairly large hunk of the NFS code entirely it
  ain't gonna happen.  Nobody is contemplating rewriting the code.

Exactly why I wasn't going to try to do it myself ;-)

But I could have sworn I read somewhere that somebody was planning
it.  Oh well.

  You can use gdb to disassemble the code to locate the exact point where
  the panic occurred.  It is definitely more difficult, but there isn't
  much we can do about it.  The rpc design tends to keep things aligned
  and NFS packet elements tend to be sized such that alignment remains 
  intact, so if these panics can be tracked down the fixes should be 
  relatively easy to make.  Unfortunately, we just don't see these sorts
  of panics on Intel boxes all that much because IA32 allows misaligned
  accesses.  This means there are almost certainly alignment bugs in the
  code.
  
   -Matt

I'm all in favor of having all the developers have alphas so these
things get caught early ;-)


Cheers,
Drew

--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590

