Re: odd NFS behaviour with DU 4.F client
Matthew Dillon writes:
 > Well, there was a bug in nfsrv_create() which caused the server to
 > not reply to an NFS packet. This led to a general revamping of the
 > server-side code, which may have fixed other RPCs at the same time.
 > Whether fixing that bug solves the problem you are having is unknown.
 >
 > :I would guess that the DU4CLIENT has the filehandle cached somewhere,
 > :even though it has unmounted the filesystem.
 > :
 > :My question: Whose fault is this? Should the FreeBSD server be
 > :ignoring requests to a valid filehandle if the client has not mounted
 > :the FS? Should it be returning some sort of error?
 > :
 > :Thanks,
 > :
 > :Drew
 >
 > There should be a response to the rpc either way so my guess is that
 > it is a server-side bug.

It turns out that the user was in 17 groups (DU supports up to 32).
After I removed him from 2 groups and got his group count down to 15,
all was well.

After I upgrade the NFS server to a more recent -current, I'll test
this again with a user in 17 groups.

Thanks again,

Drew

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message
Re: odd NFS behaviour with DU 4.F client
: > :Thanks,
: > :
: > :Drew
: >
: > There should be a response to the rpc either way so my guess is that
: > it is a server-side bug.
:
:It turns out that the user was in 17 groups (DU supports up to 32).
:After I removed him from 2 groups and got his group count down to 15,
:all was well.
:
:After I upgrade the NFS server to a more recent -current, I'll test
:this again with a user in 17 groups.
:
:Thanks again,
:
:Drew

    Ahhh... I'm glad you found it. I was beginning to scratch my head.

    NGROUPS_MAX is set to 16 (/usr/src/sys/sys/syslimits.h). You may
    be able to patch the kernel to up the number of groups by upping
    the value in that define and recompiling the kernel. I've never
    tried this myself but it should work.

					-Matt
					Matthew Dillon
					[EMAIL PROTECTED]
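
The group-count diagnosis above can be checked before a user hits the problem. The following is a minimal sketch, assuming the server limit is the NGROUPS_MAX value of 16 quoted in the thread; the constant `SERVER_NGROUPS_MAX` and both function names are hypothetical and not taken from any FreeBSD source. Separately from the kernel define (this is an addition here, not something stated in the thread), the AUTH_UNIX credential in the ONC RPC specification carries at most 16 group IDs on the wire, so raising NGROUPS_MAX on the server alone may not be sufficient.

```python
import os

# Assumption: the limit quoted in the thread (NGROUPS_MAX = 16 in
# /usr/src/sys/sys/syslimits.h). It is hard-coded here for illustration
# and is not read from the server's kernel.
SERVER_NGROUPS_MAX = 16


def exceeds_group_limit(gids, limit=SERVER_NGROUPS_MAX):
    """Return True if the number of distinct group IDs is above the limit."""
    return len(set(gids)) > limit


def current_process_exceeds_limit(limit=SERVER_NGROUPS_MAX):
    """Apply the same check to the group list of the calling process."""
    # os.getgroups() returns the supplementary group IDs of this process;
    # whether the primary group is included varies between systems.
    return exceeds_group_limit(os.getgroups(), limit)
```

With the numbers from the thread, a user in 17 groups is flagged and a user in 15 groups is not. Whether exactly 16 groups works was not tested in the thread.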
Re: odd NFS behaviour with DU 4.F client
Matthew Dillon writes:
 > Ahhh... I'm glad you found it. I was beginning to scratch my head.
 >
 > NGROUPS_MAX is set to 16 (/usr/src/sys/sys/syslimits.h). You may
 > be able to patch the kernel to up the number of groups by upping
 > the value in that define and recompiling the kernel. I've never
 > tried this myself but it should work.

Will a recent -current behave the same way, or will it return
something to the DU box?

I.e., will I need to worry about upping NGROUPS_MAX when I upgrade the
box to a more recent kernel?

Thanks,

Drew

--
Andrew Gallatin, Sr Systems Programmer	http://www.cs.duke.edu/~gallatin
Duke University				Email: [EMAIL PROTECTED]
Department of Computer Science		Phone: (919) 660-6590
Re: odd NFS behaviour with DU 4.F client
:Matthew Dillon writes:
: > Ahhh... I'm glad you found it. I was beginning to scratch my head.
: >
: > NGROUPS_MAX is set to 16 (/usr/src/sys/sys/syslimits.h). You may
: > be able to patch the kernel to up the number of groups by upping
: > the value in that define and recompiling the kernel. I've never
: > tried this myself but it should work.
:
:Will a recent -current behave the same way, or will it return
:something to the DU box?
:
:I.e., will I need to worry about upping NGROUPS_MAX when I upgrade the
:box to a more recent kernel?
:
:Thanks,
:
:Drew
:--
:Andrew Gallatin, Sr Systems Programmer	http://www.cs.duke.edu/~gallatin
:Duke University				Email: [EMAIL PROTECTED]
:Department of Computer Science		Phone: (919) 660-6590

    I don't know, I don't have a non-FreeBSD box to test with. I would
    be interested in knowing the answer!

					-Matt
					Matthew Dillon
					[EMAIL PROTECTED]
odd NFS behaviour with DU 4.F client
We have an NFS server setup running an older FreeBSD-current (Wed Jun 30).
This server exports a filesystem to a number of heterogeneous clients.
On most clients, this filesystem is automounted.

Occasionally, some random Digital UNIX box running 4.0F will partially
wedge because its automounter is blocked accessing the FreeBSD
server's filesystem. Any access to automounted directories will then
cause a process to hang.

I've noticed that if I do a tcpdump on the FreeBSD NFS server, I see:

17:48:16.397101 DU4CLIENT.1946927300 > FREEBSDSERVER.nfs: 188 lookup fh
234,2/163298400 "chase" (ttl 30, id 4256)
17:48:36.397144 DU4CLIENT.1946927300 > FREEBSDSERVER.nfs: 188 lookup fh
234,2/163298400 "chase" (ttl 30, id 4310)
17:48:56.397212 DU4CLIENT.1946927300 > FREEBSDSERVER.nfs: 188 lookup fh
234,2/163298400 "chase" (ttl 30, id 4384)
17:49:16.397123 DU4CLIENT.1946927300 > FREEBSDSERVER.nfs: 188 lookup fh
234,2/163298400 "chase" (ttl 30, id 4453)

These requests go on seemingly forever with no reply from the FreeBSD
NFS server. "chase" is a user's directory in the top level of this
filesystem, and nfsfilesystem/chase/bin is a component of the user
chase's path.

The truly interesting thing is that if I type 'mount' on DU4CLIENT, I
DO NOT see the filesystem in question in the mount table!

If I kill all of chase's processes on DU4CLIENT, the automounter
unsticks and all is well. If I then try to access the chase directory
in this filesystem, the DU4CLIENT mounts it and I see this transaction:

18:00:27.725678 DU4CLIENT.1435222468 > FREEBSDSERVER.nfs: 168 lookup fh
234,2/163298400 "chase" (ttl 30, id 9546)
18:00:27.725763 FREEBSDSERVER.nfs > DU4CLIENT.1435222468: reply ok 236
lookup fh 234,2/163298400 DIR 755 ids 1449/107 sz 1024 nlink 21 rdev
134/57475167 fsid 86036d005f nodeid 36d005f a/m/ctime 941102013.00
940944335.00 940944335.00 post dattr: DIR 775 ids 0/107 sz 512 nlink 25
rdev 4/40 fsid 40028 nodeid 28 a/m/ctime 941134393.00 940612206.00
940612206.00 (ttl 64, id 16062)

I would guess that the DU4CLIENT has the filehandle cached somewhere,
even though it has unmounted the filesystem.

My question: Whose fault is this? Should the FreeBSD server be
ignoring requests to a valid filehandle if the client has not mounted
the FS? Should it be returning some sort of error?

Thanks,

Drew

--
Andrew Gallatin, Sr Systems Programmer	http://www.cs.duke.edu/~gallatin
Duke University				Email: [EMAIL PROTECTED]
Department of Computer Science		Phone: (919) 660-6590
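
The pattern in the traces above (the same request repeated every 20 seconds with no reply) can be picked out of a longer capture mechanically. The following is a rough sketch, assuming one packet per line in tcpdump's NFS text format, where a request is printed as `client.xid` to `server.nfs` and a reply as `server.nfs` to `client.xid`. The function name and regular expressions are illustrative and only cover the line shapes shown in this thread; the `>` between source and destination is treated as optional because some archived copies of the trace have lost it.

```python
import re
from collections import Counter

# Request: "<time> <client>.<xid> [>] <server>.nfs: <len> <op> ..."
REQUEST_RE = re.compile(
    r"^\S+\s+(?P<client>\S+)\.(?P<xid>\d+)\s+(?:>\s+)?\S+\.nfs:"
    r"\s+\d+\s+(?P<op>\w+)"
)
# Reply: "<time> <server>.nfs [>] <client>.<xid>: reply ..."
REPLY_RE = re.compile(
    r"^\S+\s+\S+\.nfs\s+(?:>\s+)?(?P<client>\S+)\.(?P<xid>\d+):\s+reply\b"
)


def unanswered_requests(lines):
    """Map (client, xid, op) to the number of times it was sent unanswered.

    A request counts as answered if any reply line carries the same
    client name and transaction ID (xid).
    """
    sent = Counter()
    answered = set()
    for line in lines:
        reply = REPLY_RE.match(line)
        if reply:
            answered.add((reply["client"], reply["xid"]))
            continue
        request = REQUEST_RE.match(line)
        if request:
            sent[(request["client"], request["xid"], request["op"])] += 1
    return {key: n for key, n in sent.items() if key[:2] not in answered}
```

Fed the six packets quoted in this message, the function reports the lookup with xid 1946927300 as sent four times without an answer, and omits the later lookup that did get a reply.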
Re: odd NFS behaviour with DU 4.F client
:We have an NFS server setup running an older FreeBSD-current (Wed Jun 30).
:This server exports a filesystem to a number of heterogeneous clients.
:On most clients, this filesystem is automounted.
:
:Occasionally, some random Digital UNIX box running 4.0F will partially
:wedge because its automounter is blocked accessing the FreeBSD

    Lots of bugs have been fixed since then. I recommend upgrading
    the server (despite the hassle) and seeing if the problem still
    occurs.

:server's filesystem. Any access to automounted directories will then
:cause a process to hang.
:
:I've noticed that if I do a tcpdump on the FreeBSD NFS server, I see:
:
:17:48:16.397101 DU4CLIENT.1946927300 > FREEBSDSERVER.nfs: 188 lookup fh
:234,2/163298400 "chase" (ttl 30, id 4256)
:17:48:36.397144 DU4CLIENT.1946927300 > FREEBSDSERVER.nfs: 188 lookup fh
:234,2/163298400 "chase" (ttl 30, id 4310)
:17:48:56.397212 DU4CLIENT.1946927300 > FREEBSDSERVER.nfs: 188 lookup fh
:234,2/163298400 "chase" (ttl 30, id 4384)
:17:49:16.397123 DU4CLIENT.1946927300 > FREEBSDSERVER.nfs: 188 lookup fh
:234,2/163298400 "chase" (ttl 30, id 4453)
:
:These requests go on seemingly forever with no reply from the FreeBSD
:NFS server. "chase" is a user's directory in the top level of this
:filesystem, and nfsfilesystem/chase/bin is a component of the user
:chase's path.
:...
:The truly interesting thing is that if I type 'mount' on DU4CLIENT, I
:DO NOT see the filesystem in question in the mount table!
:
:If I kill all of chase's processes on DU4CLIENT, the automounter
:unsticks and all is well. If I then try to access the chase directory

    Well, there was a bug in nfsrv_create() which caused the server to
    not reply to an NFS packet. This led to a general revamping of the
    server-side code, which may have fixed other RPCs at the same time.
    Whether fixing that bug solves the problem you are having is unknown.

:I would guess that the DU4CLIENT has the filehandle cached somewhere,
:even though it has unmounted the filesystem.
:
:My question: Whose fault is this? Should the FreeBSD server be
:ignoring requests to a valid filehandle if the client has not mounted
:the FS? Should it be returning some sort of error?
:
:Thanks,
:
:Drew

    There should be a response to the rpc either way so my guess is that
    it is a server-side bug.

					-Matt
					Matthew Dillon
					[EMAIL PROTECTED]
Re: odd NFS behaviour with DU 4.F client
Matthew Dillon writes:
 > :We have an NFS server setup running an older FreeBSD-current (Wed Jun 30).
 > :This server exports a filesystem to a number of heterogeneous clients.
 > :On most clients, this filesystem is automounted.
 > :
 > :Occasionally, some random Digital UNIX box running 4.0F will partially
 > :wedge because its automounter is blocked accessing the FreeBSD
 >
 > Lots of bugs have been fixed since then. I recommend upgrading
 > the server (despite the hassle) and seeing if the problem still
 > occurs.
 ..

OK, will do. I'm mainly waiting for the next rev of the ata driver.
The volume this box serves up is a ccd stripe of 4 18GB IDE disks
attached to multiple Promise controllers.

 > There should be a response to the rpc either way so my guess is that
 > it is a server-side bug.

OK, thanks. Good to know.

Speaking of NFS changes, there was talk at one time about turning the
nfsm macros into functions. Is this going to happen?

I ask because I've seen occasional unaligned access panics on
FreeBSD/alpha in the client-side code. I've only seen them on a
really lossy link (basically a misconfigured duplex on a 100Mb link).
They tend to be in nfs_request (nfs/nfs_socket.c:110) or nfs_readrpc
(nfs/nfs_vnops.c:1093). These are both calls to nfs macros that would
be a lot easier to debug if they weren't macros ;-)

Thanks,

Drew

--
Andrew Gallatin, Sr Systems Programmer	http://www.cs.duke.edu/~gallatin
Duke University				Email: [EMAIL PROTECTED]
Department of Computer Science		Phone: (919) 660-6590
Re: odd NFS behaviour with DU 4.F client
:Speaking of NFS changes, there was talk at one time about turning the
:nfsm macros into functions. Is this going to happen?

    No. The nfsm macros have gotos all over the place that jump outside
    the macro, and also use local variables declared outside the macro.
    Short of rewriting a fairly large hunk of the NFS code entirely it
    ain't gonna happen. Nobody is contemplating rewriting the code.

:I ask because I've seen occasional unaligned access panics on
:FreeBSD/alpha in the client-side code. I've only seen them on a
:really lossy link (basically a misconfigured duplex on a 100Mb link).
:They tend to be in nfs_request (nfs/nfs_socket.c:110) or nfs_readrpc
:(nfs/nfs_vnops.c:1093). These are both calls to nfs macros that would
:be a lot easier to debug if they weren't macros ;-)
:
:Thanks,
:
:Drew
:--
:Andrew Gallatin, Sr Systems Programmer	http://www.cs.duke.edu/~gallatin

    You can use gdb to disassemble the code to locate the exact point
    where the panic occurred. It is definitely more difficult, but there
    isn't much we can do about it. The rpc design tends to keep things
    aligned and NFS packet elements tend to be sized such that alignment
    remains intact, so if these panics can be tracked down the fixes
    should be relatively easy to make.

    Unfortunately, we just don't see these sorts of panics on Intel
    boxes all that much because IA32 allows misaligned accesses. This
    means there are almost certainly alignment bugs in the code.

					-Matt
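
The remark that RPC packet elements are sized so that alignment stays intact refers to XDR, the encoding used by ONC RPC and NFS: every item occupies a multiple of four bytes, with zero padding where needed. The following is a small sketch of that rule for a variable-length opaque or string field. The function names are made up for illustration and none of this is taken from the FreeBSD nfsm macros.

```python
import struct

XDR_UNIT = 4  # XDR encodes every item in a multiple of four bytes


def xdr_padded_len(n):
    """Round a byte count up to the next multiple of XDR_UNIT."""
    return (n + XDR_UNIT - 1) // XDR_UNIT * XDR_UNIT


def xdr_pack_opaque(data):
    """Encode variable-length opaque data: length word, bytes, zero padding.

    The length is a 4-byte big-endian unsigned integer, and the padding
    keeps whatever follows on a 4-byte boundary.
    """
    pad = xdr_padded_len(len(data)) - len(data)
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def next_field_is_aligned(encoded_so_far):
    """True if the next field would start on a 4-byte boundary."""
    return len(encoded_so_far) % XDR_UNIT == 0
```

For example, the five-byte name "chase" from the lookup requests in this thread takes 12 bytes on the wire (4 for the length, 5 for the name, 3 of padding), so the field after it still starts on a 4-byte boundary.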
Re: odd NFS behaviour with DU 4.F client
Matthew Dillon writes:
 > :Speaking of NFS changes, there was talk at one time about turning the
 > :nfsm macros into functions. Is this going to happen?
 >
 > No. The nfsm macros have gotos all over the place that jump outside
 > the macro, and also use local variables declared outside the macro.
 > Short of rewriting a fairly large hunk of the NFS code entirely it
 > ain't gonna happen. Nobody is contemplating rewriting the code.

Exactly why I wasn't going to try to do it myself ;-) But I could have
sworn I read somewhere that somebody was planning it. Oh well.

 > You can use gdb to disassemble the code to locate the exact point
 > where the panic occurred. It is definitely more difficult, but there
 > isn't much we can do about it. The rpc design tends to keep things
 > aligned and NFS packet elements tend to be sized such that alignment
 > remains intact, so if these panics can be tracked down the fixes
 > should be relatively easy to make.
 >
 > Unfortunately, we just don't see these sorts of panics on Intel
 > boxes all that much because IA32 allows misaligned accesses. This
 > means there are almost certainly alignment bugs in the code.
 >
 > -Matt

I'm all in favor of having all the developers have alphas so these
things get caught early ;-)

Cheers,

Drew

--
Andrew Gallatin, Sr Systems Programmer	http://www.cs.duke.edu/~gallatin
Duke University				Email: [EMAIL PROTECTED]
Department of Computer Science		Phone: (919) 660-6590