Re: Multiple NFS server problems with Solaris 8 clients
On 2001-10-25, BSD User wrote:

> On Wed, 24 Oct 2001, Paul van der Zwan wrote:
>> I have looked at a trace I made using snoop and it shows an NFS_ACL call which [...]
>> It looks like an implementation error in the -current NFS server.
>
> I have been digging at traces of 4.4-RELEASE (which works) and -current (which doesn't). Both versions get it wrong. I have no idea why 4.4-RELEASE worked.

Thanks for this information! I opened a PR on that problem yesterday: kern/31479.

-- 
Thomas Quinot ** Département Informatique Réseaux ** [EMAIL PROTECTED]
ENST // 46 rue Barrault // 75634 PARIS CEDEX 13

To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Multiple NFS server problems with Solaris 8 clients
In message [EMAIL PROTECTED], BSD User writes:

> Actually, upon instrumenting some code, it looks like RELEASE-4.4 gets it mostly right. It ejects a PROG_UNAVAIL call which causes the Solaris 8 client to back off. The correct message would seem to be PROC_UNAVAIL, but I would take PROG_UNAVAIL if I could get -current to eject it.

I think PROG_UNAVAIL is correct; the packet trace that Thomas provided shows an RPC request with a program ID of 100227, which is not the NFS program ID.

Try the patch below. Peter's NFS revamp changed the semantics of the nfsm_reply() macro, and nfsrv_noop() was not updated to match. Previously nfsm_reply would set 'error' to 0 when nd->nd_flag did not have ND_NFSV3 set, and much of the code that uses nfsrv_noop to generate errors ensured that nd->nd_flag was zero. Now nfsm_reply never sets 'error' to 0, so it needs to be done explicitly. Server op functions must return 0 in order for a reply to be sent to the client.

Ian

Index: nfs_serv.c
===
RCS file: /home/iedowse/CVS/src/sys/nfsserver/nfs_serv.c,v
retrieving revision 1.107
diff -u -r1.107 nfs_serv.c
--- nfs_serv.c	2001/09/28 04:37:08	1.107
+++ nfs_serv.c	2001/10/25 16:19:33
@@ -4000,6 +4000,7 @@
 	else
 		error = EPROCUNAVAIL;
 	nfsm_reply(0);
+	error = 0;
 nfsmout:
 	return (error);
 }
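The return convention Ian describes can be modelled outside the kernel. What follows is only a minimal sketch under stated assumptions, not the real nfs_serv.c code: `sketch_reply`, `sketch_build_reply`, `sketch_noop`, and `sketch_reply_is_sent` are hypothetical names, and `SKETCH_EPROCUNAVAIL` is an illustrative stand-in for the BSD errno so the example compiles on any system. It shows why the handler has to clear `error` itself once the reply-building step no longer does so.

```c
#include <stddef.h>

/* Illustrative stand-in for the BSD errno EPROCUNAVAIL; the numeric
 * value is an assumption made so the sketch compiles anywhere. */
#define SKETCH_EPROCUNAVAIL 76

/* Hypothetical reply buffer: records only the status placed in the reply. */
struct sketch_reply {
    int built;   /* non-zero once a reply has been constructed */
    int status;  /* error status encoded into the reply */
};

/* Hypothetical stand-in for nfsm_reply() after the revamp: it builds the
 * reply but leaves the caller's 'error' variable untouched. */
static void sketch_build_reply(struct sketch_reply *rep, int error)
{
    rep->built = 1;
    rep->status = error;
}

/* Model of a no-op server function: report an error to the client inside
 * the reply, then return 0 so that the reply is actually sent. */
static int sketch_noop(int repstat, struct sketch_reply *rep)
{
    int error = repstat ? repstat : SKETCH_EPROCUNAVAIL;

    sketch_build_reply(rep, error);
    error = 0;  /* corresponds to the line the patch adds */
    return error;
}

/* Model of the dispatcher rule quoted above: a reply goes out only when
 * the server function returned 0. */
static int sketch_reply_is_sent(int handler_ret, const struct sketch_reply *rep)
{
    return handler_ret == 0 && rep->built;
}
```

Without the `error = 0;` line, `sketch_noop` would return a non-zero value and the modelled dispatcher would drop the reply, which matches the silent behaviour seen from the Solaris client.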
Re: Multiple NFS server problems with Solaris 8 clients
On 2001-10-25, Ian Dowse wrote:

> I think PROG_UNAVAIL is correct; the packet trace that Thomas provided shows an RPC request with a program ID of 100227 which is not the NFS program ID.

Yep. (Incidentally, 100227 appears in /etc/rpc as 'nfs_acl'.)

> Try the patch below.

Seems to work. Thanks!

Thomas.

-- 
Thomas Quinot ** Département Informatique Réseaux ** [EMAIL PROTECTED]
ENST // 46 rue Barrault // 75634 PARIS CEDEX 13
Re: Multiple NFS server problems with Solaris 8 clients
In message [EMAIL PROTECTED], BSD User wrote:

> Actually, upon instrumenting some code, it looks like RELEASE-4.4 gets it mostly right. It ejects a PROG_UNAVAIL call which causes the Solaris 8 client to back off. The correct message would seem to be PROC_UNAVAIL, but I would take PROG_UNAVAIL if I could get -current to eject it.

In this case (the NFS_ACL one) it seems PROG_UNAVAIL is the right thing. NFS_ACL has a different program number from NFS; it is not just an unimplemented procedure that is part of NFS.

Paul

-- 
Paul van der Zwan paulz @ trantor.xs4all.nl
I think I'll move to theory, everything works in theory...
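The distinction between PROG_UNAVAIL and PROC_UNAVAIL can be written down as a small decision function. This is a sketch under assumptions, not FreeBSD's dispatch code: the `sk_` names are hypothetical, and the server is assumed to export only the NFS program in versions 2 and 3. The accept_stat numbers follow the ONC RPC specification, and 100227 is the 'nfs_acl' program ID mentioned in the thread.

```c
#include <stdint.h>

/* accept_stat values as defined by the ONC RPC specification. */
enum sk_accept_stat {
    SK_SUCCESS       = 0,  /* RPC executed successfully */
    SK_PROG_UNAVAIL  = 1,  /* remote has not exported the program */
    SK_PROG_MISMATCH = 2,  /* remote cannot support the version number */
    SK_PROC_UNAVAIL  = 3,  /* program cannot support the procedure */
    SK_GARBAGE_ARGS  = 4,  /* procedure cannot decode the parameters */
    SK_SYSTEM_ERR    = 5   /* e.g. memory allocation failure */
};

#define SK_NFS_PROGRAM     100003u  /* 'nfs' in /etc/rpc */
#define SK_NFS_ACL_PROGRAM 100227u  /* 'nfs_acl' in /etc/rpc */

/* Number of procedures per NFS version in this hypothetical server:
 * NFSv2 defines procedures 0..17, NFSv3 defines 0..21. */
static uint32_t sk_nfs_proc_count(uint32_t vers)
{
    switch (vers) {
    case 2:  return 18;
    case 3:  return 22;
    default: return 0;   /* version not supported */
    }
}

/* Pick the accept_stat for an incoming call header. */
static enum sk_accept_stat sk_classify_call(uint32_t prog, uint32_t vers,
                                            uint32_t proc)
{
    uint32_t nprocs;

    if (prog != SK_NFS_PROGRAM)
        return SK_PROG_UNAVAIL;   /* a different program, e.g. nfs_acl */
    nprocs = sk_nfs_proc_count(vers);
    if (nprocs == 0)
        return SK_PROG_MISMATCH;  /* right program, unsupported version */
    if (proc >= nprocs)
        return SK_PROC_UNAVAIL;   /* right program, unknown procedure */
    return SK_SUCCESS;
}
```

Under these assumptions a call to program 100227 is rejected before the procedure number is even examined, which is why PROG_UNAVAIL rather than PROC_UNAVAIL fits the GETACL case.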
Re: Multiple NFS server problems with Solaris 8 clients
On Thu, 25 Oct 2001, Ian Dowse wrote:

> I think PROG_UNAVAIL is correct; the packet trace that Thomas provided shows an RPC request with a program ID of 100227 which is not the NFS program ID.

I stand corrected. It does indeed attempt to access a different program.

> Try the patch below.

The patch works. Now I can get back to working out the rpc_lockd subsystem.

As a side note, is anyone from the FreeBSD side of the world taking a box to Connectathon to do some NFS testing?

-a
Re: Multiple NFS server problems with Solaris 8 clients
In message [EMAIL PROTECTED], Thomas Quinot wrote:

> On 2001-10-14, Paul van der Zwan wrote:
>> I am using -current box as a homedir server for my Solaris clients and have noticed a weird problem.
>
> Other problems here, with Solaris 2.[68] as clients, and -CURRENT of yesterday as server. ls works, but ls -l issues a 'NFS getacl failed' message *and* waits for a timeout once for each file in the directory. The server is not multi-homed, and a packet capture shows no trace of address mismatch problems.
>
> One interesting thing is that the client first does GETATTR on the file (and apparently gets a reply), and then sends some other RPC, to which the server never replies. Could this be the getacl request mentioned in the client error message? I see no mention of getacl whatsoever in the -CURRENT server code. If no such function is implemented, shouldn't we reject the request?
>
> A packet capture is available at http://www.infres.enst.fr/~quinot/nfs.cap
> Client is 137.194.192.1, server is 137.194.162.11. The test consists of first performing an 'ls' on one file, then an 'ls -l' on the same file. Result:
>
> ls photos-ta; ls -l photos-ta
> photos-ta
> NFS getacl failed for server shalmaneser.enst.fr: error 5 (RPC: Timed out)
> -rw--- 1 quinot astre474 Oct 18 14:17 photos-ta

I have looked at a trace I made using snoop and it shows an NFS_ACL call, which is not supported by FreeBSD. The server should have sent a reply saying that it does not know the NFS_ACL protocol, but apparently it does not. The only return traffic I see is an empty packet with the TCP ACK. It looks like an implementation error in the -current NFS server.

Paul

-- 
Paul van der Zwan paulz @ trantor.xs4all.nl
I think I'll move to theory, everything works in theory...
Re: Multiple NFS server problems with Solaris 8 clients
On Wed, 24 Oct 2001, Paul van der Zwan wrote:

> I have looked at a trace I made using snoop and it shows an NFS_ACL call which is not supported by FreeBSD. It should have sent a reply that it does not know the NFS_ACL protocol but apparently it does not. The only return traffic I see is an empty packet with the tcp ACK. It looks like an implementation error in the -current NFS server.
>
> Paul

I have been digging at traces of 4.4-RELEASE (which works) and -current (which doesn't). Both versions get it wrong. I have no idea why 4.4-RELEASE worked.

-current responds with a blank TCP packet (which it emphatically should *not* do) to the GETACL3 call. It *could* conceivably be received as an RPC packet with the Last Fragment flag not set and a length of 0. Who knows what the Solaris 8 client is doing when it encounters this (probably getting stuck waiting for more data which never comes).

4.4-RELEASE responds with an RPC packet indicating success (which is *also* wrong if the NFS server doesn't support ACLs) and then puts what looks to be garbage in the response. However, it is a valid RPC response with the Last Fragment flag set. Presumably the Solaris client gets the message, sees the last fragment, throws away the packet as an error and continues on with life.

I presume that the correct response is to send back an RPC reply (with the Last Fragment flag set) which indicates that the RPC message was accepted but that the procedure was unavailable (PROC_UNAVAIL). Hopefully this matches what an older Solaris server would do when faced with a Solaris 8 client, and everything will proceed normally from there.

If anybody wants ethereal traces, I can send them. Just ask.

Andy L.
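For readers unfamiliar with the "Last Fragment" flag Andy refers to: RPC over TCP prefixes each fragment with a 4-byte big-endian record mark whose top bit flags the last fragment and whose low 31 bits give the fragment length. The decoder below is a minimal sketch of that layout only; `sketch_record_mark` and `sketch_parse_record_mark` are hypothetical names, not functions from any RPC library.

```c
#include <stdint.h>

/* Decoded form of the 4-byte record mark that precedes each RPC
 * fragment on a TCP stream. */
struct sketch_record_mark {
    int      last_fragment;  /* 1 if this fragment ends the record */
    uint32_t length;         /* fragment length in bytes */
};

/* Decode a record mark from its big-endian wire representation. */
static struct sketch_record_mark
sketch_parse_record_mark(const unsigned char hdr[4])
{
    uint32_t word = ((uint32_t)hdr[0] << 24) |
                    ((uint32_t)hdr[1] << 16) |
                    ((uint32_t)hdr[2] << 8)  |
                     (uint32_t)hdr[3];
    struct sketch_record_mark rm;

    rm.last_fragment = (word & 0x80000000u) != 0;  /* top bit */
    rm.length = word & 0x7fffffffu;                /* low 31 bits */
    return rm;
}
```

A receiver that decodes a mark with `last_fragment == 0` has to keep reading further fragments before it can hand a complete record to the RPC layer, which is consistent with Andy's guess about a client waiting for data that never arrives.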
Re: Multiple NFS server problems with Solaris 8 clients
On 2001-10-14, Paul van der Zwan wrote:

> I am using -current box as a homedir server for my Solaris clients and have noticed a weird problem.

Other problems here, with Solaris 2.[68] as clients, and -CURRENT of yesterday as server. ls works, but ls -l issues a 'NFS getacl failed' message *and* waits for a timeout once for each file in the directory. The server is not multi-homed, and a packet capture shows no trace of address mismatch problems.

One interesting thing is that the client first does GETATTR on the file (and apparently gets a reply), and then sends some other RPC, to which the server never replies. Could this be the getacl request mentioned in the client error message? I see no mention of getacl whatsoever in the -CURRENT server code. If no such function is implemented, shouldn't we reject the request?

A packet capture is available at http://www.infres.enst.fr/~quinot/nfs.cap
Client is 137.194.192.1, server is 137.194.162.11. The test consists of first performing an 'ls' on one file, then an 'ls -l' on the same file. Result:

ls photos-ta; ls -l photos-ta
photos-ta
NFS getacl failed for server shalmaneser.enst.fr: error 5 (RPC: Timed out)
-rw--- 1 quinot astre474 Oct 18 14:17 photos-ta

Thomas.

-- 
Thomas Quinot ** Département Informatique Réseaux ** [EMAIL PROTECTED]
ENST // 46 rue Barrault // 75634 PARIS CEDEX 13
Multiple NFS server problems with Solaris 8 clients
I am using a -current box as a homedir server for my Solaris clients and have noticed a weird problem. When I log in, my homedir gets mounted OK, but when I type ls -l it just waits until I ^C it. If I run snoop on Solaris I see a getattr request being sent and an answer being received, but apparently it gets ignored by Solaris. This happens on both Solaris x86 and SPARC (both with MU5 installed).

Another problem I see is that rebooting the client causes the server to ignore requests afterwards. I see SYNs sent to the server but no response at all...

One more problem is in nfsd: if I set it to use UDP only, it starts eating all the CPU cycles it can get, but only the master process. Trussing the process shows no system calls whatsoever being performed.

BTW, this is -current built yesterday (Oct 13).

Paul

PS: Snoop logs or tcpdump logs are available for those who know what to look for...
Re: Multiple NFS server problems with Solaris 8 clients
Paul van der Zwan wrote:

> If I run snoop on Solaris I see a getattr request being sent and an answer being received but apparently it gets ignored by Solaris. This happens on both Sol x86 and Sparc ( both with MU5 installed)

Please do a tcpdump, and examine it; I suspect you will find that your problem is that the IP address the request was sent to is not the same as the IP address the reply came from.

In general, this is because the code doesn't explicitly use recvfrom/sendto semantics, and just takes the route. This will most often occur when you mount using an IP alias, but the primary (non-alias) IP address is on the same subnet as the alias. It can also occur if you are using two address sets on the same wire, and do not use an intervening router.

> Another problem I see is that rebooting the client causes the server to ignore request afterwards. I see SYNS sent to the server but no respons at all...

Again, you will need to tcpdump it. One possibility is that the ARP who-has resolution gives a different result after the reboot. I've noticed that a ping socket gets a route, and even after an ICMP redirect, I still get a bunch of redirects, since FreeBSD does not update the route table for already created clones (this is a bug in FreeBSD's routing code).

Another possibility is that the reboot reset the sequence number; a common approach is to ensure that the random sequence number used is later than the one that was last used for the same IP/port pairs. The client will most likely reuse the same numbers, or lower numbers, even if it is RFC compliant as to non-guessable sequence numbers (you will see this in the tcpdump). FreeBSD will not guarantee increasing sequence numbers -- and will thus ignore the packets -- unless you enable the sysctl to disable the pure random sequence number hack. Look for it via the command sysctl -A | grep -i seq.

NB: FreeBSD also does not reset connections in TIME_WAIT if it gets packets from the same IP/port on the client while the server is in TIME_WAIT because the connections are dead. This is a common hack (NT does this by default, and so does Solaris), but it opens you up to connection force-down attacks on active connections if your network is improperly firewalled.

> One more problem is in nfsd, if I set it to use udp only it starts eating all cpu cycles it can get,but only the master process. Trussing the proces shows no system calls whatsoever being performed.

The I/O daemons make a system call and never return to user space. To track down this problem, truss is of no use: you must use DDB in the kernel (or remote kernel debugging, if you have two systems available: see the FreeBSD Developer's Handbook), and find out what it's doing in the kernel when this happens... I suspect that you are having one of the problems above, and are being packet-flooded by the clients when they get no response, or at least none they like, from the server.

> BTW This is -current built yesterday ( oct 13).

You may also want to try 4.3 or 4.4 instead.

> PS Snoop logs or tcpdump logs are avialable for those who know what to look for...

I'll look at them if they are up on a web site, but not if you mail them, so _DON'T_ mail them to me!

-- Terry
Re: Multiple NFS server problems with Solaris 8 clients
Actually, I've also noticed problems in FreeBSD-current: ls and reads work, but things like mkdir hang. Here's the tcpdump output:

Script started on Sun Oct 14 12:21:50 2001
quarm.feral.com root tcpdump -vv -i fxp0 host antares
tcpdump: listening on fxp0
12:21:58.498568 antares.1294025654 > quarm.nfs: 116 getattr [|nfs] (DF) (ttl 64, id 2722, len 156)
12:21:58.498746 quarm.nfs > antares.1294025654: reply ok 116 getattr [|nfs] (DF) (ttl 64, id 29331, len 156)
12:21:58.501021 antares.1294025655 > quarm.nfs: 116 getattr [|nfs] (DF) (ttl 64, id 2723, len 156)
12:21:58.501184 quarm.nfs > antares.1294025655: reply ok 116 getattr [|nfs] (DF) (ttl 64, id 29332, len 156)
12:21:58.501657 antares.1294025656 > quarm.nfs: 116 getattr [|nfs] (DF) (ttl 64, id 2724, len 156)
12:21:58.501707 quarm.nfs > antares.1294025656: reply ok 116 getattr [|nfs] (DF) (ttl 64, id 29333, len 156)
12:21:58.502062 antares.1294025657 > quarm.nfs: 116 getattr [|nfs] (DF) (ttl 64, id 2725, len 156)
12:21:58.502117 quarm.nfs > antares.1294025657: reply ok 116 getattr [|nfs] (DF) (ttl 64, id 29334, len 156)
12:21:58.502475 antares.1294025658 > quarm.nfs: 116 getattr [|nfs] (DF) (ttl 64, id 2726, len 156)
12:21:58.502519 quarm.nfs > antares.1294025658: reply ok 116 getattr [|nfs] (DF) (ttl 64, id 29335, len 156)
12:21:58.598618 antares.1018 > quarm.nfsd: . [tcp sum ok] 437975440:437975440(0) ack 4039870942 win 24820 (DF) (ttl 64, id 2727, len 40)

- OKAY: that was the ls that worked

12:22:10.893273 antares.1294025660 > quarm.nfs: 116 getattr [|nfs] (DF) (ttl 64, id 2728, len 156)
12:22:10.893409 quarm.nfs > antares.1294025660: reply ok 116 getattr [|nfs] (DF) (ttl 64, id 29367, len 156)
12:22:10.893740 antares.1294025661 > quarm.nfs: 120 getattr [|nfs] (DF) (ttl 64, id 2729, len 160)
12:22:10.992986 quarm.nfsd > antares.1018: . [tcp sum ok] 117:117(0) ack 236 win 62459 (DF) (ttl 64, id 29368, len 40)

- that was the mkdir (that hung)

^C
218 packets received by filter
0 packets dropped by kernel
Re: Multiple NFS server problems with Solaris 8 clients
Hi,

> One more problem is in nfsd, if I set it to use udp only it starts eating all cpu cycles it can get,but only the master process. Trussing the process shows no system calls whatsoever being performed.

The last one is a known problem. There is an (unfinished) patch available to solve it. Thomas Moestl [EMAIL PROTECTED] is still working on some issues with the patch. Please contact him if you would like to know more.

Here is the URL for the patch: http://home.teleport.ch/freebsd/userland/nfsd-loop.diff

Martin

Martin Blapp, [EMAIL PROTECTED]
--
Improware AG, UNIX solution and service provider
Zurlindenstrasse 29, 4133 Pratteln, Switzerland
Phone: +41 061 826 93 00: +41 61 826 93 01
PGP Fingerprint: 57E 7CCD 2769 E7AC C5FA DF2C 19C6 DCD1 1B3A EC9C
--
Re: Multiple NFS server problems with Solaris 8 clients
> The last one is a known problem. There is an (unfinished) patch available to solve it. Thomas Moestl [EMAIL PROTECTED] is still working on some issues with the patch. Please contact him if you would like to know more.
>
> Here is the URL for the patch: http://home.teleport.ch/freebsd/userland/nfsd-loop.diff

That patch is a bit out of date, because Peter has since removed a big chunk of Kerberos code from nfsd. I was actually just looking at this problem again, so I include an updated version of Thomas's patch below. This version also removes entries from the children[] array when a slave nfsd dies, to avoid the possibility of accidentally killing unrelated processes.

The issue that remains open with the patch is that currently, if a slave nfsd dies, all nfsds will shut down. This is because nfssvc() in the master nfsd returns 0 when the master nfsd receives a SIGCHLD. This behaviour is probably reasonable enough, but the way it happens is a bit odd.

Thomas, I'll probably commit this within the next few days if you have no objections, and if you don't get there before me. The exiting behaviour can be resolved later if necessary.
Ian

Index: nfsd.c
===
RCS file: /dump/FreeBSD-CVS/src/sbin/nfsd/nfsd.c,v
retrieving revision 1.21
diff -u -r1.21 nfsd.c
--- nfsd.c	20 Sep 2001 02:18:06 -	1.21
+++ nfsd.c	14 Oct 2001 20:19:18 -
@@ -52,6 +52,8 @@
 #include <sys/syslog.h>
 #include <sys/wait.h>
 #include <sys/mount.h>
+#include <sys/linker.h>
+#include <sys/module.h>
 #include <rpc/rpc.h>
 #include <rpc/pmap_clnt.h>
@@ -64,6 +66,7 @@
 #include <err.h>
 #include <errno.h>
+#include <signal.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <strings.h>
@@ -86,12 +89,16 @@
 int	nfsdcnt;	/* number of children */
 void	cleanup(int);
+void	child_cleanup(int);
 void	killchildren(void);
-void	nonfs (int);
-void	reapchild (int);
-int	setbindhost (struct addrinfo **ia, const char *bindhost, struct addrinfo hints);
-void	unregistration (void);
-void	usage (void);
+void	nfsd_exit(int);
+void	nonfs(int);
+void	reapchild(int);
+int	setbindhost(struct addrinfo **ia, const char *bindhost,
+	    struct addrinfo hints);
+void	start_server(int);
+void	unregistration(void);
+void	usage(void);
 /*
  * Nfs server daemon mostly just a user context for nfssvc()
@@ -126,13 +133,12 @@
 	fd_set ready, sockbits;
 	fd_set v4bits, v6bits;
 	int ch, connect_type_cnt, i, len, maxsock, msgsock;
-	int nfssvc_flag, on = 1, unregister, reregister, sock;
+	int on = 1, unregister, reregister, sock;
 	int tcp6sock, ip6flag, tcpflag, tcpsock;
-	int udpflag, ecode, s;
-	int bindhostc = 0, bindanyflag, rpcbreg, rpcbregcnt;
+	int udpflag, ecode, s, srvcnt;
+	int bindhostc, bindanyflag, rpcbreg, rpcbregcnt;
 	char **bindhost = NULL;
 	pid_t pid;
-	int error;
 	if (modfind("nfsserver") < 0) {
 		/* Not present in kernel, try loading it */
@@ -141,8 +147,8 @@
 	}
 	nfsdcnt = DEFNFSDCNT;
-	unregister = reregister = tcpflag = 0;
-	bindanyflag = udpflag;
+	unregister = reregister = tcpflag = maxsock = 0;
+	bindanyflag = udpflag = connect_type_cnt = bindhostc = 0;
 #define	GETOPT	"ah:n:rdtu"
 #define	USAGE	"[-ardtu] [-n num_servers] [-h bindip]"
 	while ((ch = getopt(argc, argv, GETOPT)) != -1)
@@ -313,8 +319,6 @@
 		daemon(0, 0);
 		(void)signal(SIGHUP, SIG_IGN);
 		(void)signal(SIGINT, SIG_IGN);
-		(void)signal(SIGSYS, nonfs);
-		(void)signal(SIGUSR1, cleanup);
 		/*
 		 * nfsd sits in the kernel most of the time. It needs
 		 * to ignore SIGTERM/SIGQUIT in order to stay alive as long
@@ -324,40 +328,31 @@
 		(void)signal(SIGTERM, SIG_IGN);
 		(void)signal(SIGQUIT, SIG_IGN);
 	}
+	(void)signal(SIGSYS, nonfs);
 	(void)signal(SIGCHLD, reapchild);
-	openlog("nfsd:", LOG_PID, LOG_DAEMON);
+	openlog("nfsd", LOG_PID, LOG_DAEMON);
-	for (i = 0; i < nfsdcnt; i++) {
+	/* If we use UDP only, we start the last server below. */
+	srvcnt = tcpflag ? nfsdcnt : nfsdcnt - 1;
+	for (i = 0; i < srvcnt; i++) {
 		switch ((pid = fork())) {
 		case -1:
 			syslog(LOG_ERR, "fork: %m");
-			killchildren();
-			exit (1);
+			nfsd_exit(1);
 		case 0:
 			break;
 		default:
 			children[i] = pid;
 			continue;
 		}
-
+		(void)signal(SIGUSR1, child_cleanup);
 		setproctitle("server");
-		nfssvc_flag = NFSSVC_NFSD;
-		nsd.nsd_nfsd = NULL;
-		while (nfssvc(nfssvc_flag,
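The part of the patch that removes entries from children[] is not visible in the truncated diff above, so the following is only a sketch of the idea Ian describes, not his code. The helper names `sketch_forget_child` and `sketch_live_children` are hypothetical, and using -1 as the "slot is empty" marker is an assumption. The point is that once a slave has been reaped, its pid may be recycled by the system, so it must never be signalled again.

```c
#include <sys/types.h>

/* Mark the slot holding 'pid' as empty after the child has been reaped.
 * Returns 1 if the pid was found, 0 otherwise. The -1 sentinel is an
 * assumption of this sketch. */
static int sketch_forget_child(pid_t *children, int count, pid_t pid)
{
    int i;

    for (i = 0; i < count; i++) {
        if (children[i] == pid) {
            children[i] = -1;
            return 1;
        }
    }
    return 0;
}

/* Count the slots that still name a live child. A kill loop would visit
 * exactly these entries and skip the emptied ones. */
static int sketch_live_children(const pid_t *children, int count)
{
    int i, live = 0;

    for (i = 0; i < count; i++) {
        if (children[i] > 0)
            live++;
    }
    return live;
}
```

In a SIGCHLD handler the pid passed to `sketch_forget_child` would come from waitpid(); a later shutdown loop that skips the -1 slots can then no longer signal an unrelated process that happened to inherit the pid of a dead slave.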
Re: Multiple NFS server problems with Solaris 8 clients
On Sun, 2001/10/14 at 21:38:26 +0100, Ian Dowse wrote:

>> The last one is a known problem. There is an (unfinished) patch available to solve it. Thomas Moestl [EMAIL PROTECTED] is still working on some issues with the patch. Please contact him if you would like to know more.
>>
>> Here is the URL for the patch: http://home.teleport.ch/freebsd/userland/nfsd-loop.diff
>
> That patch is a bit out of date, because Peter has since removed a big chunk of Kerberos code from nfsd. I was actually just looking at this problem again, so I include an updated version of Thomas's patch below. This version also removes entries from the children[] array when a slave nfsd dies, to avoid the possibility of accidentally killing unrelated processes.
>
> The issue that remains open with the patch is that currently, if a slave nfsd dies, all nfsds will shut down. This is because nfssvc() in the master nfsd returns 0 when the master nfsd receives a SIGCHLD. This behaviour is probably reasonable enough, but the way it happens is a bit odd.
>
> Thomas, I'll probably commit this within the next few days if you have no objections, and if you don't get there before me. The exiting behaviour can be resolved later if necessary.

Thanks! I've been meaning to update and commit this patch for quite some time, but I have been rather focused on sparc64 development whenever I had time recently. I also wanted to resolve this exiting behaviour first, but I agree that it is probably not a real issue.

- thomas