Re: multihomed nfs server - NLM lock failure on additional interfaces

2011-12-13 Thread Rick Macklem
John De wrote:
 Hi Folks,
 
 I have a 9-prerelease system where I've been testing nfs/zfs. The
 system has been working quite well until moving the server to a
 multihomed
 configuration.
 
 Given the following:
 
 nfsd: master (nfsd)
 nfsd: server (nfsd)
 /usr/sbin/rpcbind -h 10.24.6.38 -h 172.1.1.2 -h 172.21.201.1 -h
 172.21.202.1 -h 172.21.203.1 -h 172.21.204.1 -h 172.21.205.1 -h
 10.24.6.34 -h 10.24.6.33
 /usr/sbin/mountd -r -l -h 10.24.6.38 -h 172.1.1.2 -h 172.21.201.1 -h
 172.21.202.1 -h 172.21.203.1 -h 172.21.204.1 -h 172.21.205.1 -h
 10.24.6.34 -h 10.24.6.33
 /usr/sbin/rpc.statd -h 10.24.6.38 -h 172.1.1.2 -h 172.21.201.1 -h
 172.21.202.1 -h 172.21.203.1 -h 172.21.204.1 -h 172.21.205.1 -h
 10.24.6.34 -h 10.24.6.33
 /usr/sbin/rpc.lockd -h 10.24.6.38 -h 172.1.1.2 -h 172.21.201.1 -h
 172.21.202.1 -h 172.21.203.1 -h 172.21.204.1 -h 172.21.205.1 -h
 10.24.6.34 -h 10.24.6.33
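
(Command lines like these are what the rc.d scripts produce from rc.conf
settings along the following lines; a sketch reconstructed from the
invocations above, with the -h lists abbreviated, not the poster's actual
rc.conf:)

rpcbind_flags="-h 10.24.6.38 -h 172.1.1.2 ... -h 10.24.6.33"
mountd_flags="-r -l -h 10.24.6.38 -h 172.1.1.2 ... -h 10.24.6.33"
rpc_statd_flags="-h 10.24.6.38 -h 172.1.1.2 ... -h 10.24.6.33"
rpc_lockd_flags="-h 10.24.6.38 -h 172.1.1.2 ... -h 10.24.6.33"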
 
 10.24.6.38 is the default interface on 1G. The 172 nets are 10G
 connected
 to compute systems.
 
 ifconfig_bce0=' inet 10.24.6.38 netmask 255.255.0.0 -rxcsum -txcsum'
 _c='physical addr which never changes'
 ifconfig_bce1=' inet 172.1.1.2 netmask 255.255.255.0' _c='physical
 addr on crossover cable'
 ifconfig_cxgb2='inet 172.21.21.129 netmask 255.255.255.0' _c='physical
 backside 10g compute net'
 ifconfig_cxgb3='inet 172.21.201.1 netmask 255.255.255.0 mtu 9000'
 _c='physical backside 10g compute net'
 ifconfig_cxgb6='inet 172.21.202.1 netmask 255.255.255.0 mtu 9000'
 _c='physical backside 10g compute net'
 ifconfig_cxgb8='inet 172.21.203.1 netmask 255.255.255.0 mtu 9000'
 _c='physical backside 10g compute net'
 ifconfig_cxgb4='inet 172.21.204.1 netmask 255.255.255.0 mtu 9000'
 _c='physical backside 10g compute net'
 ifconfig_cxgb0='inet 172.21.205.1 netmask 255.255.255.0 mtu 9000'
 _c='physical backside 10g compute net'
 
 The 10.24.6.34 and 10.24.6.33 are alias addresses for the system.
 
 Destination        Gateway         Flags    Refs     Use   Netif  Expire
 default            10.24.0.1       UGS         0     1049   bce0
 
 
 The server works correctly (and quite well) for both udp & tcp mounts.
 Basically, all nfs traffic is great!
 
 However, locking only works for clients connected to the 10.24.6.38
 interface.
 
 Tcpdump files from the good & bad runs:
 
 http://www.freebsd.org/~jwd/lockgood.pcap
 http://www.freebsd.org/~jwd/lockbad.pcap
 
 Basically, the clients (both FreeBSD & Linux) query the server's
 rpcbind for the address of the NLM, which is returned correctly. For
 the good run, the NLM is then called. For the bad run, it is not.
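
A quick way to verify that first step from either side is rpcinfo (the
NLM registers with rpcbind as "nlockmgr"); for example, with the
addresses above:

    rpcinfo -p 172.21.201.1 | grep nlockmgr    # via a 10G address
    rpcinfo -p 10.24.6.38 | grep nlockmgr      # via the 1G address

If the two answers differ, the problem is in what rpcbind hands out; if
they match, the failure is in the NLM call that should follow.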
 
Well, first off I think your packet traces are missing packets. If
you look at nlm_get_rpc(), which is the function in sys/nlm/nlm_prot_impl.c
that is doing this, you will see that it first attempts UDP and then falls
back to TCP when talking to rpcbind. Your packet traces only show TCP, so
I suspect that the UDP case went through a different interface (or missed
getting captured some other way?).
My guess would be that the attempt to connect to the server's NLM does the
same thing, since the lockbad.pcap doesn't show any SYN,... to port 844.
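
For anyone following along, here is that conversation in miniature: a
user-space sketch (not the kernel code itself) of the same rpcbind query
done with the standard tirpc API, UDP first with TCP as the fallback.
The only specifics assumed are the NLM's program number (100021) and
version 4; the API is in libc on FreeBSD (link with -ltirpc elsewhere).

#include <stdio.h>
#include <rpc/rpc.h>
#include <netconfig.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define NLM_PROG  100021  /* NLM program number */
#define NLM_VERS  4       /* NLM version 4 */

int
main(int argc, char **argv)
{
    const char *host = argc > 1 ? argv[1] : "10.24.6.38";
    const char *netids[] = { "udp", "tcp" };   /* UDP first, then TCP */
    struct sockaddr_in sin;
    struct netbuf nb = { .maxlen = sizeof(sin), .len = 0, .buf = &sin };

    for (int i = 0; i < 2; i++) {
        struct netconfig *nconf = getnetconfigent(netids[i]);

        if (nconf == NULL)
            continue;
        /* Ask the host's rpcbind for the NLM's bound address. */
        if (rpcb_getaddr(NLM_PROG, NLM_VERS, nconf, &nb, host)) {
            printf("%s: NLM at %s port %u\n", netids[i],
                inet_ntoa(sin.sin_addr), ntohs(sin.sin_port));
            freenetconfigent(nconf);
            return (0);
        }
        freenetconfigent(nconf);
    }
    fprintf(stderr, "rpcbind lookup for the NLM failed\n");
    return (1);
}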

If I were you, I'd put lots of printf()s in nlm_get_rpc() showing what is in
the address structure ss, in particular just before it calls
clnt_reconnect_create().
{ That covers the client side. }
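
Concretely, something like this hypothetical helper would do (the name
nlm_dump_addr() is illustrative, not existing code; "ss" is the
sockaddr_storage that nlm_get_rpc() builds):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Dump an address nlm_get_rpc() is about to use, e.g. just before
 * the clnt_reconnect_create() call. */
static void
nlm_dump_addr(const char *tag, const struct sockaddr_storage *ss)
{
    if (ss->ss_family == AF_INET) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)ss;
        const u_char *p = (const u_char *)&sin->sin_addr;

        printf("%s: AF_INET %u.%u.%u.%u port %u\n", tag,
            p[0], p[1], p[2], p[3], ntohs(sin->sin_port));
    } else
        printf("%s: family %u len %u\n", tag,
            ss->ss_family, ss->ss_len);
}

/* ... at the call site: */
nlm_dump_addr("nlm_get_rpc", &ss);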

For the server, it starts at sys_nlm_syscall(), which calls ... until
you get to nlm_register_services(). That function copies in a list of
address(es), and I would printf those address(es) once they are copied
into the kernel, to see if they make sense. These are the address(es)
that are going to get sobind()'d later by a function called
svc_tli_create() { over in sys/rpc/rpc_generic.c }.
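
And on the server side, a hypothetical peek in the same spirit
(addr_count/addrs are the names suggested by the nlm_syscall()
interface; whether each copied-in entry is a printable uaddr string or
a binary sockaddr is exactly what the output would settle):

/*
 * Drop into nlm_register_services() after the copyin: dump the first
 * bytes of each address before anything gets sobind()'d.  16 bytes is
 * an arbitrary peek; switch to printf("%s") if the entries turn out
 * to be NUL-terminated strings.
 */
int i;

for (i = 0; i < addr_count; i++)
    hexdump(addrs[i], 16, "nlm addr: ", 0);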

That's as far as I got. Good luck with it, rick

 I've started digging through code, but I do not claim to be an rpc
 expert.
 If anyone has suggestions I would appreciate any pointers.
 
 Thanks!
 John

