John De wrote:
Hi Folks,
I have a 9-prerelease system where I've been testing nfs/zfs. The
system has been working quite well until moving the server to a
multihomed configuration.
Given the following:
nfsd: master (nfsd)
nfsd: server (nfsd)
/usr/sbin/rpcbind -h 10.24.6.38 -h 172.1.1.2 -h 172.21.201.1 -h
172.21.202.1 -h 172.21.203.1 -h 172.21.204.1 -h 172.21.205.1 -h
10.24.6.34 -h 10.24.6.33
/usr/sbin/mountd -r -l -h 10.24.6.38 -h 172.1.1.2 -h 172.21.201.1 -h
172.21.202.1 -h 172.21.203.1 -h 172.21.204.1 -h 172.21.205.1 -h
10.24.6.34 -h 10.24.6.33
/usr/sbin/rpc.statd -h 10.24.6.38 -h 172.1.1.2 -h 172.21.201.1 -h
172.21.202.1 -h 172.21.203.1 -h 172.21.204.1 -h 172.21.205.1 -h
10.24.6.34 -h 10.24.6.33
/usr/sbin/rpc.lockd -h 10.24.6.38 -h 172.1.1.2 -h 172.21.201.1 -h
172.21.202.1 -h 172.21.203.1 -h 172.21.204.1 -h 172.21.205.1 -h
10.24.6.34 -h 10.24.6.33
10.24.6.38 is the default interface on 1G. The 172 nets are 10G,
connected to compute systems.
ifconfig_bce0=' inet 10.24.6.38 netmask 255.255.0.0 -rxcsum -txcsum'
_c='physical addr which never changes'
ifconfig_bce1=' inet 172.1.1.2 netmask 255.255.255.0' _c='physical
addr on crossover cable'
ifconfig_cxgb2='inet 172.21.21.129 netmask 255.255.255.0' _c='physical
backside 10g compute net'
ifconfig_cxgb3='inet 172.21.201.1 netmask 255.255.255.0 mtu 9000'
_c='physical backside 10g compute net'
ifconfig_cxgb6='inet 172.21.202.1 netmask 255.255.255.0 mtu 9000'
_c='physical backside 10g compute net'
ifconfig_cxgb8='inet 172.21.203.1 netmask 255.255.255.0 mtu 9000'
_c='physical backside 10g compute net'
ifconfig_cxgb4='inet 172.21.204.1 netmask 255.255.255.0 mtu 9000'
_c='physical backside 10g compute net'
ifconfig_cxgb0='inet 172.21.205.1 netmask 255.255.255.0 mtu 9000'
_c='physical backside 10g compute net'
The 10.24.6.34 and 10.24.6.33 are alias addresses for the system.
Destination Gateway Flags Refs Use Netif Expire
default 10.24.0.1 UGS 0 1049 bce0
The server works correctly (and quite well) for both udp and tcp mounts.
Basically, all nfs traffic is great!
However, locking only works for clients connected to the 10.24.6.38
interface.
tcpdump files from the good and bad runs:
http://www.freebsd.org/~jwd/lockgood.pcap
http://www.freebsd.org/~jwd/lockbad.pcap
Basically, the clients (both FreeBSD and Linux) query the server's
rpcbind for the address of the NLM, which is returned correctly. For
the good run, the NLM is then called. For the bad run, it is not.
Well, first off I think your packet traces are missing packets. If
you look at nlm_get_rpc(), which is the function in sys/nlm/nlm_prot_impl.c
that is doing this, you will see that it first attempts UDP and then falls
back to TCP when talking to rpcbind. Your packet traces only show TCP, so
I suspect that the UDP case went through a different interface (or missed
getting captured some other way?).
My guess would be that the attempt to connect to the server's NLM does the
same thing, since the lockbad.pcap doesn't show any SYN,... to port 844.
If I were you, I'd put lots of printfs in nlm_get_rpc() showing what is in
the address structure ss and, in particular, what it holds at the point
where it calls clnt_reconnect_create().
{ For the client. }
For the server, it starts at sys_nlm_syscall(), which calls ... until
you get to nlm_register_services(). It copies in a list of address(es), and
I would printf those address(es) once copied into the kernel, to see if they
make sense. These are the address(es) that are going to get sobind()'d later
by a function called svc_tli_create() { over in sys/rpc/rpc_generic.c }.
That's as far as I got. Good luck with it, rick
I've started digging through code, but I do not claim to be an rpc
expert.
If anyone has suggestions I would appreciate any pointers.
Thanks!
John
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to
freebsd-current-unsubscr...@freebsd.org