Hi,

On Wed, Aug 17, 2016 at 3:44 PM, Patrick Zwahlen <[email protected]> wrote:
> Dear list (sorry for the rather long e-mail),
>
> I'm looking for someone who has successfully implemented the "exportfs" RA
> with NFSv4 over TCP (and is willing to share some information).
>
> The final goal is to present NFS datastores to ESXi over 2 "head" nodes.
> Both nodes must be active in the sense that they both have an NFS server
> running but they export different file systems (via exports and floating
> IPAddr2).
>
> When moving an export to another node, we move the entire
> "filesystem/export/ipaddr" stack but we keep the NFS server running (as it
> might potentially be exporting some other file systems via other IPs).
>
> Both nodes are sharing disks (JBOD for physical and shared VMDKs for
> testing). Disks are only accessed by a single "head" node at any given time
> so a clustered file system is not required.
>
> To my knowledge, this setup has been best described by Florian Haas over
> there:
> https://www.suse.com/documentation/sle_ha/singlehtml/book_sleha_techguides/book_sleha_techguides.html
> (except we're not using DRBD and LVM)
>
> Before going into more details, I mention that I have already read all
> those posts and examples as well as many of the NFS related questions in
> this list for the past year or so.
>
> http://wiki.linux-nfs.org/wiki/index.php/Nfsd4_server_recovery
> http://wiki.linux-nfs.org/wiki/index.php/NFS_Recovery_and_Client_Migration
> http://oss.clusterlabs.org/pipermail/pacemaker/2011-July/011000.html
> https://access.redhat.com/solutions/42868
>
> I'm forced to use TCP because of ESXi and I'm willing to use NFSv4 because
> ESXi can use "session trunking" or some sort of "multipath" with version 4
> (not tested yet)
>
> The problem I see is what a lot of people have already mentioned: Failover
> works nicely but failback takes a very long time. Many posts mention
> putting /var/lib/nfs on a shared disk but this only makes sense when we
> failover an entire NFS server (compared to just exports). Moreover, I don't
> see any relevant information written to /var/lib/nfs when a single Linux
> NFSv4 client is mounting a folder.
>
> NFSv4 LEASE and GRACE time have been reduced to 10 seconds. I'm using the
> exportfs RA parameter "wait_for_leasetime_on_stop=true".
>
> From my investigation, the problem actually happens at the TCP level.
> Let's describe the most basic scenario, ie a single filesystem moving from
> node1 to node2 and back.
>
> I first start the NFS servers using a clone resource. Node1 then starts a
> group that mounts a file system, adds it to the export list (exportfs RA)
> and adds a floating IP.
>
> I then mount this folder from a Linux NFS client.
>
> When I "migrate" my group out of node1, everything correctly moves to
> node2. IPAddr2:stop, then the exportfs "stop" action takes about 12 seconds
> (10 seconds LEASE time plus the rest) and my file system gets unmounted.
> During that time, I see the NFS client trying to talk to the floating IP
> (on its node1 MAC address). Once everything has moved to node2, the client
> sends TCP packets to the new MAC address and node2 replies with a TCP
> RESET. At this point, the client restarts a NEW TCP session and it works
> fine.
>
> However, on node 1, I can still see an ESTABLISHED TCP session between the
> client and the floating IP on port 2049 (NFS), even though the IP is gone.
> After a short time, the session moves to FIN_WAIT1 and stays there for a
> while.
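(Side note for readers following along: the setup described above boils down
to something like the crm shell configuration below. Everything in it -
resource names, device, paths, network and IP address - is a made-up
placeholder, not taken from Patrick's cluster, and the NFS server resource
could just as well be ocf:heartbeat:nfsserver or lsb:nfs-kernel-server
instead of the systemd unit, depending on the distribution.)

  # NFS server runs on both heads, outside the per-export groups
  primitive p_nfsserver systemd:nfs-server \
    op monitor interval=30s
  clone cl_nfsserver p_nfsserver

  # One group per export: filesystem -> exportfs -> floating IP
  primitive p_fs_export1 ocf:heartbeat:Filesystem \
    params device=/dev/disk/by-label/export1 directory=/srv/export1 fstype=xfs

  primitive p_exportfs_export1 ocf:heartbeat:exportfs \
    params directory=/srv/export1 fsid=1 clientspec=192.168.100.0/24 \
      options=rw,no_root_squash wait_for_leasetime_on_stop=true

  primitive p_ip_export1 ocf:heartbeat:IPaddr2 \
    params ip=192.168.100.50 cidr_netmask=24

  group g_export1 p_fs_export1 p_exportfs_export1 p_ip_export1
  order o_nfsserver_before_export1 inf: cl_nfsserver g_export1
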
> When I then "unmigrate" my group to node1, I see the same behavior except
> that node1 is *not* sending TCP RESETs because it still has a TCP session
> with the client. I imagine that the sequence numbers do not match, so node1
> simply doesn't reply at all. It then takes several minutes for the client
> to give up and restart a new NFS session.
>
> Does anyone have an idea about how to handle this problem? I have done
> this with iSCSI, where we can explicitly "kill" sessions, but I don't think
> NFS has something similar. I also don't see anything in the IPaddr2 RA that
> would help in killing TCP sessions while removing a floating IP.

This is a known problem. Have a look at the portblock RA - it has a feature
to send out TCP tickle ACKs to reset such hanging sessions. So you can
configure one portblock resource that blocks the TCP port before the VIP is
started, and another portblock resource that unblocks the port afterwards
and sends out those tickle ACKs. A rough configuration sketch is at the end
of this mail.

Regards,
Andreas

> Next ideas would be to either tune the TCP stack in order to reduce the
> FIN_WAIT1 state or to synchronize sessions between the nodes (using
> conntrackd). That just seems like overkill.
>
> Thanks for any input!
>
> Patrick
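To make that a bit more concrete, here is a rough, untested sketch in crm
shell syntax. The IP address, the tickle directory and all resource names
are placeholders (p_fs_export1, p_exportfs_export1 and p_ip_export1 stand
for the existing members of your group, as in the sketch further up). The
important part is that tickle_dir lives on shared storage, or is kept in
sync between the nodes via sync_script, so that the node taking over knows
which client connections to tickle:

  # Block nfs/tcp traffic to the VIP before anything else in the
  # group starts.
  primitive p_pb_block_export1 ocf:heartbeat:portblock \
    params ip=192.168.100.50 portno=2049 protocol=tcp action=block

  # Unblock the port again once the VIP is up; with tickle_dir set,
  # the unblock action also sends tickle ACKs for the recorded client
  # connections, so the client tears down its stale session right away.
  primitive p_pb_unblock_export1 ocf:heartbeat:portblock \
    params ip=192.168.100.50 portno=2049 protocol=tcp action=unblock \
      tickle_dir=/srv/tickle sync_script="csync2 -xv"

  # Resources in a group start in the order listed, so the group becomes:
  group g_export1 p_pb_block_export1 p_fs_export1 p_exportfs_export1 \
    p_ip_export1 p_pb_unblock_export1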
