Glad to hear you got things figured out!

-Ben
On Wed, Jan 09, 2019 at 02:26:19AM +0000, Ximeng (Simon) Guan wrote:
> Thanks. Yes, the bos invocation did hang for a minute or two before
> reporting that failure.
>
> We just figured out the reason for the failure. It is still MTU-related:
>
> 1. Between offices we use IPsec for VPN, and that limits the path MTU to
> 1400.
> 2. To accommodate the reduced MTU we did the following:
> 2.1 Apply -rxmaxmtu 1400 in BosConfig.
> 2.2 Set the MTU in the ifcfg-xxx config on the host machine of the failed
> database server to 1400.
>
> It turns out that 2.2 caused the problem. The database machine is
> hosted as a KVM VM. After we adjusted the MTU in the host's ifcfg to 1400
> and the power outage caused the server to reboot, the server started to drop
> incoming 1500-byte UDP packets.
>
> The server and the office laptops are connected through an L2 switch that
> does not handle fragmentation. All remote traffic goes through an L3 router,
> which does, and re-packs the packets to 1400 bytes. That's why all the local
> clients had problems accessing AFS but the remote servers and clients did not.
>
> Thank you!
>
> Simon
>
> -----Original Message-----
> From: Benjamin Kaduk <[email protected]>
> Sent: Tuesday, January 8, 2019 6:13 PM
> To: Ximeng (Simon) Guan <[email protected]>
> Cc: [email protected]
> Subject: Re: [OpenAFS] Client connection failure: bos failed to contact
> host's bosserver (communication failure (-1))
>
> On Mon, Jan 07, 2019 at 08:00:27PM +0000, Ximeng (Simon) Guan wrote:
> > We do have NetInfo properly set up to include the only IP that is used.
>
> Good to know, thanks.
>
> I couldn't rule out MTU issues offhand, but don't have time to dig in further
> right now.
>
> Do the problematic bos invocations hang for a minute or two before reporting
> the "communications failure"?
>
> The bosserver listens on port 7007, if you hadn't found that already -- a
> packet capture would help show what's going on, if you have the ability to
> get one of those.
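The failure Simon describes comes down to IP header arithmetic: once the host's interface was set to an MTU of 1400, incoming full-size 1500-byte packets from local clients were dropped, and the local L2 switch never fragments anything, whereas remote traffic had already been fragmented by the L3 router. The Python sketch below illustrates the sizes involved. It assumes IPv4 without options (20-byte header) and the fixed 8-byte UDP header; it does not model IPsec overhead and makes no claim about how OpenAFS's -rxmaxmtu value maps onto these sizes. The function names are hypothetical.

```python
# Sketch of the header arithmetic behind the MTU failure in this thread.
# Assumptions: IPv4 without options, plain UDP; IPsec/tunnel overhead and the
# exact semantics of OpenAFS's -rxmaxmtu are deliberately not modeled.

IPV4_HEADER = 20  # bytes: IPv4 header with no options
UDP_HEADER = 8    # bytes: fixed-size UDP header


def max_udp_payload(mtu: int) -> int:
    """Largest UDP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - UDP_HEADER


def ip_packet_size(udp_payload: int) -> int:
    """Total IPv4 packet size needed to carry udp_payload bytes."""
    return udp_payload + UDP_HEADER + IPV4_HEADER


def fits_unfragmented(packet_size: int, mtu: int) -> bool:
    """True if an IP packet of packet_size bytes fits within mtu as-is."""
    return packet_size <= mtu


if __name__ == "__main__":
    # Largest packet a client on a standard 1500-byte Ethernet link may send.
    full_size = ip_packet_size(max_udp_payload(1500))
    print(full_size)                           # 1500
    # Too big for a 1400-byte interface; dropped if nothing fragments it.
    print(fits_unfragmented(full_size, 1400))  # False
    # Largest UDP payload that fits within the 1400-byte path MTU.
    print(max_udp_payload(1400))               # 1372
```

The same arithmetic explains why only the local laptops were affected: their packets reached the server at full size, while remote packets had already been cut down to 1400 bytes on the way in.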
>
> -Ben
>
> > Can the connection failure somehow come from the non-default MTU settings
> > we are using? That has bitten us repeatedly in the past in different places.
> > We have "-rxmaxmtu 1344" used across the board for all ptserver,
> > vlserver, davolserver and dafileserver instances. I was told by the network
> > folks that they could not support the default MTU of 1500 but had to use
> > 1400 because of the IPsec requirement...
> >
> > Thank you!
> > Simon
> >
> > -----Original Message-----
> > From: [email protected] <[email protected]>
> > On Behalf Of Benjamin Kaduk
> > Sent: Monday, January 7, 2019 11:44 AM
> > To: Ximeng (Simon) Guan <[email protected]>
> > Cc: [email protected]
> > Subject: Re: [OpenAFS] Client connection failure: bos failed to
> > contact host's bosserver (communication failure (-1))
> >
> > On Mon, Jan 07, 2019 at 07:40:36PM +0000, Ximeng (Simon) Guan wrote:
> > > Hello,
> > >
> > > After a power outage on Christmas Eve, which forced two database servers
> > > and all the network switches in one of our offices to reboot, our laptop
> > > clients in that office can no longer connect to one of the AFS servers
> > > hosted in the same office.
> > >
> > > I am leaning towards the possibility that it is a network problem instead
> > > of an OpenAFS service problem because:
> > >
> > > 1. Remote offices can access the full AFS space, including the
> > > volumes hosted on the rebooted servers.
> > > 2. Between the servers there is no access problem. Nothing is wrong with
> > > the results of "bos status", "rxdebug" or "udebug". "fs checkservers"
> > > shows that all servers are running.
> > > 3. On the problematic laptops, "fs checkservers" shows that "All servers
> > > are running".
> > > 4. On the problematic laptops, "bos status afssrv1" returns a message:
> > >
> > > "bos: failed to contact host's bosserver (communications failure (-1))."
> > >
> > > But on the servers, both in that office and in the remote offices, the
> > > same command shows that all services are up:
> > >
> > > "Instance ptserver, currently running normally.
> > > Instance vlserver, currently running normally.
> > > Instance buserver, currently running normally.
> > > Instance upserver, currently running normally.
> > > Instance backupusers, currently running normally.
> > > Auxiliary status is: run next at Tue Jan 8 04:00:00 2019.
> > > Instance dafs, currently running normally.
> > > Auxiliary status is: file server running."
> > >
> > > 5. On the problematic laptops, "rxdebug afssrv1 -port 7000" returns
> > > *normal* output, for example:
> > >
> > > "Trying 10.12.8.33 (port 7000):
> > > Free packets: 2073/6357, packet reclaims: 3, calls: 81, used FDs: 36
> > > not waiting for packets.
> > > 0 calls waiting for a thread
> > > 125 threads are idle
> > > 1 calls have waited for a thread
> > > Connection from host 10.9.119.50, port 7001, Cuid ae06e5b3/70fe0104
> > > serial 12, natMTU 1344, security index 0, client conn
> > > call 0: # 4, state dally, mode: receiving, flags: receive_done
> > > call 1: # 0, state not initialized
> > > call 2: # 0, state not initialized
> > > call 3: # 0, state not initialized
> > > Connection from host 10.12.4.74, port 7001, Cuid ae06e5b3/70fe0114
> > > serial 21, natMTU 1344, security index 0, client conn
> > > call 0: # 7, state dally, mode: receiving, flags: receive_done
> > > call 1: # 0, state not initialized
> > > call 2: # 0, state not initialized
> > > call 3: # 0, state not initialized
> > > Done."
> > >
> > > I do not administer the network. Can I get some advice on how to further
> > > debug the connection problem? Which UDP port does the command "bos
> > > status" use?
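On the question of which UDP port bos status uses: as Ben's later reply in this thread points out, the bosserver listens on port 7007, while the rxdebug probe above targeted the fileserver on port 7000. The Python sketch below collects the well-known ports for the services named in this thread (the IANA afs3-* assignments) and builds a tcpdump/pcap filter for the kind of packet capture Ben suggested. The dictionary and the helper name capture_filter are hypothetical, for illustration only.

```python
# Sketch: well-known UDP ports (IANA afs3-* assignments) for the OpenAFS
# services mentioned in this thread, plus a hypothetical helper that builds
# a pcap filter expression for capturing that traffic.

AFS_UDP_PORTS = {
    "fileserver": 7000,  # afs3-fileserver: the port probed with rxdebug above
    "callback": 7001,    # afs3-callback: the client cache manager's port
    "ptserver": 7002,    # afs3-prserver
    "vlserver": 7003,    # afs3-vlserver
    "volserver": 7005,   # afs3-volser
    "bosserver": 7007,   # afs3-bos: the server that "bos status" talks to
}


def capture_filter(host: str, *services: str) -> str:
    """Return a pcap filter matching UDP traffic to or from `host`
    on the ports of the named services."""
    if not services:
        raise ValueError("name at least one service")
    # "port N" matches either the source or the destination port.
    ports = " or ".join(f"port {AFS_UDP_PORTS[s]}" for s in services)
    return f"host {host} and udp and ({ports})"


if __name__ == "__main__":
    print(capture_filter("10.12.8.33", "bosserver"))
    print(capture_filter("10.12.8.33", "bosserver", "fileserver"))
```

For example, capture_filter("10.12.8.33", "bosserver") yields host 10.12.8.33 and udp and (port 7007), which could be passed to something like tcpdump -n -v -i <interface> '...'. With -v, tcpdump prints each IPv4 packet's total length, which is the detail that matters when chasing an MTU problem.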
> >
> > My instinct would be that there is some multihoming going on and that
> > http://docs.openafs.org/Reference/5/NetRestrict.html and/or
> > http://docs.openafs.org/Reference/5/NetInfo.html are not properly
> > configured.
> >
> > -Ben
> > _______________________________________________
> > OpenAFS-info mailing list
> > [email protected]
> > https://lists.openafs.org/mailman/listinfo/openafs-info
