On Mon, Jan 07, 2019 at 08:00:27PM +0000, Ximeng (Simon) Guan wrote:
> We do have NetInfo properly set up to include the only one IP that is used.

Good to know, thanks.  I couldn't rule out MTU issues offhand, but don't have
time to dig in further right now.  Do the problematic bos invocations hang
for a minute or two before reporting the "communications failure"?

The bosserver listens on port 7007, if you hadn't found that already -- a
packet capture would help show what's going on, if you have the ability to
get one of those.

-Ben

> Can the connection failure somehow come from the non-default MTU settings we
> are using? That thing constantly bit us in the past in different places. We
> have "-rxmaxmtu 1344" used across the board for all ptserver, vlserver,
> davolserver and dafileserver instances. I was told by the network folks that
> they could not manage the default MTU of 1500 but have to use 1400 because of
> the IPSec requirement...
>
> Thank you!
> Simon
>
> -----Original Message-----
> From: [email protected] <[email protected]> On
> Behalf Of Benjamin Kaduk
> Sent: Monday, January 7, 2019 11:44 AM
> To: Ximeng (Simon) Guan <[email protected]>
> Cc: [email protected]
> Subject: Re: [OpenAFS] Client connection failure: bos failed to contact
> host's bosserver (communication failure (-1))
>
> On Mon, Jan 07, 2019 at 07:40:36PM +0000, Ximeng (Simon) Guan wrote:
> > Hello,
> >
> > After a power outage on Christmas Eve which forced two database servers and
> > all the network switches in one of our offices to reboot, our laptop
> > clients in that office can no longer connect to one of the AFS servers
> > hosted in the same office.
> >
> > I am leaning towards the possibility that it is a network problem instead
> > of an OpenAFS service problem because:
> >
> > 1. Remote offices can access the full AFS space, including those volumes
> >    hosted on the rebooted servers.
> > 2. Between the servers there is no access problem. Nothing is wrong with
> >    the result of "bos status", "rxdebug" or "udebug". "fs checkservers"
> >    shows that all servers are running.
> > 3. On the problematic laptops "fs checkservers" shows that "All servers
> >    are running".
> > 4. On the problematic laptops "bos status afssrv1" returns a message:
> >
> >    "bos: failed to contact host's bosserver (communications failure (-1))."
> >
> >    But on the servers both in that office and in the remote offices, the
> >    same command shows that all services are up:
> >
> >    "Instance ptserver, currently running normally.
> >    Instance vlserver, currently running normally.
> >    Instance buserver, currently running normally.
> >    Instance upserver, currently running normally.
> >    Instance backupusers, currently running normally.
> >        Auxiliary status is: run next at Tue Jan 8 04:00:00 2019.
> >    Instance dafs, currently running normally.
> >        Auxiliary status is: file server running."
> >
> > 5. On the problematic laptops "rxdebug afssrv1 -port 7000" returns
> >    *normal* output, for example:
> >
> >    "Trying 10.12.8.33 (port 7000):
> >    Free packets: 2073/6357, packet reclaims: 3, calls: 81, used FDs: 36
> >    not waiting for packets.
> >    0 calls waiting for a thread
> >    125 threads are idle
> >    1 calls have waited for a thread
> >    Connection from host 10.9.119.50, port 7001, Cuid ae06e5b3/70fe0104
> >      serial 12, natMTU 1344, security index 0, client conn
> >        call 0: # 4, state dally, mode: receiving, flags: receive_done
> >        call 1: # 0, state not initialized
> >        call 2: # 0, state not initialized
> >        call 3: # 0, state not initialized
> >    Connection from host 10.12.4.74, port 7001, Cuid ae06e5b3/70fe0114
> >      serial 21, natMTU 1344, security index 0, client conn
> >        call 0: # 7, state dally, mode: receiving, flags: receive_done
> >        call 1: # 0, state not initialized
> >        call 2: # 0, state not initialized
> >        call 3: # 0, state not initialized
> >    Done."
> >
> > I do not administer the network. Can I have some advice on how to further
> > debug the connection problem? Which UDP port does the command "bos status"
> > use?
>
> My instinct would be that there is some multihoming going on and that
> http://docs.openafs.org/Reference/5/NetRestrict.html and/or
> http://docs.openafs.org/Reference/5/NetInfo.html are not properly configured.
>
> -Ben

_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
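
On the MTU question raised in this thread, the arithmetic can be checked
with a short sketch. It is illustrative only: the helper names are
hypothetical and not part of OpenAFS, and the thread does not say whether
the "-rxmaxmtu" value counts the IP and UDP headers, so both readings are
tried. The only fixed facts used are the 20-byte IPv4 header (without
options) and the 8-byte UDP header.

```python
# Sketch only: helper names are hypothetical, not part of OpenAFS.
IPV4_HEADER = 20  # bytes, IPv4 header without options
UDP_HEADER = 8    # bytes


def max_udp_payload(path_mtu: int) -> int:
    """Largest UDP payload that fits in one unfragmented IPv4 packet."""
    return path_mtu - IPV4_HEADER - UDP_HEADER


def headroom(rx_size: int, path_mtu: int, counts_headers: bool) -> int:
    """Bytes left under path_mtu; a negative value means the packet is too big.

    counts_headers=True treats rx_size as the whole IP packet; False treats
    it as the UDP payload only. Which reading applies is an assumption.
    """
    on_wire = rx_size if counts_headers else rx_size + IPV4_HEADER + UDP_HEADER
    return path_mtu - on_wire


# The values from the thread: -rxmaxmtu 1344 against a 1400-byte MTU.
for reading in (True, False):
    print(reading, headroom(1344, 1400, counts_headers=reading))
```

Under either reading, 1344 stays below a 1400-byte MTU (56 or 28 bytes to
spare), so if MTU is involved at all, the effective path MTU between the
laptops and the server would have to be lower than the stated 1400. A packet
capture on port 7007, as suggested above, is the way to confirm or rule that
out.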
