On Mon, Jan 07, 2019 at 07:40:36PM +0000, Ximeng (Simon) Guan wrote:
> Hello,
>
> After a power outage on Christmas Eve which forced two database servers and
> all the network switches in one of our offices to re-boot, our laptop clients
> in that office can no longer connect to one of the AFS servers hosted in the
> same office.
>
> I am leaning towards the possibility that it is a network problem instead of
> an OpenAFS service problem because:
>
> 1. Remote offices can access the full AFS space, including those volumes
>    hosted on the re-booted servers.
> 2. Between the servers there is no access problem. Nothing wrong with the
>    result of "bos status", "rxdebug" or "udebug". "fs checkservers" shows that
>    all servers are running.
> 3. On the problematic laptops "fs checkservers" shows that "All servers are
>    running".
> 4. On the problematic laptops "bos status afssrv1" returns a message:
>
>    "bos: failed to contact host's bosserver (communications failure (-1))."
>
>    But on the servers both in that office and in the remote offices, the same
>    command shows that all services are up:
>
>    "Instance ptserver, currently running normally.
>    Instance vlserver, currently running normally.
>    Instance buserver, currently running normally.
>    Instance upserver, currently running normally.
>    Instance backupusers, currently running normally.
>    Auxiliary status is: run next at Tue Jan 8 04:00:00 2019.
>    Instance dafs, currently running normally.
>    Auxiliary status is: file server running."
>
> 5. On the problematic laptops "rxdebug afssrv1 -port 7000" returns
>    *normal* output, for example:
>
>    "Trying 10.12.8.33 (port 7000):
>    Free packets: 2073/6357, packet reclaims: 3, calls: 81, used FDs: 36
>    not waiting for packets.
>    0 calls waiting for a thread
>    125 threads are idle
>    1 calls have waited for a thread
>    Connection from host 10.9.119.50, port 7001, Cuid ae06e5b3/70fe0104
>    serial 12, natMTU 1344, security index 0, client conn
>    call 0: # 4, state dally, mode: receiving, flags: receive_done
>    call 1: # 0, state not initialized
>    call 2: # 0, state not initialized
>    call 3: # 0, state not initialized
>    Connection from host 10.12.4.74, port 7001, Cuid ae06e5b3/70fe0114
>    serial 21, natMTU 1344, security index 0, client conn
>    call 0: # 7, state dally, mode: receiving, flags: receive_done
>    call 1: # 0, state not initialized
>    call 2: # 0, state not initialized
>    call 3: # 0, state not initialized
>    Done."
>
> I do not administer the network. Can I have some advice on how to further
> debug the connection problem? Which udp port does the command "bos status"
> use?

My instinct would be that there is some multihoming going on and that
http://docs.openafs.org/Reference/5/NetRestrict.html and/or
http://docs.openafs.org/Reference/5/NetInfo.html are not properly
configured.

-Ben
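
On the port question in the quoted message: "bos status" talks to the bosserver, which by default listens on UDP port 7007, whereas the rxdebug test quoted above only exercised the fileserver on port 7000. One way to narrow things down from an affected laptop is to probe each server port in turn with rxdebug. The following is a minimal sketch that only builds the command lines; the helper name rxdebug_probe_commands is hypothetical, the port table reflects the usual OpenAFS defaults and should be checked against the local configuration, and the host name afssrv1 is taken from the quoted message.

```python
# Default UDP ports of common OpenAFS server processes.
# Assumption: the cell uses the stock defaults; adjust if it does not.
AFS_UDP_PORTS = {
    "fileserver": 7000,
    "ptserver": 7002,
    "vlserver": 7003,
    "volserver": 7005,
    "bosserver": 7007,  # the port "bos status" needs to reach
    "upserver": 7008,
    "buserver": 7021,
}


def rxdebug_probe_commands(host, services=None):
    """Return one rxdebug command line per service, ordered by port.

    Hypothetical helper for illustration: it does not run anything,
    it only formats commands to paste into a shell on the client.
    """
    if services is None:
        services = sorted(AFS_UDP_PORTS, key=AFS_UDP_PORTS.get)
    return [
        f"rxdebug -servers {host} -port {AFS_UDP_PORTS[name]} -version"
        for name in services
    ]


if __name__ == "__main__":
    for command in rxdebug_probe_commands("afssrv1"):
        print(command)
```

If the probe of port 7000 answers from the laptop but the probe of port 7007 times out, that would suggest filtering or asymmetric routing between the laptop subnet and the server, which is also the kind of symptom a multihomed server replying from an unexpected address can produce.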
