On Mon, Jan 07, 2019 at 07:40:36PM +0000, Ximeng (Simon) Guan wrote:
> Hello,
> 
> After a power outage on Christmas Eve which forced two database servers and 
> all the network switches in one of our offices to re-boot, our laptop clients 
> in that office can no longer connect to one of the AFS servers hosted in the 
> same office.
> 
> I am leaning towards the possibility that it is a network problem instead of 
> an OpenAFS service problem because:
> 
>   1.  Remote offices can access the full AFS space, including those volumes 
> hosted on the re-booted servers.
>   2.  Between the servers there is no access problem. Nothing wrong with the 
> result of "bos status", "rxdebug" or "udebug". "fs checkservers" shows that 
> all servers are running.
>   3.  On the problematic laptops "fs checkservers" shows that "All servers are 
> running".
>   4.  On the problematic laptops "bos status afssrv1" returns a message:
> 
> "bos: failed to contact host's bosserver (communications failure (-1))."
> 
> But on the servers both in that office and in the remote offices, the same 
> command shows that all services are up:
> 
> "Instance ptserver, currently running normally.
> Instance vlserver, currently running normally.
> Instance buserver, currently running normally.
> Instance upserver, currently running normally.
> Instance backupusers, currently running normally.
>     Auxiliary status is: run next at Tue Jan  8 04:00:00 2019.
> Instance dafs, currently running normally.
>     Auxiliary status is: file server running."
> 
>   5.  On the problematic laptops "rxdebug afssrv1 -port 7000" returns 
> *normal* output, for example:
> 
> "Trying 10.12.8.33 (port 7000):
> Free packets: 2073/6357, packet reclaims: 3, calls: 81, used FDs: 36
> not waiting for packets.
> 0 calls waiting for a thread
> 125 threads are idle
> 1 calls have waited for a thread
> Connection from host 10.9.119.50, port 7001, Cuid ae06e5b3/70fe0104
>   serial 12,  natMTU 1344, security index 0, client conn
>     call 0: # 4, state dally, mode: receiving, flags: receive_done
>     call 1: # 0, state not initialized
>     call 2: # 0, state not initialized
>     call 3: # 0, state not initialized
> Connection from host 10.12.4.74, port 7001, Cuid ae06e5b3/70fe0114
>   serial 21,  natMTU 1344, security index 0, client conn
>     call 0: # 7, state dally, mode: receiving, flags: receive_done
>     call 1: # 0, state not initialized
>     call 2: # 0, state not initialized
>     call 3: # 0, state not initialized
> Done."
> 
> I do not administer the network. Can I have some advice on how to further 
> debug the connection problem? Which udp port does the command "bos status" 
> use?

My instinct would be that there is some multihoming going on and that
http://docs.openafs.org/Reference/5/NetRestrict.html and/or
http://docs.openafs.org/Reference/5/NetInfo.html are not properly
configured.
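
To make the NetInfo/NetRestrict interaction concrete, here is a rough
Python sketch of how I understand the man pages: NetInfo (if present) is
the basis for the address list, and anything matching a NetRestrict entry
is then removed, with 255 acting as a wildcard for that field. This is a
simplified model, not OpenAFS code. The function names are made up for
illustration, 192.168.50.1 is a hypothetical second interface, and the
sketch assumes that NetInfo entries not actually configured on the host
are skipped (it ignores the "f" prefix that some versions accept).

```python
def parse_addresses(text):
    """Return the non-empty lines of a NetInfo/NetRestrict-style file."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_restricted(addr, pattern):
    """True if addr matches a NetRestrict entry; 255 is a per-field wildcard."""
    return all(p == "255" or p == a
               for a, p in zip(addr.split("."), pattern.split(".")))


def registered_addresses(os_interfaces, netinfo_text=None, netrestrict_text=""):
    """Model which addresses a server would end up registering.

    os_interfaces:    addresses configured on the host (hypothetical input)
    netinfo_text:     contents of NetInfo, or None if the file is absent
    netrestrict_text: contents of NetRestrict ("" if absent)
    """
    if netinfo_text is None:
        base = list(os_interfaces)
    else:
        # Assumption: entries that are not real local interfaces are skipped.
        base = [a for a in parse_addresses(netinfo_text) if a in os_interfaces]
    restrict = parse_addresses(netrestrict_text)
    return [a for a in base
            if not any(is_restricted(a, p) for p in restrict)]


# 10.12.8.33 is from the rxdebug output above; 192.168.50.1 is invented.
interfaces = ["10.12.8.33", "192.168.50.1"]
print(registered_addresses(interfaces, None, "192.168.50.255\n"))
```

If a server came back from the power outage with an extra or different
interface and neither file constrains it, clients may be handed an address
they cannot route to. Comparing the output of "vos listaddrs" against the
addresses the laptops can actually reach would be one way to check. As for
the port question: the bosserver listens on UDP port 7007, whereas port
7000 is the fileserver, so "rxdebug afssrv1 -port 7007" from a laptop is
the closer test of what "bos status" needs.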

-Ben
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
