Hello,
After a power outage on Christmas Eve forced two database servers and all the
network switches in one of our offices to reboot, the laptop clients in that
office can no longer connect to one of the AFS servers hosted in the same
office.
I suspect a network problem rather than an OpenAFS service problem because:
1. Remote offices can access the full AFS space, including the volumes
hosted on the rebooted servers.
2. There is no access problem between the servers. The output of
"bos status", "rxdebug" and "udebug" looks normal, and "fs checkservers" shows
that all servers are running.
3. On the problematic laptops, "fs checkservers" reports "All servers are
running".
4. On the problematic laptops, "bos status afssrv1" returns the message:
"bos: failed to contact host's bosserver (communications failure (-1))."
But on the servers, both in that office and in the remote offices, the same
command shows that all services are up:
"Instance ptserver, currently running normally.
Instance vlserver, currently running normally.
Instance buserver, currently running normally.
Instance upserver, currently running normally.
Instance backupusers, currently running normally.
Auxiliary status is: run next at Tue Jan 8 04:00:00 2019.
Instance dafs, currently running normally.
Auxiliary status is: file server running."
5. On the problematic laptops, "rxdebug afssrv1 -port 7000" returns *normal*
output, for example:
"Trying 10.12.8.33 (port 7000):
Free packets: 2073/6357, packet reclaims: 3, calls: 81, used FDs: 36
not waiting for packets.
0 calls waiting for a thread
125 threads are idle
1 calls have waited for a thread
Connection from host 10.9.119.50, port 7001, Cuid ae06e5b3/70fe0104
serial 12, natMTU 1344, security index 0, client conn
call 0: # 4, state dally, mode: receiving, flags: receive_done
call 1: # 0, state not initialized
call 2: # 0, state not initialized
call 3: # 0, state not initialized
Connection from host 10.12.4.74, port 7001, Cuid ae06e5b3/70fe0114
serial 21, natMTU 1344, security index 0, client conn
call 0: # 7, state dally, mode: receiving, flags: receive_done
call 1: # 0, state not initialized
call 2: # 0, state not initialized
call 3: # 0, state not initialized
Done."
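The rxdebug check above only covers port 7000. A minimal sketch of how the same
probe could be repeated from a laptop against the other ports, assuming the
servers use the conventional AFS UDP range 7000 to 7009 (an assumption, not
something confirmed for this cell). The function name probe_ports and the RUN
switch are hypothetical helpers for illustration. By default the script only
prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper: print (or run) one rxdebug probe per UDP port.
# Assumption: the AFS services listen in the conventional 7000-7009 range.
HOST="${HOST:-afssrv1}"
RUN="${RUN:-0}"   # RUN=1 executes the probes; anything else only prints them

probe_ports() {
    host="$1"
    port=7000
    while [ "$port" -le 7009 ]; do
        cmd="rxdebug $host -port $port -version"
        if [ "$RUN" = "1" ]; then
            echo "== $cmd"
            # A port with no answer suggests it is filtered or has no listener.
            $cmd || echo "port $port: no response"
        else
            echo "$cmd"
        fi
        port=$((port + 1))
    done
}

probe_ports "$HOST"
```

Running it with RUN=1 on a problematic laptop and again on a working server
would show which ports answer from one location but not the other.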
I do not administer the network. Can I get some advice on how to further debug
the connection problem? Which UDP port does the command "bos status" use?
Thank you!
Best regards,
========================================
Ximeng (Simon) Guan, Ph.D.
Associate Principal Engineer
Royole Corporation
48025 Fremont Blvd, Fremont, CA 94538
========================================