Hello,

I have recovered from this situation already, but I was curious to hear if
others have tested or experienced this issue as well:

If the AFS database server with the lowest IP address goes down or is
offline, but 2 other database servers are still available, are clients
and the remaining servers supposed to handle that situation gracefully?
We had an incident where this happened (in our case, the database server
was taken offline because the switch died), and AFS access (even a simple
"ls /afs/<cellname>/") and vos commands appeared to be unresponsive.

AFS database servers:
lowest IP address pt/vl server: CentOS 5.11 32-bit OpenAFS 1.6.11
secondary and tertiary pt/vl server: CentOS 6.6 64-bit OpenAFS 1.6.11

Clients (at least these were affected, among others I am sure):
CentOS 6.6 64-bit OpenAFS 1.6.1
CentOS 6.7 64-bit OpenAFS 1.6.9

Clients are all configured with -dynroot -fakestat-all and they have
identical CellServDB files listing our database servers in order from
lowest to highest IP.

I apologize that I do not have much in the way of debugging output... I
didn't think to run rxdebug on the client or to trace the "ls" process.
We were in "emergency mode" trying to get the switch replaced to bring
services online, but I was still surprised that AFS exhibited this trouble.
I will try to replicate this issue in a test cell in the near future...

So I am mainly wondering if this is expected - if OpenAFS depends on having
its lowest IP address server online all the time - or if it's likely that
we have a configuration issue in our cell. I set up our cell about 5 years
ago as a complete newbie to OpenAFS, and while I've gained a lot of
insights and experience since, I still don't understand all the nuances.

Thank you!
-- 
Jonathan Leung-Nilsson
Social Sciences Computing Services
University of California, Irvine
