Hi,

We are running about 20 Red Hat AS3 based OpenAFS 1.4.2 fileservers. For the last three days, between 4pm and 10pm, 4-6 fileservers have stopped serving files, with Nagios monitoring warning of > 200 blocked connections. I have turned on debugging for the fileserver process and have a log file, but nothing in it looked bad to me (not that I would know). The servers are basically idle during these disruptions, with CPU and disk showing very low usage, but they have to be restarted before files become accessible again.
We added the -L flag to the fileserver process today to see if it helps, but we are wondering whether we can do anything else to find the cause and/or prevent these disruptions. We have checked, and there are no admin scripts running at these times.

BTW, it would not be so bad if the clients failed over to the other readonly volumes, but they do not seem to. The fileservers affected seem to have the user root readonly volume on them, and when a server goes into this state, all clients that have it as the highest in their priority list just lock up and need to be restarted. Also, despite having 10 readonly volumes to pick from, the clients tend to hit only a couple.

Cheers
Matt
