Use gdb's generate-core-file command, or gcore, or pstack if you have it, and get a backtrace from one of the fileservers while it is in the blocked state.
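
Something along these lines should work as a rough sketch. The binary path /usr/afs/bin/fileserver, the /var/tmp output directory, and the function name are my assumptions, not anything from your setup, so adjust them. It saves a core with gcore (the process keeps running) and then has gdb print a backtrace for every thread from that core.

```shell
# Assumption: path to the fileserver binary; override it for your install.
FILESERVER_BIN=${FILESERVER_BIN:-/usr/afs/bin/fileserver}

# Hypothetical helper: save a core of a running process and dump all
# thread backtraces to a text file. Args: pid [output directory].
capture_fileserver_state() {
    pid=$1
    outdir=${2:-/var/tmp}

    # gcore ships with gdb and writes <prefix>.<pid>.
    gcore -o "$outdir/fileserver-core" "$pid" || return 1

    # Feed gdb its commands from a file so this does not depend on
    # newer command-line options.
    cmds=$(mktemp) || return 1
    printf 'thread apply all bt\n' > "$cmds"
    gdb -batch -x "$cmds" "$FILESERVER_BIN" \
        "$outdir/fileserver-core.$pid" > "$outdir/fileserver-bt.$pid.txt" 2>&1
    rm -f "$cmds"
}

# Usage, while the server is refusing connections:
#   capture_fileserver_state "$(pidof fileserver)"
```

If pstack is installed, running pstack with the pid gives a quicker backtrace without writing a core. The backtrace is far more readable if the fileserver binary was not stripped.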
On 7/27/07, Matthew Cocker <[EMAIL PROTECTED]> wrote:
>
> Hi
>
> We are running about 20 Red Hat AS3 based OpenAFS 1.4.2 fileservers.
> For the last three days, between 4pm and 10pm, we have had 4-6
> fileservers stop serving files, with nagios monitoring warning of >200
> blocked connections. I have turned on debug for the fileserver process
> and have a log file, but nothing seemed bad to me (not that I would
> know). The servers are basically idle during these disruptions, with
> CPU and disk showing very low usage, but they have to be restarted to
> get access to files back.
>
> We added the -L flag to the fileserver process today to see if this
> helps, but we are wondering if we can do anything else to find the
> cause and/or prevent these disruptions.
>
> We have checked and there are no admin scripts running at these times.
>
> BTW, it would not be so bad if the clients would fail over to other
> readonly volumes, but they do not seem to. The affected fileservers
> seem to have the user root readonly volume on them, but when the
> servers go into this state, all clients that have this server highest
> in the priority list just lock up and need to be restarted. Also,
> despite having 10 readonly volumes to pick from, the clients tend to
> hit only a couple.
>
> Cheers
>
> Matt
>
