Use gdb's generate-core-file, or gcore, or pstack if you have it, to get a
backtrace of the hung fileserver process before you restart it.
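
As a rough sketch of the gdb route, something like the following could be run on an affected server while the fileserver is still hung. The helper names (gdb_backtrace_cmd, capture_backtrace) and the way you find the PID are assumptions for illustration; only the gdb options -p, -batch and -ex and the gdb commands "thread apply all bt" and "generate-core-file" are relied on. It will normally need to run as root.

```python
import subprocess


def gdb_backtrace_cmd(pid, core_path=None):
    """Build a gdb batch command line (hypothetical helper).

    Attaches to `pid`, prints a backtrace for every thread and, if
    `core_path` is given, also writes a core file there.
    """
    cmd = ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]
    if core_path is not None:
        # generate-core-file is the gdb command named in the reply above.
        cmd += ["-ex", "generate-core-file {}".format(core_path)]
    return cmd


def capture_backtrace(pid, core_path=None):
    """Run gdb against the process and return its output as text.

    gdb detaches when the batch finishes, so the process keeps running.
    """
    result = subprocess.run(
        gdb_backtrace_cmd(pid, core_path),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=False,
    )
    return result.stdout
```

For example, capture_backtrace(pid, "/tmp/fileserver.core") with the PID of the stuck fileserver process would return the per-thread backtraces and leave a core file behind for later inspection; the path is only an example.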

On 7/27/07, Matthew Cocker <[EMAIL PROTECTED]> wrote:
>
> Hi
>
> We are running about 20 Red Hat AS3-based OpenAFS 1.4.2 fileservers. For
> the last three days, between 4pm and 10pm, 4-6 fileservers have stopped
> serving files, with Nagios monitoring warning of > 200 blocked
> connections. I have turned on debugging for the fileserver process and have a
> log file, but nothing looked wrong to me (not that I would know). The servers
> are basically idle during these disruptions, with CPU and disk showing very
> low usage, but they have to be restarted to get access to files back.
>
> We added the -L flag to the fileserver process today to see if this helps
> but we are wondering if we can do anything else to find the cause and/or
> prevent these disruptions.
>
> We have checked and there are no admin scripts running at these times.
>
>
> BTW, it would not be so bad if the clients would fail over to other readonly
> volumes, but they do not seem to. The affected fileservers seem to have the
> user root readonly volume on them, but when the servers go into this state,
> all clients that have this server highest in their priority list just
> lock up and need to be restarted. Also, despite having 10 readonly volumes to
> pick from, the clients tend to hit only a couple.
>
>
> Cheers
>
> Matt
>
