On 9/24/07, Will Maier <[EMAIL PROTECTED]> wrote:
>
> Hi, all-
>
> We've been experiencing both acute and chronic periods during which one of
> our main fileservers shows large numbers of blocked connections.
> These periods do not (it seems) correlate with high system load,
> high network interface utilization, dropped packets, UDP errors,
> high I/O or other badness indicators that I'm accustomed to looking
> for.
>
> rxdebug shows up to 200-300 blocked connections during these
> periods, which last up to an hour or so, after which the badness
> abates. Since this server hosts several critical volumes, including
> one in which many $PATH elements live, users notice these
> disruptions very quickly.
>
> We've tried our best to balance accesses between our three main
> servers and have moved several very active volumes off the
> misbehaving server. After the move, the server handles ~1 million
> volume accesses in an hour; our busiest server (which does not
> experience this problem) handles nearly three times as many
> accesses. rxdebug usually shows ~8,000 active server and client
> connections on this server.
>
> No events in the FileLog correspond with the blocked connections. I
> do see regular ProbeUuid failures, but those are benign (right?).
>
> This server has a dual-core 3.00GHz Xeon CPU, 4GB RAM and a 1Gbps
> network connection. Its vice partitions are stored on a
> fibre-attached Xserve RAID array.
>
> What other information would help resolve this problem? Is there
> another aspect of the system that I should examine? What further
> steps might we take to try to resolve the issue?


A backtrace might help, but at first blush, the patch in OpenAFS RT ticket
19461 is probably what you want.
