On 9/24/07, Will Maier <[EMAIL PROTECTED]> wrote:
> Hi, all-
>
> We've been having very acute and chronic periods during which one of
> our main fileservers shows large numbers of blocked connections.
> These periods do not (it seems) correlate with high system load,
> high network interface utilization, dropped packets, UDP errors,
> high I/O or other badness indicators that I'm accustomed to looking
> for.
>
> rxdebug shows up to 200-300 blocked connections during these
> periods, which last up to an hour or so after which the badness
> abates. Since this server hosts several critical volumes, including
> one in which many $PATH elements live, users notice these
> disruptions very quickly.
>
> We've tried our best to balance accesses between our three main
> servers and have moved several very active volumes off the
> misbehaving server. After the move, the server handles ~1 million
> volume accesses in an hour; our busiest server (which does not
> experience this problem) handles nearly three times as many
> accesses. rxdebug usually shows ~8 thousand active server and client
> connections on this server.
>
> No events in the FileLog correspond with the blocked connections. I
> do see regular ProbeUuid failures, but those are benign (right?).
>
> This server has a dual-core 3.00GHz Xeon CPU, 4GB RAM and a 1Gbps
> network connection. Its vice partitions are stored on a
> fibre-attached Xserve RAID array.
>
> What other information would help resolve this problem? Is there
> another aspect of the system that I should examine? What further
> steps might we take to try to resolve the issue?

A backtrace might help, but at first blush, the patch in OpenAFS RT ticket 19461 is probably what you want.
