> Last night we started seeing blocked connections above 50 across four of
> our AFS servers; one that holds read-write volumes with one replica and
> three pure replica servers.

The symptoms sound really similar to the asymmetric client lossage. That bug is fixed in the mainline of OpenAFS (and the cells at MIT have been running with this patch for a while now), but it doesn't look like it was pulled up to the 1.2.x branch.

The problem comes up when some client is able to send packets to the server, but the server is unable to send packets back to the client (because of a firewall or some other misconfiguration). This ties up the server's worker threads for a long time as the server tries to contact the client. If the client sends new requests sufficiently often (e.g. the Windows AFS client, whose timeouts are much lower than those of the UNIX client), the server runs out of worker threads.

If you're interested, the deltas on the mainline for this bugfix are:

  rx-protect-servers-from-half-reachable-clients-20020119
  rx-cleanup-deadlock-and-refcnt-leak-20020121
  better-protection-against-asymmetric-clients-20020222
  minor-rx-lock-cleanup-20020330
  clear-attachwait-flag-20020403

-- kolya

_______________________________________________
OpenAFS-devel mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-devel
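
To make the thread-exhaustion argument above concrete, here is a toy model in Python. It is only a sketch and not Rx or fileserver code: the function name `simulate`, the tick-based clock, and all the numbers (4 workers, a 60-tick stall, retry intervals of 5 and 30 ticks) are assumptions chosen for illustration. Each request from the half-reachable client pins one worker for `stall_ticks` while the server tries to reach the client, and the client sends a new request every `retry_interval` ticks.

```python
def simulate(workers, stall_ticks, retry_interval, total_ticks):
    """Toy model of one half-reachable client hitting a fixed worker pool.

    Returns the tick at which a request arrives and finds no free worker,
    or None if that never happens within total_ticks.
    All parameters are hypothetical; nothing here comes from OpenAFS itself.
    """
    busy_until = []  # release tick for each worker that is currently stalled
    for tick in range(total_ticks):
        # Release workers whose attempt to reach the client has timed out.
        busy_until = [t for t in busy_until if t > tick]
        if tick % retry_interval == 0:
            # The client retries; the server has to hand it a worker.
            if len(busy_until) >= workers:
                return tick  # pool exhausted: nobody else gets served
            busy_until.append(tick + stall_ticks)
    return None


if __name__ == "__main__":
    # Short client timeout (frequent retries) exhausts the pool quickly.
    print(simulate(workers=4, stall_ticks=60, retry_interval=5, total_ticks=100))
    # Long client timeout never holds more than two workers at once.
    print(simulate(workers=4, stall_ticks=60, retry_interval=30, total_ticks=300))
```

In this model the number of workers held at once settles at roughly `stall_ticks / retry_interval`, so a client whose retry interval is much shorter than the server's stall time can pin the whole pool on its own, while a client with a long retry interval cannot.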
