On Mon, 31 Jan 2011, Steve Simmons wrote:

> We have seen similar issues. It occurs when a vice partition has many clients with registered callbacks that are no longer reachable. Not all of those clients have responded by the time the 1800-second timer expires, and the fileserver goes down uncleanly.
>
> We have about 235,000 volumes spread across 40 vice partitions. Our 'fix' is a combination of lengthening that timeout to 3600 seconds and keeping our vice partitions no larger than 2TB. Active volumes are spread roughly equally across those 40 partitions. But that's just a stopgap; the longer a server stays up, the more likely it is to accumulate dead callbacks.

Assuming this is true, isn't it a good argument for keeping the weekly server process restarts?

Cheers,
Stephen