On Mon, 31 Jan 2011, Steve Simmons wrote:
We have seen similar issues. It occurs when lots of clients have
registered callbacks on a given vice partition but those clients are no
longer reachable. Not all the clients have responded when the 1800-second
timer goes off, and the fileserver goes down uncleanly.
We have about 235,000 volumes spread across 40 vice partitions. Our 'fix'
is a combination of lengthening that timeout to 3600 seconds and
keeping our vice partitions no larger than 2TB. Active volumes are
spread roughly equally across those 40 partitions. But that's just a
stopgap; the longer a server stays up, the more likely it is to accumulate
dead callbacks.

Assuming this is true, isn't this a good argument for keeping the weekly
server process restarts?
Cheers,
Stephen