On Jan 31, 2011, at 12:17 PM, Stephen Joyce wrote:

> On Mon, 31 Jan 2011, Steve Simmons wrote:
>
>> We have seen similar issues. It occurs when there is a given vice partition
>> where lots of clients have registered callbacks but those clients are no
>> longer accessible. Not all the clients have responded when the 1800 second
>> timer goes off, and the fileserver goes down uncleanly.
>>
>> We have about 235,000 volumes spread across 40 vice partitions. Our 'fix' is
>> a combination of lengthening that timeout to 3600 seconds and keeping our
>> vice partitions no larger than 2TB. Active volumes are spread roughly
>> equally across those 40 partitions. But that's just a stopgap; the longer a
>> server stays up, the more likely it is to accumulate dead callbacks.
>
> Assuming this is true, isn't this a good argument to keep the weekly server
> process restarts?
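For reference, the weekly restart in question is bosserver's 'general' restart time, which is set per server. Roughly, checking it and turning it off looks like the following; the hostname is just a placeholder, not one of our machines:

    # Show the server's current general and new-binary restart times
    bos getrestart -server fs1.example.com

    # Disable the weekly general restart entirely
    bos setrestart -server fs1.example.com -time never -general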
Weekly outages, even if only for a few minutes each, are not acceptable here. Doing them less frequently starts to put us into the range of the timeout problems above. At the moment most of our AFS service processes have run happily for 237 days. That alone is a strong argument for not needing weekly restarts. If there are memory leaks, etc., they largely aren't affecting us.

We mostly do restarts when we need to do software upgrades of one sort or another. They are typically done in a rolling fashion - upgrade the hot spare(s), vos move volumes to the hot spare(s), take down the vacated servers and upgrade, lather, rinse, repeat (a rough sketch of the commands is at the end of this message). At one point we went two years without a general AFS shutdown. We only got away from that because of bugs that required us to do OS upgrades more frequently, or across the entire cell at once. Life seems generally better with respect to those issues, and campus' opinion of the service is better when there are no perceived outages.

For the curious, we're running 1.4.12 with a couple of fixes we pulled forward from the 1.4.13 development stream. Barring new developments, the next one we'll give serious consideration to is 1.6.X.
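For anyone who wants the mechanics of the rolling upgrade, it's roughly the following, repeated per volume and per server; the hostnames, partition, and volume name here are placeholders rather than our real ones:

    # See what's on the server/partition being vacated
    vos listvol -server fs2.example.com -partition /vicepa

    # Move each volume to the hot spare (repeat for each volume, or script it)
    vos move -id user.jdoe -fromserver fs2.example.com -frompartition /vicepa \
             -toserver spare1.example.com -topartition /vicepa

    # With the partition vacated, stop the AFS server processes, upgrade, bring them back
    bos shutdown -server fs2.example.com -wait
    # ... install the new binaries and/or OS upgrade here ...
    bos startup -server fs2.example.com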
