On Feb 24, 2011, at 4:44 PM, Andrew Deason <adea...@sinenomine.net> wrote:
> On Thu, 24 Feb 2011 08:40:13 +0100
> Derrick Brashear <sha...@dementia.org> wrote:
>
>>> Regardless of whatever bugs on the fileserver may be in play,
>>> clients should indeed issue a new query on VOFFLINE.
>>
>> Bugs on the fileserver? How about 'none'?
>
> In this case, yeah, I assume so; but Jeff is correct that there have
> been bugs where VOFFLINE was reported when it should not have been. I'm
> just saying "even if that were not the case..."
>
>>> A VOFFLINE error can be the result of incorrect/stale volume
>>> location information (if a volume is offline on one server but
>>> online on another), and so the current information should be looked up
>>> when it occurs.
>>
>> Huge waste of RPCs for a legitimate operating condition, albeit an
>> undesirable one. You'll create a VLDB storm if a 'popular' volume goes
>> offline.
>
> "Huge". Unix clients have been doing it ~forever, and I can probably
> count on one hand the number of places I have ever heard of even
> noticing vlserver load.
>
It came up again in the historical research I just did, so how poor RPC
semantics can exacerbate it is fresh in my mind. Huge? As many as N clients
querying, versus none, within the smallest (timeout-wise) callback bucket,
is huge.
>>>> In a similar vein, if the file server is inaccessible, the client
>>>> does not issue a new VLDB query.
>>>
>>> ...this is intentional? Why doesn't it? We could be contacting the
>>> wrong server because we have stale location information.
>>
>> We could. But that's basically true of any error, and if we run to
>> mommy on every error, eventually mommy can't handle us being so pesky
>> and melts down.
>
> Some errors are still a lot more likely than others to have stale
> location information as their cause. I suppose the probability of that
> for RX_CALL_DEAD is rather small, as it only happens in this "move and
> shutdown" scenario, and leaving the server on isn't too hard. Of course,
> that is not always under the administrator's control, but eh.
>
We advertise 'leave it up 2 hours or suffer', or have previously and should
again. The horse is pointed at water. Drink already.
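For readers following the thread, here is a minimal sketch of the two client behaviors being argued over (re-query the VLDB on VOFFLINE, but not when the fileserver is unreachable) and of the "N clients versus none" arithmetic. The error-code values and the helper names `should_requery_vldb` and `vldb_queries_for_outage` are hypothetical illustrations, not OpenAFS APIs.

```python
# Hypothetical placeholder error codes; the values are assumed to mirror
# OpenAFS (VOFFLINE from the volume package, RX_CALL_DEAD from Rx).
VOFFLINE = 106
RX_CALL_DEAD = -1


def should_requery_vldb(error_code: int) -> bool:
    """Hypothetical client policy from the thread: re-query the VLDB on
    VOFFLINE (location may be stale), but not when the call is dead."""
    return error_code == VOFFLINE


def vldb_queries_for_outage(clients_hitting_volume: int, error_code: int) -> int:
    """Extra VLDB lookups if every client touching the volume sees
    error_code once within the same window: N under the policy, else 0."""
    return clients_hitting_volume if should_requery_vldb(error_code) else 0


if __name__ == "__main__":
    # Illustrative client count only, not a measured figure.
    print(vldb_queries_for_outage(500, VOFFLINE))
    print(vldb_queries_for_outage(500, RX_CALL_DEAD))
```

The trade-off in the thread is exactly the gap between those two numbers: a fresh lookup on VOFFLINE costs up to one VLDB RPC per client while a popular volume stays offline.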
_______________________________________________
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info