On Mon, Sep 23, 2013 at 3:52 PM, Mark Vitale <mvit...@sinenomine.net> wrote:
> Recently I've been working on several problems with very different
> externals but a similar root cause:
>
> 1) While accessing a particular fileserver, AFS clients experience
> performance delays; some also see multiple "server down/back up" problems.
>    - root cause was a hardware bug on the fileserver that prevented timers
>      from firing reliably; this unpredictably delayed any task in the
>      rxevent queue while leaving the rest of the fileserver function
>      relatively unaffected. (btw, this was a pthreaded fileserver)
>
> 2) Volume releases suffer from poor performance and occasionally fail with
> timeouts.
>    - root cause was heavier-than-normal vlserver load (perhaps caused by
>      disk performance slowdowns); this starved LWP IOMGR, which in turn
>      prevented the LWP rx_Listener from being dispatched (priority
>      inversion), leading to a grossly delayed rxevent queue.
>
> So in two very different situations, the rxevent queue was unable to
> process scheduled events in a timely manner, leading to very strange and
> difficult-to-diagnose symptoms.
>
> I'm writing this note to begin a discussion of possible ways to address
> this in OpenAFS.
>
> One possible approach is to implement some watchdog/sentinel code to
> detect when the rxevent queue is not working correctly; that is, when it
> is unable to run scheduled events in a timely manner. Certainly rxevent
> can't watch itself, but rather than adding another thread as a watchdog,
> I chose to insert a sanity check into rxevent_Post(). This check compares
> the current time (if supplied in the "now" parameter) with the scheduled
> time of the top rxevent on the queue. If the current time is later than
> that scheduled time by more than a certain threshold, we know the rxevent
> queue has fallen behind (is "sick") for some unknown reason. At that
> point, I set a state flag which causes any new connections to abort (with
> timeout or busy, for example). Both the threshold and the reply could be
> configurable, similar to the current implementation of the -busyat
> thread-busy threshold and response. Once the rxevent queue is able to
> catch up with its scheduling work, the "sick" state is reset. Finally,
> warning messages could be written to the log to indicate that the rxevent
> queue is having difficulties and, later, that it has returned to normal.
> I have some prototype code working in my test environment; it needs some
> work before it will be suitable for review.

I haven't profiled rxevent queue handling in about 5 years. With your code,
does the queue appear "sick" on a normally functioning host if you assume
the threshold is 0 (i.e., if now is later than the top scheduled event,
assume it should have already fired)?

-- Derrick
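[Editor's note: to make the quoted proposal concrete, here is a minimal,
self-contained sketch of the kind of lag check described above. This is
not Mark's prototype and not the actual OpenAFS rxevent_Post() prototype;
the queue types, the "sick" flag, the threshold constant, and the function
names are all illustrative stand-ins.]

/*
 * Sketch of a "queue has fallen behind" check, as could be called from
 * something like rxevent_Post() when the caller already has the current
 * time in hand.  All names here are hypothetical.
 */
#include <stdio.h>

/* Milliseconds the queue head may be overdue before we call the queue
 * "sick".  A real patch would make this configurable, like -busyat. */
#define EVENT_LAG_THRESHOLD_MS 2000

struct sketch_event {
    long long fire_at_ms;           /* absolute scheduled time (ms) */
    struct sketch_event *next;      /* time-sorted singly-linked queue */
};

struct sketch_event_queue {
    struct sketch_event *head;      /* earliest pending event, or NULL */
    int sick;                       /* set while the queue is behind */
};

/* Compare "now" against the earliest scheduled event; flag the queue as
 * sick if it is overdue by more than the threshold, and clear the flag
 * (logging the transition) once the queue has caught up. */
static int
check_event_queue_health(struct sketch_event_queue *q, long long now_ms)
{
    if (q->head != NULL &&
        now_ms - q->head->fire_at_ms > EVENT_LAG_THRESHOLD_MS) {
        if (!q->sick) {
            q->sick = 1;
            fprintf(stderr, "warning: event queue has fallen behind by "
                    "more than %d ms\n", EVENT_LAG_THRESHOLD_MS);
            /* A real implementation would also set whatever flag makes
             * new connections get a busy/timeout abort. */
        }
    } else if (q->sick) {
        q->sick = 0;
        fprintf(stderr, "event queue has caught up; back to normal\n");
    }
    return q->sick;
}

int
main(void)
{
    struct sketch_event ev = { 1000, NULL };   /* event due at t = 1000 ms */
    struct sketch_event_queue q = { &ev, 0 };

    check_event_queue_health(&q, 1500);   /* 500 ms late: still healthy  */
    check_event_queue_health(&q, 4000);   /* 3000 ms late: marked sick   */
    q.head = NULL;                         /* queue drained               */
    check_event_queue_health(&q, 4100);   /* recovers, logs the return   */
    return 0;
}

[With a threshold of 0, as Derrick asks, the first call above would already
report the queue as sick, which is the question: whether a normally
functioning host routinely posts events slightly past their scheduled time.]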