[OpenAFS] Odd behavior during vos release

Kevin Hildebrand Wed, 09 Nov 2011 11:39:04 -0800

We've been having unusual slowness and hangs at times on some of our fileservers, and I think I have a handle on the sequence of events, if not the cause. I could use some assistance in filling in the gaps so I can see if we can fix things.

Right now, I have a heavily used volume (by many clients) that is released on a frequent basis (as often as every ten minutes). This volume has three read-only replicas. The volume is about 200MB in size.

What I'm observing is that as soon as the vos release begins, one or more of the read-only replicas start accumulating connections in the 'error' state. FileLog shows incoming FetchStatus RPCs to that replica are not being answered. If this condition occurs long enough, all of these connections eventually fill up the thread pool and the fileserver stops serving data to everything else.

At some point, up to five minutes later, as the release proceeds, the replica in question gets marked offline by the release process. At this time, all of the stuck RPCs get 'FetchStatus returns 106' (VOFFLINE), at which point the connection pool clears, and life on the fileserver returns to normal.
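
To put numbers on the burst of VOFFLINE replies described above, here is a minimal sketch that tallies FetchStatus return codes from FileLog lines. It assumes only that each relevant line contains the text 'FetchStatus returns <code>' as quoted above; the function name is hypothetical, and the rest of each log line is treated as opaque because its exact layout may differ between OpenAFS versions.

```python
import re
from collections import Counter

# Matches only the "FetchStatus returns <code>" text quoted in the report.
# Everything else on the line is ignored (its layout is an assumption).
RETURN_RE = re.compile(r"FetchStatus returns (\d+)")


def count_fetchstatus_returns(lines):
    """Count FetchStatus return codes seen in an iterable of log lines.

    Hypothetical helper: returns a Counter mapping return code -> count.
    """
    counts = Counter()
    for line in lines:
        match = RETURN_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Feeding it the lines of a FileLog captured around a release (for example, `count_fetchstatus_returns(open(path))`, where `path` is wherever the log lives on your server) should show how many stuck calls were finally answered with 106 once the replica went offline.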

What I can't figure out is what's going on during the time the RPCs are hung, and why the connections show 'error'. (How does one determine what the error condition is, when viewing rxdebug output?)

Why would an RO replica be hung during a vos release?

Any clues on where to look next would be appreciated.

Thanks,
Kevin

--
Kevin Hildebrand
University of Maryland, College Park
Office of Information Technology
