On 11/30/2011 2:58 PM, Jeffrey Altman wrote:
> Andrew:
>
> The block of code in rx.c that is at question is:
>
>     if (call->lastSendData && idleDeadTime && (conn->idleDeadErr != 0)
>         && ((call->lastSendData + idleDeadTime) < now)) {
>         if (call->state == RX_STATE_ACTIVE) {
>             cerror = conn->idleDeadErr;
>             goto mtuout;
>         }
>     }
>
> when conn->idleDeadErr is set to RX_CALL_DEAD in afs/afs_conn.c:
>
>     rx_SetServerConnIdleDeadErr(tc->id, RX_CALL_DEAD);
>
> Please take a look at 6128.
>
> Jeffrey Altman
This was added by d26f5e158cffa313d0f504e7ba3afc1743b5d1ef as part of the MTU size probes that were developed at UIUC, and it made things much worse because no handling of RX_CALL_DEAD was added to the Unix CM in afs_Analyze().

However, that is not the core problem with idle dead time processing. The concept is flawed. When a file server is unable to process a call, either because all of its threads are in use or because the RPC in question is blocked while another RPC completes, keep-alives are still sent to the client. If the client times out the call with RX_CALL_TIMEOUT and retries, all it is doing is placing itself at the back of the queue on the file server and potentially taking up another file server thread. In other words, it makes a sad file server even more unhappy.

The implementation is broken in a number of ways:

1. It applies equally to operations against both replicated and non-replicated objects. Only operations on replicated objects should have RX_CALL_TIMEOUT trigger at all.

2. The idle timeout is shorter than the hard dead timeout, which is the timeout the file server uses when breaking callbacks. Therefore idle timeout processing triggers while the file server is enforcing cache coherency. This will occur on a r/w volume, in which case the retry is going to hit the same file server that we just timed out. There is no gain here, just additional overhead for the file server and delays imposed on the client.

3. Idle dead time violates cache coherency. I didn't include this in my original post but it does. Three clients A, B and C have callback registrations for vnode 9999.24232.78382. Client A issues an RPC "CreateFile Foo" on vnode 9999.24232.78382 to the file server. While processing the request the file server breaks callbacks to Client B, which responds immediately, and Client C, which does not respond. Client B issues an RPC "CreateFile Bar" on vnode 9999.24232.78382 to the file server. This RPC blocks while waiting for the vnode lock. The file server waits the hard dead timeout (2 minutes) for Client C to respond. In the meantime, Client B waits the idle dead timeout (1 minute), determines there is no alternative site, and retries the RPC, which then blocks on the vnode lock at the file server. The file server times out the callback break and begins to process the first "CreateFile Bar" request. It completes successfully, but the response with the current status info cannot be returned to the client because the call is dead. The file server then processes the second "CreateFile Bar" request, which fails with EEXIST. Client B is now left with an error it should not have gotten and no callback. The error is returned to the application, which may or may not be well behaved when presented with an unexpected error.

It is true that the cache manager can work around this by performing an additional FetchStatus RPC, noticing the data version has changed out from underneath it, refetching the directory contents, noticing the file is in fact now there, and returning success to the application. However, that is a lot of work to do in the name of noticing when a file server has gotten stuck.

As you indicated, this block:

    /* see if we have a non-activity timeout */
    if (call->startWait && idleDeadTime
        && ((call->startWait + idleDeadTime) < now)
        && (call->flags & RX_CALL_READER_WAIT)) {
        if (call->state == RX_STATE_ACTIVE) {
            cerror = RX_CALL_TIMEOUT;
            goto mtuout;
        }
    }

triggers for clients when they are waiting to read the result of an RPC.
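For illustration only, a sketch (not a tested patch) of the kind of minimal mitigation I mean. It assumes call->lastReceiveTime is refreshed by every packet received on the call, including keep-alives, the way the dead timer check earlier in rxi_CheckCall() uses it; with that extra condition, a server that is alive but blocked on a vnode lock would not trip the timeout:

    /* SKETCH ONLY: also require that the peer has gone silent before
     * declaring the call idle.  A blocked-but-alive file server keeps
     * sending keep-alives, which refresh lastReceiveTime, so this
     * variant only fires when the server has actually stopped
     * talking to us. */
    if (call->startWait && idleDeadTime
        && ((call->startWait + idleDeadTime) < now)
        && ((call->lastReceiveTime + idleDeadTime) < now)
        && (call->flags & RX_CALL_READER_WAIT)) {
        if (call->state == RX_STATE_ACTIVE) {
            cerror = RX_CALL_TIMEOUT;
            goto mtuout;
        }
    }

Even that does not address the deeper problems above; it only stops the timer from firing while the server is demonstrably alive.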
As Simon will explain in more detail in a separate e-mail, the original startWait block will fire once all data associated with the RPC has been transmitted and the call has been turned around. The RPC could be blocked on a vnode lock or simply waiting for a thread to be scheduled at the file server. It could be a FetchStatus or a small StoreData; perhaps a StoreData that represents a truncation.

Client A issues a StoreData with a file size in it. The call blocks waiting for a thread and the client times out the call. Client B issues a StoreData with data. Client A retries the StoreData with the same old file size. The file server processes the first StoreData and truncates the file, stores B's data, and then erases B's data when it processes the retried StoreData and repeats the truncation.

I can come up with additional scenarios. The idle dead time processing is fundamentally flawed and needs to be backed out.

Jeffrey Altman
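P.S. To make point 1 above concrete, the gating I have in mind on the cache manager side would look roughly like the sketch below. The names volume_is_replicated() and another_site_exists() are illustrative, not actual OpenAFS functions, and this is not the real afs_Analyze() code; the point is only that a timed-out call is worth retrying when another replication site exists, and worthless when the retry can only land behind the RPC we just abandoned on the same stuck file server:

    /* SKETCH ONLY: hypothetical retry gating in the cache manager's
     * error analysis path.  Helper names are illustrative. */
    if (code == RX_CALL_TIMEOUT) {
        if (volume_is_replicated(avolp)
            && another_site_exists(avolp, tconn)) {
            retry = 1;      /* a different site may be able to serve us */
        } else {
            retry = 0;      /* fail the call; retrying only places us at
                             * the back of the same server's queue */
        }
    }

Even with that gating, points 2 and 3 still stand; backing the processing out remains the right answer.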