Re: [OpenAFS-devel] "Lost contact with file server" problems

Jeffrey Hutzelman Sun, 28 Aug 2005 13:37:14 -0700

On Monday, August 22, 2005 16:52:29 -0400 Jeffrey Altman <[EMAIL PROTECTED]> wrote:

I'm sure there is code in the client that identifies expired tokens
and removes them.   I just don't believe that code is associated in
any way with the code that processes RXKADEXPIRED errors.

Well, I don't know what strangeness you might have in the Windows client. The traditional client _does_ discard a user's tokens when it gets any authentication error, including RXKADEXPIRED.

I'm also suspicious of why the server has no code that specifically
addresses RXKADEXPIRED errors if the client is allowed to send them
to the server.

The client isn't specifically sending RXKADEXPIRED. It is sending an abort because it received a packet on a connection that is in error. Such aborts, whether sent by the client or server, _always_ contain the error code corresponding to the current error on the call.

The server doesn't need to _do_ anything special in response to this particular error. It just needs to propagate the error back up the call chain, which it does, so that whatever procedure is handling this call gets an error on its next rx_Write or whatever and aborts. This is all perfectly normal.
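
To make the "error sticks to the call and every later operation sees it" idea concrete, here is a toy model in C. It is only a sketch: toy_call, toy_call_set_error, toy_abort_code, toy_write and TOY_EXPIRED are hypothetical names invented for illustration, not the real Rx structures or functions, and the convention that a failed write returns 0 is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an Rx call; NOT the real struct rx_call. */
struct toy_call {
    int error;              /* 0 while healthy, else the sticky error code */
};

/* Illustrative value only; the real RXKADEXPIRED constant comes from rxkad. */
enum { TOY_EXPIRED = 1 };

/* Record an error on the call. The first error wins and stays set. */
static void toy_call_set_error(struct toy_call *call, int error)
{
    assert(call != NULL);
    if (call->error == 0)
        call->error = error;
}

/* The code an abort for this call would carry: whatever is currently set. */
static int toy_abort_code(const struct toy_call *call)
{
    assert(call != NULL);
    return call->error;
}

/* Write attempt: returns bytes accepted, or 0 once the call is in error
 * (an assumption of this sketch). */
static size_t toy_write(struct toy_call *call, const char *buf, size_t len)
{
    assert(call != NULL);
    (void)buf;              /* payload is irrelevant to the model */
    return call->error ? 0 : len;
}
```

In this model the handler needs no special case for any particular code: once toy_call_set_error has run, its next toy_write comes back short and it bails out, and any abort it sends carries the same code.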

Now, as Derrick noted, the RXKADEXPIRED is in fact not originating in the client, but in the _server_; the connection is in error because an abort on that connection was received two or three minutes earlier with an error code of RXKADEXPIRED.

The confusing thing is, once the connection is in error, why is the client ever sending a new request to the server? The answer appears to be that rx_NewCall on a connection in error does not fail (not surprising; IIRC the assumption is that rx_NewCall always succeeds), but also does not propagate the connection's error state down to the call. IMHO this is a bug.
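
A stripped-down illustration of the two behaviors, as a sketch only: sketch_conn, sketch_call and both functions are hypothetical names made up for this example and are much simpler than the real rx_connection, rx_call and rx_NewCall.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real Rx types. */
enum sketch_mode { SKETCH_MODE_SENDING, SKETCH_MODE_ERROR };

struct sketch_conn {
    int error;                  /* nonzero once the connection is in error */
};

struct sketch_call {
    enum sketch_mode mode;
    int error;
};

/* Behavior as described above: the new call always starts out in send
 * mode, even when the connection already carries an error. */
static void sketch_new_call_ignoring_conn(const struct sketch_conn *conn,
                                          struct sketch_call *call)
{
    assert(conn != NULL && call != NULL);
    (void)conn;                 /* connection error is never consulted */
    call->mode = SKETCH_MODE_SENDING;
    call->error = 0;
}

/* Desired behavior: the call still gets created (creation never fails),
 * but it inherits the connection's error, so the caller's first
 * operation on it can fail instead of sending a request. */
static void sketch_new_call_inheriting(const struct sketch_conn *conn,
                                       struct sketch_call *call)
{
    assert(conn != NULL && call != NULL);
    if (conn->error) {
        call->mode = SKETCH_MODE_ERROR;
        call->error = conn->error;
    } else {
        call->mode = SKETCH_MODE_SENDING;
        call->error = 0;
    }
}
```

With conn.error set, the first variant hands back a call that looks perfectly healthy, which would explain a fresh request going out on a dead connection; the second hands back a call that is already in error.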

If this is in fact the problem, I believe the patch below will make the client notice the error condition on the newly-created call. There is still some question as to why the client did not react to the RXKADEXPIRED received in response to its _previous_ call. Of course, there's _also_ the question as to why there was such a huge latency between the data packet on that call and the resulting abort.


-- Jeff


Index: rx.c
===================================================================
RCS file: /cvs/openafs/src/rx/rx.c,v
retrieving revision 1.83
diff -u -r1.83 rx.c
--- rx.c        19 Aug 2005 19:20:44 -0000      1.83
+++ rx.c        26 Aug 2005 20:31:19 -0000
@@ -1146,7 +1146,12 @@

    /* Client is initially in send mode */
    call->state = RX_STATE_ACTIVE;
-    call->mode = RX_MODE_SENDING;
+    if (conn->error) {
+        call->mode = RX_MODE_ERROR;
+        call->error = conn->error;
+    } else {
+        call->mode = RX_MODE_SENDING;
+    }

    /* remember start time for call in case we have hard dead time limit */
    call->queueTime = queueTime;

