Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11449 )
Change subject: KUDU-2580 [c++ client] authn token reacquisition fix ...................................................................... Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/11449/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11449/1//COMMIT_MSG@9 PS1, Line 9: Updated the authn token reacquisition logic to handle the following > Could you add more detail here to make it more obvious why this leads to a Done http://gerrit.cloudera.org:8080/#/c/11449/1/src/kudu/client/client-internal.h File src/kudu/client/client-internal.h: http://gerrit.cloudera.org:8080/#/c/11449/1/src/kudu/client/client-internal.h@165 PS1, Line 165: void ReconnectToCluster( > Doc this. Maybe move the definition of ReconnectionReason down here too, si Done http://gerrit.cloudera.org:8080/#/c/11449/1/src/kudu/client/client-internal.h@166 PS1, Line 166: const MonoTime& deadline, : KuduClient* client, > Nit: can you switch the argument order of these two, to reduce the delta be Done http://gerrit.cloudera.org:8080/#/c/11449/1/src/kudu/client/client-internal.cc File src/kudu/client/client-internal.cc: http://gerrit.cloudera.org:8080/#/c/11449/1/src/kudu/client/client-internal.cc@158 PS1, Line 158: Status KuduClient::Data::SyncLeaderMasterRpc( > So to repro the bug, which code branches down here are taken, in what order Having the connection to the former leader master still open, but authn token expired and master re-election selecting another master as leader, the GetTableSchema() RPC call hits the code of the last 'if ()' closure within the retry 'for ()' cycle, where the error code is MasterErrorPB::NOT_THE_LEADER. The ConnectToMaster() attempt would yield FATAL_INVALID_AUTHENTICATION_TOKEN error code, and the control flow will continue with the retries, but the state of the client stays the same (i.e. the leader master proxy hasn't been updated). That will happen again and again, until the RPC call of GetTableSchema() times out. As I mentioned in the prior changelist, I think it's worth refactoring the ConnectionToCluster code a bit and clean the cached master proxy in that case. But I think it's better to do that in a separate changelist, keeping this changelist targeted for back-porting to older maintenance branches. http://gerrit.cloudera.org:8080/#/c/11449/1/src/kudu/client/client-internal.cc@270 PS1, Line 270: if (resp->error().code() == MasterErrorPB::NOT_THE_LEADER || > BTW, in our lunch conversation I recall you saying that one problem is that The original approach of authn token re-acquisition is to go get reacquire authn token when the returned error code of the RPC call is FATAL_INVALID_AUTHENTICATION_TOKEN. That's straightforward and simple enough. However, that works only in case if authn token is present, i.e. that works only when establishing a new connection. Once connection is established, no authn token is sent to the server anymore. And even if it were sent, that might be some performance hit to verify authn token signature and check for expiration time at every RPC call. So, the former master leader simply cannot notify the client of its expired token since the master does not see the authn token during regular RPC calls. http://gerrit.cloudera.org:8080/#/c/11449/1/src/kudu/client/client-internal.cc@659 PS1, Line 659: s.IsNotAuthorized > I think here we might need to narrow down the case when we need to retry: p Will and I discussed this offline and it seems having this as-is for the purpose of back-porting this to older branches is OK. I'll send a follow-up patch for review with refactoring this code. -- To view, visit http://gerrit.cloudera.org:8080/11449 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4477d0f2bb36ee5ef580b585cae189a634d5002f Gerrit-Change-Number: 11449 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley <[email protected]> Gerrit-Comment-Date: Wed, 19 Sep 2018 01:23:18 +0000 Gerrit-HasComments: Yes
