Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11449 )

Change subject: KUDU-2580 [c++ client] authn token reacquisition fix
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/11449/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11449/1//COMMIT_MSG@9
PS1, Line 9: Updated the authn token reacquisition logic to handle the following
> Could you add more detail here to make it more obvious why this leads to a
Done


http://gerrit.cloudera.org:8080/#/c/11449/1/src/kudu/client/client-internal.h
File src/kudu/client/client-internal.h:

http://gerrit.cloudera.org:8080/#/c/11449/1/src/kudu/client/client-internal.h@165
PS1, Line 165:   void ReconnectToCluster(
> Doc this. Maybe move the definition of ReconnectionReason down here too, si
Done


http://gerrit.cloudera.org:8080/#/c/11449/1/src/kudu/client/client-internal.h@166
PS1, Line 166:       const MonoTime& deadline,
             :       KuduClient* client,
> Nit: can you switch the argument order of these two, to reduce the delta be
Done


http://gerrit.cloudera.org:8080/#/c/11449/1/src/kudu/client/client-internal.cc
File src/kudu/client/client-internal.cc:

http://gerrit.cloudera.org:8080/#/c/11449/1/src/kudu/client/client-internal.cc@158
PS1, Line 158: Status KuduClient::Data::SyncLeaderMasterRpc(
> So to repro the bug, which code branches down here are taken, in what order
Having the connection to the former leader master still open, but authn token 
expired and master re-election selecting another master as leader, the 
GetTableSchema() RPC call hits the code of the last 'if ()' closure within the 
retry 'for ()' cycle, where the error code is MasterErrorPB::NOT_THE_LEADER.

The ConnectToMaster() attempt would yield FATAL_INVALID_AUTHENTICATION_TOKEN 
error code, and the control flow will continue with the retries, but the state 
of the client stays the same (i.e. the leader master proxy hasn't been 
updated).  That will happen again and again, until the RPC call of 
GetTableSchema() times out.

As I mentioned in the prior changelist, I think it's worth refactoring the 
ConnectionToCluster code a bit and clean the cached master proxy in that case.  
But I think it's better to do that in a separate changelist, keeping this 
changelist targeted for back-porting to older maintenance branches.


http://gerrit.cloudera.org:8080/#/c/11449/1/src/kudu/client/client-internal.cc@270
PS1, Line 270:       if (resp->error().code() == MasterErrorPB::NOT_THE_LEADER 
||
> BTW, in our lunch conversation I recall you saying that one problem is that
The original approach of authn token re-acquisition is to go get reacquire 
authn token when the returned error code of the RPC call is 
FATAL_INVALID_AUTHENTICATION_TOKEN.  That's straightforward and simple enough.  
However, that works only in case if authn token is present, i.e. that works 
only when establishing a new connection.  Once connection is established, no 
authn token is sent to the server anymore.  And even if it were sent, that 
might be some performance hit to verify authn token signature and check for 
expiration time at every RPC call.

So, the former master leader simply cannot notify the client of its expired 
token since the master does not see the authn token during regular RPC calls.


http://gerrit.cloudera.org:8080/#/c/11449/1/src/kudu/client/client-internal.cc@659
PS1, Line 659: s.IsNotAuthorized
> I think here we might need to narrow down the case when we need to retry: p
Will and I discussed this offline and it seems having this as-is for the 
purpose of back-porting this to older branches is OK.

I'll send a follow-up patch for review with refactoring this code.



--
To view, visit http://gerrit.cloudera.org:8080/11449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4477d0f2bb36ee5ef580b585cae189a634d5002f
Gerrit-Change-Number: 11449
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Wed, 19 Sep 2018 01:23:18 +0000
Gerrit-HasComments: Yes

Reply via email to