[ https://issues.apache.org/jira/browse/HADOOP-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741771#comment-13741771 ]

Sanjay Radia commented on HADOOP-9880:
--------------------------------------

We saw exactly the same error during a test this morning.
The two JIRAs that caused this problem are the recent HADOOP-9421 and the
earlier HDFS-3083.

HADOOP-9421 improved the SASL protocol.
The ZKFC uses Kerberos, but the server side initiates the token-based challenge
just in case the client wants to use a token. As part of doing that, the server
calls secretManager.checkAvailableForRead(), which fails because the NN is in
standby.
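To make the failure path concrete, here is a minimal, hypothetical Java model of
it. The names NegotiationSketch, SecretManagerStub, createTokenSaslServer() and
negotiate() are illustrative assumptions, not Hadoop's actual classes or
signatures; only checkAvailableForRead() and the standby failure come from the
description above. It shows that because the TOKEN server is built during
negotiation regardless of which method the client will use, a standby node
fails before a Kerberos client like the ZKFC can authenticate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of unconditional TOKEN setup during SASL
// negotiation. Not Hadoop code; names are illustrative only.
public class NegotiationSketch {

    // Stand-in for Hadoop's StandbyException (assumption: simplified).
    static class StandbyException extends Exception {
        StandbyException(String msg) { super(msg); }
    }

    // Stand-in for the token secret manager, which refuses reads in standby.
    static class SecretManagerStub {
        final boolean standby;
        SecretManagerStub(boolean standby) { this.standby = standby; }
        void checkAvailableForRead() throws StandbyException {
            if (standby) {
                throw new StandbyException("secret manager unavailable in standby");
            }
        }
    }

    // Building a TOKEN SaslServer consults the secret manager up front.
    static void createTokenSaslServer(SecretManagerStub sm) throws StandbyException {
        sm.checkAvailableForRead();
    }

    // Unconditional variant: the TOKEN server is created even when the client
    // will authenticate with Kerberos, so a standby node throws here.
    static List<String> negotiate(SecretManagerStub sm) throws StandbyException {
        List<String> methods = new ArrayList<>();
        createTokenSaslServer(sm);
        methods.add("TOKEN");
        methods.add("KERBEROS");
        return methods;
    }

    public static void main(String[] args) {
        try {
            System.out.println(negotiate(new SecretManagerStub(true)));
        } catch (StandbyException e) {
            // A Kerberos-only client never reaches authentication.
            System.out.println("negotiation failed: " + e.getMessage());
        }
    }
}
```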

It is really bizarre that there is a check for the server's state (active or
standby) as part of SASL. This was introduced in HDFS-3083 to deal with a
failover bug. In HDFS-3083, Aaron noted that he did not like the solution:
"I'm not in love with this solution, as it leaks abstractions all over the
place". The abstraction-layer violation has finally caught up with us.

It turns out that, even prior to Daryn's HADOOP-9421, a similar problem could
have occurred if the ZKFC had used Kerberos for its first connection and tokens
for any subsequent connections.

An immediate fix is required for what HADOOP-9421 broke, but I believe we also
need to fix the fix that HDFS-3083 introduced: the abstraction-layer
violations need to be cleaned up.
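One possible shape for the immediate fix, reading the issue title ("should not
unconditionally create SaslServer with Token auth") literally, is to advertise
mechanism names during negotiation and consult the secret manager only when a
client actually selects TOKEN. This is a hypothetical sketch under that
assumption, not the patch that was committed; LazyNegotiationSketch,
SecretManagerStub, advertise() and authenticate() are illustrative names, not
Hadoop APIs, and the real SASL exchange is omitted.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: defer TOKEN setup until a client picks TOKEN.
// Not Hadoop code; names are illustrative only.
public class LazyNegotiationSketch {

    // Stand-in for Hadoop's StandbyException (assumption: simplified).
    static class StandbyException extends Exception {
        StandbyException(String msg) { super(msg); }
    }

    // Stand-in for the token secret manager, which refuses reads in standby.
    static class SecretManagerStub {
        final boolean standby;
        SecretManagerStub(boolean standby) { this.standby = standby; }
        void checkAvailableForRead() throws StandbyException {
            if (standby) {
                throw new StandbyException("secret manager unavailable in standby");
            }
        }
    }

    // Advertising mechanism names touches no server state, so it cannot fail.
    static List<String> advertise() {
        return Arrays.asList("TOKEN", "KERBEROS");
    }

    // The secret manager is consulted only on the TOKEN path. Returns a
    // placeholder string instead of performing a real SASL exchange.
    static String authenticate(String method, SecretManagerStub sm) throws StandbyException {
        if (method.equals("TOKEN")) {
            sm.checkAvailableForRead();
        }
        return method + " authenticated";
    }

    public static void main(String[] args) throws StandbyException {
        SecretManagerStub standby = new SecretManagerStub(true);
        System.out.println(advertise());
        // A Kerberos client such as the ZKFC can now talk to a standby node.
        System.out.println(authenticate("KERBEROS", standby));
    }
}
```

Run as-is, main prints the advertised list and then shows the Kerberos path
succeeding against a standby stub; calling authenticate("TOKEN", standby) would
still throw. This only removes the unconditional creation; it leaves the state
check inside SASL, which is the layering problem the comment wants cleaned up.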
                
> RPC Server should not unconditionally create SaslServer with Token auth.
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-9880
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9880
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.1.0-beta
>            Reporter: Kihwal Lee
>            Priority: Blocker
>
> buildSaslNegotiateResponse() will create a SaslRpcServer with TOKEN auth. 
> When create() is called against it, secretManager.checkAvailableForRead() is 
> called, which fails in HA standby. Thus HA standby nodes cannot be 
> transitioned to active.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
