[ https://issues.apache.org/jira/browse/YARN-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986867#comment-14986867 ]
Varun Saxena commented on YARN-4321:
------------------------------------
Thanks [~jianhe] for the review and commit.
> Incessant retries if NoAuthException is thrown by Zookeeper in non HA mode
> --------------------------------------------------------------------------
>
> Key: YARN-4321
> URL: https://issues.apache.org/jira/browse/YARN-4321
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.7.1
> Reporter: Varun Saxena
> Assignee: Varun Saxena
> Fix For: 2.7.2
>
> Attachments: YARN-4321-branch-2.7.01.patch
>
>
> This applies only to branch-2.7 and earlier code.
> When a {{NoAuthException}} is thrown in non-HA mode (as in the scenario of
> YARN-4127), the RM retries the ZK operation incessantly.
> {noformat}
> 2015-10-23 09:22:10,209 DEBUG [SyncThread:0] server.DataTree
> (DataTree.java:processTxn(949)) - Ignoring processTxn failure hdr: -1 :
> error: -102
> 2015-10-23 09:22:10,210 DEBUG [main-SendThread(127.0.0.1:11221)]
> zookeeper.ClientCnxn (ClientCnxn.java:readResponse(818)) - Reading reply
> sessionid:0x15092d1ebe10001, packet:: clientPath:null serverPath:null
> finished:false header:: 7591,1 replyHeader:: 7591,7610,-102 request::
> '/rmstore/ZKRMStateRoot/RMAppRoot,,v{s{31,s{'world,'anyone}}},0 response::
> 2015-10-23 09:22:10,210 INFO [ProcessThread(sid:0 cport:-1):]
> server.PrepRequestProcessor (PrepRequestProcessor.java:pRequest(645)) - Got
> user-level KeeperException when processing sessionid:0x15092d1ebe10001
> type:create cxid:0x1da8 zxid:0x1dbb txntype:-1 reqpath:n/a Error Path:null
> Error:KeeperErrorCode = NoAuth
> {noformat}
> This is because branch-2.7 code does not handle NoAuthException properly
> when HA is not enabled.
> In {{ZKRMStateStore#runWithRetries}}, we have the code below. As can be seen,
> if HA is not enabled, we neither rethrow NoAuthException nor have any logic
> to increment the retry count and back out once the maximum number of retries
> is reached.
> {code}
> T runWithRetries() throws Exception {
>   int retry = 0;
>   while (true) {
>     try {
>       return runWithCheck();
>     } catch (KeeperException.NoAuthException nae) {
>       if (HAUtil.isHAEnabled(getConfig())) {
>         // NoAuthException possibly means that this store is fenced due to
>         // another RM becoming active. Even if not,
>         // it is safer to assume we have been fenced
>         throw new StoreFencedException();
>       }
>     } catch (KeeperException ke) {
>       .............
>     }
>   }
> }
> {code}
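A minimal, self-contained Java sketch of one way to make the loop terminate. It assumes hypothetical stand-in classes ({{NoAuthException}}, {{StoreFencedException}}, {{RetrySketch}}) in place of the real ZooKeeper and {{ZKRMStateStore}} types, plus a {{haEnabled}} flag and a {{maxRetries}} limit in place of {{HAUtil.isHAEnabled(getConfig())}} and the configured retry count. It is not necessarily what the committed patch does. In non-HA mode it simply rethrows NoAuth, since an ACL mismatch will not clear on retry, and every other failure is retried only up to {{maxRetries}} times.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for org.apache.zookeeper.KeeperException.NoAuthException.
class NoAuthException extends Exception {
  NoAuthException(String msg) { super(msg); }
}

// Hypothetical stand-in for ZKRMStateStore's StoreFencedException.
class StoreFencedException extends Exception {
  StoreFencedException() { super("RM state store has been fenced"); }
}

/**
 * Sketch of a bounded retry loop. haEnabled and maxRetries are hypothetical
 * parameters replacing HAUtil.isHAEnabled(getConfig()) and the configured limit.
 */
class RetrySketch {
  private final boolean haEnabled;
  private final int maxRetries;

  RetrySketch(boolean haEnabled, int maxRetries) {
    this.haEnabled = haEnabled;
    this.maxRetries = maxRetries;
  }

  <T> T runWithRetries(Callable<T> op) throws Exception {
    int retry = 0;
    while (true) {
      try {
        return op.call();
      } catch (NoAuthException nae) {
        if (haEnabled) {
          // HA mode: assume another RM has become active and fenced us.
          throw new StoreFencedException();
        }
        // Non-HA mode: NoAuth will not go away by retrying, so fail fast
        // instead of falling through and looping forever.
        throw nae;
      } catch (Exception e) {
        // Any other failure: retry, but give up once the limit is reached.
        if (++retry >= maxRetries) {
          throw e;
        }
      }
    }
  }
}
```

Rethrowing immediately, rather than counting NoAuth failures toward the retry limit, surfaces the ACL problem on the first attempt instead of after a burst of rejected creates against ZooKeeper.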
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)