[ 
https://issues.apache.org/jira/browse/YARN-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983029#comment-14983029
 ] 

Varun Saxena commented on YARN-4321:
------------------------------------

Straightforward fix.
I think we don't need to retry on NoAuthException, as the outcome is unlikely 
to change even after retries.

> Incessant retries if NoAuthException is thrown by Zookeeper in non HA mode
> --------------------------------------------------------------------------
>
>                 Key: YARN-4321
>                 URL: https://issues.apache.org/jira/browse/YARN-4321
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>         Attachments: YARN-4321-branch-2.7.01.patch
>
>
> This applies only to branch-2.7 and earlier code.
> When a {{NoAuthException}} is thrown in non-HA mode (as in the scenario of 
> YARN-4127), the RM keeps retrying the ZK operation indefinitely.
> {noformat}
> 2015-10-23 09:22:10,209 DEBUG [SyncThread:0] server.DataTree 
> (DataTree.java:processTxn(949)) - Ignoring processTxn failure hdr: -1 : 
> error: -102
> 2015-10-23 09:22:10,210 DEBUG [main-SendThread(127.0.0.1:11221)] 
> zookeeper.ClientCnxn (ClientCnxn.java:readResponse(818)) - Reading reply 
> sessionid:0x15092d1ebe10001, packet:: clientPath:null serverPath:null 
> finished:false header:: 7591,1  replyHeader:: 7591,7610,-102  request:: 
> '/rmstore/ZKRMStateRoot/RMAppRoot,,v{s{31,s{'world,'anyone}}},0  response::
> 2015-10-23 09:22:10,210 INFO  [ProcessThread(sid:0 cport:-1):] 
> server.PrepRequestProcessor (PrepRequestProcessor.java:pRequest(645)) - Got 
> user-level KeeperException when processing sessionid:0x15092d1ebe10001 
> type:create cxid:0x1da8 zxid:0x1dbb txntype:-1 reqpath:n/a Error Path:null 
> Error:KeeperErrorCode = NoAuth
> {noformat}
> This is because NoAuthException is not handled properly in branch-2.7 code 
> when HA is not enabled.
> In {{ZKRMStateStore#runWithRetries}}, the code is as shown below. As can be 
> seen, if HA is not enabled, we neither rethrow NoAuthException nor increment 
> the retry count and give up once the retries are exhausted.
> {code}
> T runWithRetries() throws Exception {
>   int retry = 0;
>   while (true) {
>     try {
>       return runWithCheck();
>     } catch (KeeperException.NoAuthException nae) {
>       if (HAUtil.isHAEnabled(getConfig())) {
>         // NoAuthException possibly means that this store is fenced due to
>         // another RM becoming active. Even if not,
>         // it is safer to assume we have been fenced
>         throw new StoreFencedException();
>       }
>     } catch (KeeperException ke) {
>       .............
>     }
>   }
> }
> {code}
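
For illustration, the behaviour described above could be sketched roughly as follows. This is not the attached patch, only a minimal standalone sketch of the idea: the exception classes are hypothetical stand-ins for the ZooKeeper and YARN types, the {{haEnabled}} and {{maxRetries}} parameters replace the real configuration lookups, and the bounded retry in the generic {{KeeperException}} branch is an assumption, since that branch is elided in the snippet above.

```java
import java.util.concurrent.Callable;

final class RetrySketch {
    // Hypothetical stand-ins for the real ZooKeeper / YARN exception types.
    static class KeeperException extends Exception {}
    static class NoAuthException extends KeeperException {}
    static class StoreFencedException extends Exception {}

    // haEnabled and maxRetries are illustrative parameters, not the real API.
    static <T> T runWithRetries(Callable<T> op, boolean haEnabled, int maxRetries)
            throws Exception {
        int retry = 0;
        while (true) {
            try {
                return op.call();
            } catch (NoAuthException nae) {
                if (haEnabled) {
                    // In HA mode, assume this store has been fenced.
                    throw new StoreFencedException();
                }
                // Non-HA mode: a retry is unlikely to change the outcome,
                // so rethrow instead of looping forever.
                throw nae;
            } catch (KeeperException ke) {
                // Assumed handling for other ZK errors: bounded retries.
                if (++retry > maxRetries) {
                    throw ke;
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        try {
            runWithRetries(() -> {
                attempts[0]++;
                throw new NoAuthException();
            }, false, 3);
        } catch (NoAuthException expected) {
            // The operation is attempted once and then abandoned.
            System.out.println("attempts=" + attempts[0]); // prints attempts=1
        }
    }
}
```

With this shape, a NoAuth failure in non-HA mode surfaces to the caller after a single attempt, while other ZooKeeper errors are still retried up to the configured limit.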



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
