Varun Saxena created YARN-4321:
----------------------------------

             Summary: Incessant retries if NoAuthException is thrown by 
Zookeeper in non HA mode
                 Key: YARN-4321
                 URL: https://issues.apache.org/jira/browse/YARN-4321
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 2.7.1
            Reporter: Varun Saxena
            Assignee: Varun Saxena


This applies only to branch-2.7 and earlier code.
When a {{NoAuthException}} is thrown in non-HA mode (as in the scenario of 
YARN-4127), the RM keeps retrying the ZK operation indefinitely.

{noformat}
2015-10-23 09:22:10,209 DEBUG [SyncThread:0] server.DataTree 
(DataTree.java:processTxn(949)) - Ignoring processTxn failure hdr: -1 : error: 
-102
2015-10-23 09:22:10,210 DEBUG [main-SendThread(127.0.0.1:11221)] 
zookeeper.ClientCnxn (ClientCnxn.java:readResponse(818)) - Reading reply 
sessionid:0x15092d1ebe10001, packet:: clientPath:null serverPath:null 
finished:false header:: 7591,1  replyHeader:: 7591,7610,-102  request:: 
'/rmstore/ZKRMStateRoot/RMAppRoot,,v{s{31,s{'world,'anyone}}},0  response::
2015-10-23 09:22:10,210 INFO  [ProcessThread(sid:0 cport:-1):] 
server.PrepRequestProcessor (PrepRequestProcessor.java:pRequest(645)) - Got 
user-level KeeperException when processing sessionid:0x15092d1ebe10001 
type:create cxid:0x1da8 zxid:0x1dbb txntype:-1 reqpath:n/a Error Path:null 
Error:KeeperErrorCode = NoAuth
{noformat}

This is because NoAuthException is not handled properly in branch-2.7 code 
when HA is not enabled.
{{ZKRMStateStore#runWithRetries}} contains the code below. As can be seen, if 
HA is not enabled, we neither rethrow the NoAuthException nor increment the 
retry count and back out once retries are exhausted, so the {{while (true)}} 
loop simply repeats the operation.
{code}
T runWithRetries() throws Exception {
  int retry = 0;
  while (true) {
    try {
      return runWithCheck();
    } catch (KeeperException.NoAuthException nae) {
      if (HAUtil.isHAEnabled(getConfig())) {
        // NoAuthException possibly means that this store is fenced due to
        // another RM becoming active. Even if not,
        // it is safer to assume we have been fenced
        throw new StoreFencedException();
      }
    } catch (KeeperException ke) {
      .............
    }
  }
}
{code}
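
One possible way to bound the loop is sketched below. This is only an illustration of the "increment retries and back out" idea, not the actual patch for this issue. The exception classes are hypothetical stand-ins so that the sketch compiles without ZooKeeper or YARN on the classpath, and {{BoundedRetrySketch}}, {{maxAttempts}}, {{haEnabled}} and {{attempts}} are assumed names that do not exist in {{ZKRMStateStore}}.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-ins for KeeperException.NoAuthException and
// StoreFencedException, so the sketch is self-contained.
class NoAuthException extends Exception {}
class StoreFencedException extends Exception {}

class BoundedRetrySketch {
    private final int maxAttempts;   // assumed limit on total attempts
    private final boolean haEnabled; // stands in for HAUtil.isHAEnabled(getConfig())
    int attempts = 0;                // exposed only so the behaviour can be observed

    BoundedRetrySketch(int maxAttempts, boolean haEnabled) {
        this.maxAttempts = maxAttempts;
        this.haEnabled = haEnabled;
    }

    <T> T runWithRetries(Callable<T> op) throws Exception {
        int retry = 0;
        while (true) {
            attempts++;
            try {
                return op.call();
            } catch (NoAuthException nae) {
                if (haEnabled) {
                    // Same as the existing code: assume we have been fenced.
                    throw new StoreFencedException();
                }
                // Non-HA mode: count the failure and rethrow once the
                // limit is reached instead of looping forever.
                if (++retry >= maxAttempts) {
                    throw nae;
                }
            }
        }
    }
}
```

With {{maxAttempts}} set to 3 and HA disabled, an operation that always fails with the stand-in NoAuthException is attempted three times and the exception is then propagated to the caller. Rethrowing immediately in non-HA mode, without any retry, would be an equally plausible alternative.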


