[ https://issues.apache.org/jira/browse/YARN-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983029#comment-14983029 ]
Varun Saxena commented on YARN-4321:
------------------------------------

Straightforward fix. I think we don't need to retry on NoAuthException, as the exception is unlikely to change even after retries.

> Incessant retries if NoAuthException is thrown by Zookeeper in non HA mode
> --------------------------------------------------------------------------
>
>                 Key: YARN-4321
>                 URL: https://issues.apache.org/jira/browse/YARN-4321
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>         Attachments: YARN-4321-branch-2.7.01.patch
>
>
> This applies only to branch-2.7 or earlier code.
> When a {{NoAuthException}} is thrown in non-HA mode (as in the scenario of
> YARN-4127), the RM incessantly keeps retrying the ZK operation.
> {noformat}
> 2015-10-23 09:22:10,209 DEBUG [SyncThread:0] server.DataTree
> (DataTree.java:processTxn(949)) - Ignoring processTxn failure hdr: -1 :
> error: -102
> 2015-10-23 09:22:10,210 DEBUG [main-SendThread(127.0.0.1:11221)]
> zookeeper.ClientCnxn (ClientCnxn.java:readResponse(818)) - Reading reply
> sessionid:0x15092d1ebe10001, packet:: clientPath:null serverPath:null
> finished:false header:: 7591,1 replyHeader:: 7591,7610,-102 request::
> '/rmstore/ZKRMStateRoot/RMAppRoot,,v{s{31,s{'world,'anyone}}},0 response::
> 2015-10-23 09:22:10,210 INFO [ProcessThread(sid:0 cport:-1):]
> server.PrepRequestProcessor (PrepRequestProcessor.java:pRequest(645)) - Got
> user-level KeeperException when processing sessionid:0x15092d1ebe10001
> type:create cxid:0x1da8 zxid:0x1dbb txntype:-1 reqpath:n/a Error Path:null
> Error:KeeperErrorCode = NoAuth
> {noformat}
> This is because we do not handle NoAuthException properly in branch-2.7 code
> when HA is not enabled.
> In {{ZKRMStateStore#runWithRetries}}, we have the code shown below.
> As can be seen, if HA is not enabled, we neither rethrow NoAuthException nor
> have any logic to increment retries and back out once retries are maxed out.
> {code}
>   T runWithRetries() throws Exception {
>     int retry = 0;
>     while (true) {
>       try {
>         return runWithCheck();
>       } catch (KeeperException.NoAuthException nae) {
>         if (HAUtil.isHAEnabled(getConfig())) {
>           // NoAuthException possibly means that this store is fenced due to
>           // another RM becoming active. Even if not,
>           // it is safer to assume we have been fenced
>           throw new StoreFencedException();
>         }
>       } catch (KeeperException ke) {
>         .............
>       }
>     }
>   }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
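
For illustration only, here is a minimal, self-contained sketch of the behaviour discussed in the issue: a NoAuth-style failure is never retried (it is treated as fencing when HA is on and rethrown when HA is off), while other store failures are retried up to a limit. This is not the attached patch. The class names `BoundedRetrySketch`, `StoreException`, `NoAuthStoreException` and `FencedException`, and the `maxRetries` and `haEnabled` parameters, are hypothetical stand-ins for `KeeperException`, `KeeperException.NoAuthException`, `StoreFencedException` and the RM configuration, so that the example compiles without Hadoop or ZooKeeper on the classpath.

```java
import java.util.concurrent.Callable;

class BoundedRetrySketch {

    // Hypothetical stand-in for org.apache.zookeeper.KeeperException.
    static class StoreException extends Exception {
        StoreException(String message) {
            super(message);
        }
    }

    // Hypothetical stand-in for KeeperException.NoAuthException.
    static class NoAuthStoreException extends StoreException {
        NoAuthStoreException(String message) {
            super(message);
        }
    }

    // Hypothetical stand-in for StoreFencedException.
    static class FencedException extends Exception {
        FencedException(Throwable cause) {
            super("store assumed fenced", cause);
        }
    }

    // Runs op, retrying only on failures that might go away.
    // maxRetries and haEnabled are assumptions of this sketch; the real
    // code reads the equivalent settings from the RM configuration.
    static <T> T runWithRetries(Callable<T> op, int maxRetries, boolean haEnabled)
            throws Exception {
        int retry = 0;
        while (true) {
            try {
                return op.call();
            } catch (NoAuthStoreException nae) {
                if (haEnabled) {
                    // Same reasoning as the quoted code: assume we were fenced.
                    throw new FencedException(nae);
                }
                // Non-HA: the ACL error will not change on retry, so give up now.
                throw nae;
            } catch (StoreException se) {
                // Other store errors: retry, but back out once the limit is hit.
                if (++retry > maxRetries) {
                    throw se;
                }
            }
        }
    }
}
```

With `haEnabled` set to false, an operation that always fails with `NoAuthStoreException` is attempted exactly once, instead of looping forever as in the quoted branch-2.7 code. A real implementation would presumably also sleep between retries; that is left out here to keep the sketch short.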