Sergey Shelukhin created YARN-4042:
--------------------------------------

             Summary: YARN registry should handle the absence of ZK node
                 Key: YARN-4042
                 URL: https://issues.apache.org/jira/browse/YARN-4042
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Sergey Shelukhin


{noformat}
2015-08-10 11:33:46,931 WARN [LlapSchedulerNodeEnabler] rm.LlapTaskSchedulerService: Could not refresh list of active instances
org.apache.hadoop.fs.PathNotFoundException: `/registry/users/huzheng/services/org-apache-hive/llap0/components/workers/worker-0000000025': No such file or directory: KeeperErrorCode = NoNode for /registry/users/huzheng/services/org-apache-hive/llap0/components/workers/worker-0000000025
        at org.apache.hadoop.registry.client.impl.zk.CuratorService.operationFailure(CuratorService.java:377)
        at org.apache.hadoop.registry.client.impl.zk.CuratorService.operationFailure(CuratorService.java:360)
        at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkRead(CuratorService.java:720)
        at org.apache.hadoop.registry.client.impl.zk.RegistryOperationsService.resolve(RegistryOperationsService.java:120)
        at org.apache.hadoop.registry.client.binding.RegistryUtils.extractServiceRecords(RegistryUtils.java:321)
        at org.apache.hadoop.registry.client.binding.RegistryUtils.listServiceRecords(RegistryUtils.java:177)
        at org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl$DynamicServiceInstanceSet.refresh(LlapYarnRegistryImpl.java:278)
        at org.apache.tez.dag.app.rm.LlapTaskSchedulerService.refreshInstances(LlapTaskSchedulerService.java:584)
        at org.apache.tez.dag.app.rm.LlapTaskSchedulerService.access$900(LlapTaskSchedulerService.java:79)
        at org.apache.tez.dag.app.rm.LlapTaskSchedulerService$NodeEnablerCallable.call(LlapTaskSchedulerService.java:887)
        at org.apache.tez.dag.app.rm.LlapTaskSchedulerService$NodeEnablerCallable.call(LlapTaskSchedulerService.java:855)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /registry/users/huzheng/services/org-apache-hive/llap0/components/workers/worker-0000000025
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
        at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:302)
        at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:291)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
        at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:288)
        at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:279)
        at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:41)
        at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkRead(CuratorService.java:718)
        ... 12 more
{noformat}

ZK nodes can disappear between being listed and being read; for example, an 
ephemeral node can be cleaned up. The YARN registry should handle that.
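
One possible shape for the handling, as a rough sketch rather than a proposed patch: treat a "node not found" on the per-child read as a benign race and skip that child instead of failing the whole listing. Everything named below is hypothetical and stands in for the real classes: the Registry interface and NodeGoneException are simplified substitutes for the registry operations and PathNotFoundException seen in the trace, and the parent path and worker-0000000024 are made up for the demo.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TolerantListing {

    /** Hypothetical stand-in for the "path not found" exception in the trace. */
    static class NodeGoneException extends Exception {
        NodeGoneException(String path) {
            super("No such node: " + path);
        }
    }

    /** Hypothetical, simplified registry view: list children, then read one. */
    interface Registry {
        List<String> list(String parent);

        String resolve(String path) throws NodeGoneException;
    }

    /**
     * Lists the children of parent and reads each one. A child that vanished
     * between the list call and the read is skipped, not treated as an error.
     */
    static Map<String, String> listRecords(Registry registry, String parent) {
        Map<String, String> records = new LinkedHashMap<>();
        for (String child : registry.list(parent)) {
            String path = parent + "/" + child;
            try {
                records.put(path, registry.resolve(path));
            } catch (NodeGoneException e) {
                // The node was deleted after listing (e.g. an ephemeral node
                // whose session ended); ignore it and keep going.
            }
        }
        return records;
    }

    /** Demo: one child is still present, the other is gone by read time. */
    static Map<String, String> demo() {
        Registry registry = new Registry() {
            @Override
            public List<String> list(String parent) {
                return Arrays.asList("worker-0000000024", "worker-0000000025");
            }

            @Override
            public String resolve(String path) throws NodeGoneException {
                if (path.endsWith("worker-0000000025")) {
                    throw new NodeGoneException(path);
                }
                return "record-for-" + path;
            }
        };
        return listRecords(registry, "/registry/demo/workers");
    }

    public static void main(String[] args) {
        System.out.println(demo().keySet());
    }
}
```

With this shape the caller still gets the records for the workers that exist, and only errors other than "node not found" would propagate.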



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
