Sergey Shelukhin created YARN-4042:
--------------------------------------

             Summary: YARN registry should handle the absence of ZK node
                 Key: YARN-4042
                 URL: https://issues.apache.org/jira/browse/YARN-4042
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Sergey Shelukhin

{noformat}
2015-08-10 11:33:46,931 WARN [LlapSchedulerNodeEnabler] rm.LlapTaskSchedulerService: Could not refresh list of active instances
org.apache.hadoop.fs.PathNotFoundException: `/registry/users/huzheng/services/org-apache-hive/llap0/components/workers/worker-0000000025': No such file or directory: KeeperErrorCode = NoNode for /registry/users/huzheng/services/org-apache-hive/llap0/components/workers/worker-0000000025
	at org.apache.hadoop.registry.client.impl.zk.CuratorService.operationFailure(CuratorService.java:377)
	at org.apache.hadoop.registry.client.impl.zk.CuratorService.operationFailure(CuratorService.java:360)
	at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkRead(CuratorService.java:720)
	at org.apache.hadoop.registry.client.impl.zk.RegistryOperationsService.resolve(RegistryOperationsService.java:120)
	at org.apache.hadoop.registry.client.binding.RegistryUtils.extractServiceRecords(RegistryUtils.java:321)
	at org.apache.hadoop.registry.client.binding.RegistryUtils.listServiceRecords(RegistryUtils.java:177)
	at org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl$DynamicServiceInstanceSet.refresh(LlapYarnRegistryImpl.java:278)
	at org.apache.tez.dag.app.rm.LlapTaskSchedulerService.refreshInstances(LlapTaskSchedulerService.java:584)
	at org.apache.tez.dag.app.rm.LlapTaskSchedulerService.access$900(LlapTaskSchedulerService.java:79)
	at org.apache.tez.dag.app.rm.LlapTaskSchedulerService$NodeEnablerCallable.call(LlapTaskSchedulerService.java:887)
	at org.apache.tez.dag.app.rm.LlapTaskSchedulerService$NodeEnablerCallable.call(LlapTaskSchedulerService.java:855)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /registry/users/huzheng/services/org-apache-hive/llap0/components/workers/worker-0000000025
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
	at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:302)
	at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:291)
	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
	at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:288)
	at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:279)
	at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:41)
	at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkRead(CuratorService.java:718)
	... 12 more
{noformat}

ZK nodes can disappear between being listed and being read; for example, an ephemeral node can be cleaned up when the session that created it ends. The YARN registry should handle that case instead of failing the whole listing.
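The trace shows `RegistryUtils.listServiceRecords` listing the workers, then `extractServiceRecords` calling `resolve` on each child, so a single child deleted in between surfaces as `PathNotFoundException` and aborts the entire refresh. A minimal sketch of the tolerant behavior being requested might look like the following. It does not use the real Hadoop registry or Curator APIs: `InMemoryRegistry`, `NodeMissingException` and `readAll` are hypothetical stand-ins that simulate the list-then-read race in memory, and this is not presented as the actual fix for this issue.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TolerantListing {

    /** Hypothetical stand-in for PathNotFoundException / KeeperException.NoNodeException. */
    static class NodeMissingException extends IOException {
        NodeMissingException(String name) {
            super("No such node: " + name);
        }
    }

    /** Hypothetical in-memory registry holding the children of a single parent node. */
    static class InMemoryRegistry {
        private final Map<String, String> children = new TreeMap<>();

        void put(String name, String data) { children.put(name, data); }

        void delete(String name) { children.remove(name); }

        /** Snapshot of child names, analogous to listing the children of a znode. */
        List<String> list() { return new ArrayList<>(children.keySet()); }

        /** Reads one child, analogous to getData(); fails if the node is gone. */
        String resolve(String name) throws NodeMissingException {
            String data = children.get(name);
            if (data == null) {
                throw new NodeMissingException(name);
            }
            return data;
        }
    }

    /**
     * Reads every listed child. A child that disappeared after list() was
     * called (e.g. an ephemeral node whose session ended) is skipped
     * instead of aborting the whole refresh.
     */
    static Map<String, String> readAll(InMemoryRegistry registry, List<String> names) {
        Map<String, String> records = new LinkedHashMap<>();
        for (String name : names) {
            try {
                records.put(name, registry.resolve(name));
            } catch (NodeMissingException e) {
                // Raced with a deletion: treat the node as absent and continue.
            }
        }
        return records;
    }

    public static void main(String[] args) {
        InMemoryRegistry registry = new InMemoryRegistry();
        registry.put("worker-0000000024", "host-a");
        registry.put("worker-0000000025", "host-b");

        List<String> names = registry.list();   // both workers are listed
        registry.delete("worker-0000000025");   // node vanishes before it is read
        System.out.println(readAll(registry, names));
        // prints {worker-0000000024=host-a}
    }
}
```

Skipping the missing child treats "deleted after listing" the same as "never listed", which matches the semantics of an ephemeral node whose owner has gone away; a caller that needs to know about such races could log or count them in the catch block instead.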