[ https://issues.apache.org/jira/browse/YARN-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202708#comment-16202708 ]
Daryn Sharp commented on YARN-7319:
-----------------------------------

bq. java.lang.IllegalArgumentException: java.net.UnknownHostException: hadoop-slave-743067341-hqrbk

I'm a bit confused. Why is the node resolving itself as "hadoop-slave-743067341-hqrbk"? I believe that's the hostname self-reported during registration. If this is truly an ip-only environment, presumably that means the junk hostname is only in that node's /etc/hosts, but not in /etc/hosts of the other nodes?

I understand not having reverse dns. However, not having forward dns but assigning a private hostname is a bit obtuse; might as well not let the host resolve itself if nobody else can resolve it...

Did you try setting {{hadoop.security.token.service.use_ip=false}} per the javadocs on buildTokenService? That will get you past the exception while generating the container token. It's likely the client won't be able to locate the token though, i.e. the token will have a host, but if the env is ip-only, the client must use an ip to connect and won't be able to match the ip with the hostname in the token.

> java.net.UnknownHostException when trying contact node by hostname
> ------------------------------------------------------------------
>
>                 Key: YARN-7319
>                 URL: https://issues.apache.org/jira/browse/YARN-7319
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Evgeny Makarov
>
> I'm trying to set up Hadoop on a Kubernetes cluster with the following setup:
> Hadoop master is a k8s pod
> Each hadoop slave is an additional k8s pod
> All communication is processed in an IP-based manner. In HDFS I have
> dfs.namenode.datanode.registration.ip-hostname-check set to false
> and all works fine, however the same option is missing for the YARN manager.
> Here is part of the hadoop-master log when trying to submit a simple word-count job:
> 2017-10-12 09:00:25,005 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: Error trying to assign container token and NM token to an allocated container container_1507798393049_0001_01_000001
> java.lang.IllegalArgumentException: java.net.UnknownHostException: hadoop-slave-743067341-hqrbk
>     at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
>     at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:258)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:220)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.pullNewlyAllocatedContainersAndNMTokens(SchedulerApplicationAttempt.java:454)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.getAllocation(FiCaSchedulerApp.java:269)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:988)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:971)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:964)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:789)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:795)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:776)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.UnknownHostException: hadoop-slave-743067341-hqrbk
>     ... 19 more
> As can be seen, host hadoop-slave-743067341-hqrbk is unreachable. Adding a
> record to /etc/hosts of the master will solve the problem, however it's not an
> option in a Kubernetes environment. There should be a way to resolve nodes
> by IP address
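
For readers following the {{hadoop.security.token.service.use_ip}} suggestion in the comment, here is a rough, JDK-only sketch of the two behaviours it describes. This is not Hadoop's actual code: the class name TokenServiceSketch, the useIp parameter (standing in for the config flag), and port 45454 are all hypothetical, chosen only for illustration. It assumes that with the flag on, an address that cannot be resolved is rejected, and with the flag off, the hostname is put into the token service string as-is.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.Locale;

// Simplified illustration only; names here are hypothetical, not Hadoop's API.
class TokenServiceSketch {

    // useIp stands in for hadoop.security.token.service.use_ip.
    static String buildTokenService(InetSocketAddress addr, boolean useIp) {
        String host;
        if (useIp) {
            // An unresolved address has no IP, so an IP-based service
            // string cannot be built and the call fails.
            if (addr.isUnresolved()) {
                throw new IllegalArgumentException(
                        new UnknownHostException(addr.getHostName()));
            }
            host = addr.getAddress().getHostAddress();
        } else {
            // Hostname-based service string: no resolution required here.
            host = addr.getHostName().toLowerCase(Locale.ROOT);
        }
        return host + ":" + addr.getPort();
    }

    public static void main(String[] args) {
        // createUnresolved never performs a DNS lookup, which mimics a
        // hostname the ResourceManager cannot resolve.
        InetSocketAddress nm = InetSocketAddress.createUnresolved(
                "hadoop-slave-743067341-hqrbk", 45454);

        System.out.println(buildTokenService(nm, false));
        try {
            buildTokenService(nm, true);
        } catch (IllegalArgumentException e) {
            System.out.println("use_ip=true failed: " + e.getMessage());
        }
    }
}
```

The hostname branch also shows the caveat raised in the comment: the service string then carries a hostname, so a client that connects by IP would have nothing to match it against.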