[
https://issues.apache.org/jira/browse/YARN-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974013#comment-14974013
]
Steve Loughran commented on YARN-4251:
--------------------------------------
bq. Also the dismissive nature of the wiki:
"Finally, this is not a Hadoop problem, it is a host, network or Hadoop
configuration problem. As it is your cluster, only you can find out and track
down the problem.. Sorry"
bq. Everything worked fine one day. I upgrade hadoop it stops working. The wiki
ends with a bold claim that every bind exception that starts the day after
upgrade is not a hadoop problem.
Edward, I can assure you that most of the JIRAs we get related to
ConnectionRefused, BindException, NoRouteToHostException, etc. are related to
system configs. It is almost invariably some machine config issue, be it Ubuntu
mapping localhost to 127.0.1.1, a firewall in the way, rDNS broken, or others.
And we get so many complaining that the namenode is refusing connections, when
either the firewall is up, the port settings for the client are wrong, the
hostname is wrong or the NN isn't up. Same for BindException.
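As a rough illustration of the checks described above, the sketch below reports how the local hostname resolves and whether a port can be bound right now. The class and method names (HostConfigCheck, canBind, describeLocalHost) are hypothetical and are not part of Hadoop; port 9030 is taken from the trace in this issue.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper; not part of Hadoop.
final class HostConfigCheck {

    /** Returns true if a listening socket can be bound to host:port right now. */
    static boolean canBind(String host, int port) {
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(new InetSocketAddress(host, port));
            return true;
        } catch (IOException e) {
            // Covers "Address already in use" as well as unresolvable hosts.
            return false;
        }
    }

    /** Describes how the local hostname resolves; a loopback result is a warning sign. */
    static String describeLocalHost() {
        try {
            InetAddress local = InetAddress.getLocalHost();
            return local.getHostName() + " -> " + local.getHostAddress()
                + (local.isLoopbackAddress() ? " (loopback)" : "");
        } catch (UnknownHostException e) {
            return "local hostname does not resolve: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeLocalHost());
        System.out.println("can bind 0.0.0.0:9030? " + canBind("0.0.0.0", 9030));
    }
}
```

A hostname that resolves to a 127.x.x.x address, or a port that cannot be bound before the service starts, points at the machine rather than at Hadoop.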
We've gone to the effort of adding wrappers around all socket exceptions to add
in hostnames and ports (the things people who understand networking need), and
wiki entries to help people fend for themselves and not file Critical issues
about problems that they generally have to fix for themselves. Yet even with
those exceptions saying "look at the wiki entry", we still get people not
following the link, but going straight to JIRA: HADOOP-12391.
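To make the wrapping idea concrete, here is a minimal sketch of rethrowing a BindException with the host, port and wiki link added. It is not the actual NetUtils code: the class BindDiagnostics and its methods are hypothetical, and the message format only imitates the one visible in the trace on this issue.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical illustration; Hadoop's real wrapper lives in NetUtils.
final class BindDiagnostics {

    /** Builds a new BindException that names the host and port and keeps the cause. */
    static BindException wrap(String host, int port, BindException cause) {
        BindException wrapped = new BindException(
            "Problem binding to [" + host + ":" + port + "] " + cause
            + "; For more details see: http://wiki.apache.org/hadoop/BindException");
        wrapped.initCause(cause);
        return wrapped;
    }

    /** Binds a server socket, converting a bare BindException into a descriptive one. */
    static ServerSocket bind(String host, int port) throws IOException {
        ServerSocket socket = new ServerSocket();
        try {
            socket.bind(new InetSocketAddress(host, port));
            return socket;
        } catch (BindException e) {
            socket.close();
            throw wrap(host, port, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Bind an ephemeral port, then try to bind the same port again.
        try (ServerSocket first = bind("127.0.0.1", 0)) {
            try (ServerSocket second = bind("127.0.0.1", first.getLocalPort())) {
                System.out.println("second bind unexpectedly succeeded");
            } catch (BindException e) {
                System.out.println(e.getMessage());
            }
        }
    }
}
```

The point of the wrapper is that the message alone tells the reader which address and port were involved, which the bare JDK exception does not.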
If you look at the history of those wiki entries, you can see that they
continually grow as we find new system setup issues which trigger the
exception. That's because I do hit problems, I do fix them myself, and
whenever I do that, I add another line. If you've found a new cause, then once
it is fixed I encourage you to add a new entry. And, at the same time, you are
free to change that text at the end.
> TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is
> failing
> ---------------------------------------------------------------------------------
>
> Key: YARN-4251
> URL: https://issues.apache.org/jira/browse/YARN-4251
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Brahma Reddy Battula
> Assignee: Brahma Reddy Battula
> Attachments: YARN-4251.patch
>
>
> *Trace*
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException:
> java.net.BindException: Problem binding to [0.0.0.0:9030]
> java.net.BindException: Address already in use: bind; For more details see:
> http://wiki.apache.org/hadoop/BindException
> at
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
> at
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
> at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.serviceStart(ApplicationMasterService.java:143)
> at
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:592)
> at
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:975)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1016)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Unknown Source)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1012)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1052)
> at
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyResourceManager.serviceStart(TestAMRMClientOnRMRestart.java:560)
> at
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientOnAMRMTokenRollOverOnRMRestart(TestAMRMClientOnRMRestart.java:463)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> at java.lang.reflect.Method.invoke(Unknown Source)
> at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: java.net.BindException: Problem binding to [0.0.0.0:9030]
> java.net.BindException: Address already in use: bind; For more details see:
> http://wiki.apache.org/hadoop/BindException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
> at java.lang.reflect.Constructor.newInstance(Unknown Source)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
> at org.apache.hadoop.ipc.Server.bind(Server.java:486)
> at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:646)
> at org.apache.hadoop.ipc.Server.<init>(Server.java:2399)
> at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:946)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:537)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:510)
> at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:787)
> at
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.createServer(RpcServerFactoryPBImpl.java:169)
> at
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:132)
> ... 27 more
> Caused by: java.net.BindException: Address already in use: bind
> at sun.nio.ch.Net.bind0(Native Method)
> at sun.nio.ch.Net.bind(Unknown Source)
> at sun.nio.ch.Net.bind(Unknown Source)
> at sun.nio.ch.ServerSocketChannelImpl.bind(Unknown Source)
> at sun.nio.ch.ServerSocketAdaptor.bind(Unknown Source)
> at org.apache.hadoop.ipc.Server.bind(Server.java:469)
> ... 35 more
> {noformat}