[ https://issues.apache.org/jira/browse/YARN-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974013#comment-14974013 ]
Steve Loughran commented on YARN-4251:
--------------------------------------

bq. Also the dismissive nature of the wiki: "Finally, this is not a Hadoop problem, it is a host, network or Hadoop configuration problem. As it is your cluster, only you can find out and track down the problem.. Sorry"

bq. Everything worked fine one day. I upgrade hadoop it stops working. The wiki ends with a bold claim that every bind exception that starts the day after upgrade is not a hadoop problem.

Edward, I can assure you that most of the JIRAs we get about ConnectionRefused, BindException, NoRouteToHostException, etc. come down to system configuration. It is almost invariably some machine config issue, be it Ubuntu mapping localhost to 127.0.1.1, a firewall in the way, broken rDNS, or others. We get many reports complaining that the namenode is refusing connections when the firewall is up, the port settings for the client are wrong, the hostname is wrong, or the NN isn't up. The same goes for BindException.

We've gone to the effort of adding wrappers around all socket exceptions to add in hostnames and ports (the things people who understand networking need), and wiki entries to help people fend for themselves and not file Critical issues about problems that they generally have to fix for themselves. Yet even with those exceptions saying "look at the wiki", we still get people not following the link but going straight to JIRA: HADOOP-12391.

If you look at the history of those wiki entries, you can see that they continually grow as we find new system setup issues which trigger the exception. That's because I do hit problems, I do fix them myself, and whenever I do that, I add another line. If you've found a new cause, then once it is fixed I encourage you to add a new entry. And, at the same time, you are free to change that text at the end.
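For anyone unsure what "wrappers around socket exceptions" means in practice, here is a minimal sketch of the idea. It is not the Hadoop NetUtils code: the class name BindDiagnostics and both helper methods are hypothetical, and only the message layout is modelled on the trace quoted in this issue. It binds to an ephemeral port (port 0), then tries to bind the same port again to provoke the failure and show the enriched message.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper for illustration only; not part of Hadoop.
class BindDiagnostics {

    // Re-create the exception with the host and port in the message,
    // keeping the original exception as the cause.
    static BindException wrapBindException(String host, int port, BindException cause) {
        BindException wrapped = new BindException(
            "Problem binding to [" + host + ":" + port + "] " + cause
            + "; For more details see: http://wiki.apache.org/hadoop/BindException");
        wrapped.initCause(cause);
        return wrapped;
    }

    // Bind a server socket, translating any BindException into the wrapped form.
    static ServerSocket bind(String host, int port) throws IOException {
        ServerSocket socket = new ServerSocket();
        try {
            socket.bind(new InetSocketAddress(host, port));
            return socket;
        } catch (BindException e) {
            socket.close();
            throw wrapBindException(host, port, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port, which avoids clashes on fixed ports.
        try (ServerSocket first = bind("127.0.0.1", 0)) {
            int port = first.getLocalPort();
            // Binding the same address and port again should fail while 'first' is open.
            try (ServerSocket second = bind("127.0.0.1", port)) {
                System.out.println("Unexpectedly bound port " + port + " twice");
            } catch (BindException e) {
                System.out.println(e.getMessage());
            }
        }
    }
}
```

The point of the wrapper is that the message alone tells you which address and port were contended, so the next step is to find what else on the machine is holding that port.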
> TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is failing
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-4251
>                 URL: https://issues.apache.org/jira/browse/YARN-4251
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>         Attachments: YARN-4251.patch
>
>
> *Trace*
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [0.0.0.0:9030] java.net.BindException: Address already in use: bind; For more details see: http://wiki.apache.org/hadoop/BindException
>     at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
>     at org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
>     at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
>     at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.serviceStart(ApplicationMasterService.java:143)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:592)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:975)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1016)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Unknown Source)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1012)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1052)
>     at org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyResourceManager.serviceStart(TestAMRMClientOnRMRestart.java:560)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientOnAMRMTokenRollOverOnRMRestart(TestAMRMClientOnRMRestart.java:463)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>     at java.lang.reflect.Method.invoke(Unknown Source)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: java.net.BindException: Problem binding to [0.0.0.0:9030] java.net.BindException: Address already in use: bind; For more details see: http://wiki.apache.org/hadoop/BindException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
>     at java.lang.reflect.Constructor.newInstance(Unknown Source)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
>     at org.apache.hadoop.ipc.Server.bind(Server.java:486)
>     at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:646)
>     at org.apache.hadoop.ipc.Server.<init>(Server.java:2399)
>     at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:946)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:537)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:510)
>     at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:787)
>     at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.createServer(RpcServerFactoryPBImpl.java:169)
>     at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:132)
>     ... 27 more
> Caused by: java.net.BindException: Address already in use: bind
>     at sun.nio.ch.Net.bind0(Native Method)
>     at sun.nio.ch.Net.bind(Unknown Source)
>     at sun.nio.ch.Net.bind(Unknown Source)
>     at sun.nio.ch.ServerSocketChannelImpl.bind(Unknown Source)
>     at sun.nio.ch.ServerSocketAdaptor.bind(Unknown Source)
>     at org.apache.hadoop.ipc.Server.bind(Server.java:469)
>     ... 35 more
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)