[ https://issues.apache.org/jira/browse/YARN-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974013#comment-14974013 ]

Steve Loughran commented on YARN-4251:
--------------------------------------

bq. Also the dismissive nature of the wiki: 
"Finally, this is not a Hadoop problem, it is a host, network or Hadoop 
configuration problem. As it is your cluster, only you can find out and track 
down the problem.. Sorry"
bq. Everything worked fine one day. I upgrade hadoop it stops working. The wiki 
ends with a bold claim that every bind exception that starts the day after 
upgrade is not a hadoop problem.

Edward, I can assure you that most of the JIRAs we get related to 
ConnectionRefused, BindException, NoRouteToHostException, etc. turn out to be 
system configuration problems. It is almost invariably some machine config issue, be it 
Ubuntu mapping the machine's hostname to 127.0.1.1, a firewall in the way, broken rDNS, 
or others. And we get so many reports complaining that the namenode is refusing 
connections when either the firewall is up, the port settings for the client are wrong, 
the hostname is wrong or the NN isn't up. Same for BindException. 
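A quick way to check a machine for some of the host-side causes listed above is a small diagnostic like the one below. It is a hypothetical sketch, not a Hadoop tool: `NetworkSanityCheck`, `checkLocalHostname` and `probe` are illustrative names. It flags a local hostname that resolves to a loopback address (as with Ubuntu's 127.0.1.1 entry) or has no reverse DNS entry, and classifies a TCP connection attempt as refused, unroutable or timed out.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NoRouteToHostException;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic helper; not part of Hadoop. */
public class NetworkSanityCheck {

    /** Returns warnings about how this machine's own hostname resolves. */
    public static List<String> checkLocalHostname() {
        List<String> warnings = new ArrayList<>();
        InetAddress local;
        try {
            local = InetAddress.getLocalHost();
        } catch (UnknownHostException e) {
            // The machine's own hostname cannot be resolved at all.
            warnings.add("local hostname does not resolve: " + e.getMessage());
            return warnings;
        }
        String name = local.getHostName();
        String addr = local.getHostAddress();
        // Every 127.x.x.x address (including Ubuntu's 127.0.1.1) is loopback,
        // so remote machines told to use this hostname cannot reach the service.
        if (local.isLoopbackAddress()) {
            warnings.add("hostname " + name + " resolves to loopback address " + addr);
        }
        // getCanonicalHostName() falls back to the bare IP text when the
        // reverse lookup fails.
        if (local.getCanonicalHostName().equals(addr)) {
            warnings.add("no reverse DNS entry for " + addr);
        }
        return warnings;
    }

    /** Attempts a TCP connection and describes the outcome. */
    public static String probe(String host, int port, int timeoutMs) {
        String target = host + ":" + port;
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return "connected to " + target;
        } catch (UnknownHostException e) {
            return "unknown host " + host;
        } catch (ConnectException e) {
            // Typically nothing is listening on that port (service down or
            // wrong port), or a firewall is actively rejecting the connection.
            return "connection refused by " + target;
        } catch (NoRouteToHostException e) {
            return "no route to " + target;
        } catch (SocketTimeoutException e) {
            // No reply at all: often a firewall silently dropping packets.
            return "timed out connecting to " + target;
        } catch (IOException e) {
            return "failed to connect to " + target + ": " + e;
        }
    }

    public static void main(String[] args) {
        checkLocalHostname().forEach(w -> System.out.println("WARNING: " + w));
        if (args.length == 2) {
            System.out.println(probe(args[0], Integer.parseInt(args[1]), 5000));
        }
    }
}
```

Run it as `java NetworkSanityCheck <host> <port>` against the service that is refusing connections. Neither check is conclusive on its own: a refused connection only says nothing answered on that port, and a timeout merely suggests a firewall.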

We've gone to the effort of adding wrappers around all socket exceptions to include 
the hostnames and ports (the things people who understand networking need), and 
wiki entries to help people fend for themselves and not file Critical issues 
about problems that they generally have to fix themselves. Yet even with 
those exceptions saying "look at the wiki entry", we still get people not 
following the link but going straight to JIRA: HADOOP-12391. 
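The wrapping idea itself is simple: catch the low-level exception at the point where the address is known and rethrow one that names it, keeping the original as the cause. The sketch below mimics the message shape seen in the trace on this issue. It is not Hadoop's code (the real wrapping is done in `org.apache.hadoop.net.NetUtils.wrapException` and differs in detail); `BindWithContext` is a hypothetical name.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Hypothetical sketch of exception wrapping; not Hadoop's implementation. */
public class BindWithContext {
    public static final String WIKI = "http://wiki.apache.org/hadoop/BindException";

    /** Builds a BindException naming the address, keeping the original as cause. */
    static BindException wrap(InetSocketAddress addr, BindException cause) {
        BindException wrapped = new BindException("Problem binding to ["
                + addr.getHostString() + ":" + addr.getPort() + "] " + cause
                + "; For more details see:  " + WIKI);
        wrapped.initCause(cause);
        return wrapped;
    }

    /** Binds a listening socket, rethrowing bind failures with the address attached. */
    public static ServerSocket bind(InetSocketAddress addr) throws IOException {
        ServerSocket socket = new ServerSocket();
        try {
            socket.bind(addr);
            return socket;
        } catch (BindException e) {
            socket.close();
            throw wrap(addr, e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket first = bind(new InetSocketAddress("0.0.0.0", 0))) {
            // Binding the same port again fails while the first socket is listening.
            bind(new InetSocketAddress("0.0.0.0", first.getLocalPort())).close();
        } catch (BindException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` should print a message of the same shape as the one in the trace, with an ephemeral port in place of 9030; the exact OS text after `BindException:` varies by platform.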

If you look at the history of those wiki entries, you can see that they 
continually grow as we find new system setup issues which trigger the 
exception. That's because I do hit problems, I do fix them myself, and 
whenever I do that, I add another line. If you've found a new cause, once it is 
fixed, I encourage you to add a new entry. And, at the same time, you are free to 
change that text at the end. 

> TestAMRMClientOnRMRestart#testAMRMClientOnAMRMTokenRollOverOnRMRestart is 
> failing
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-4251
>                 URL: https://issues.apache.org/jira/browse/YARN-4251
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>         Attachments: YARN-4251.patch
>
>
>  *Trace* 
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.net.BindException: Problem binding to [0.0.0.0:9030] 
> java.net.BindException: Address already in use: bind; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>       at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
>       at org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
>       at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
>       at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.serviceStart(ApplicationMasterService.java:143)
>       at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>       at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>       at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:592)
>       at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>       at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:975)
>       at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1016)
>       at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Unknown Source)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>       at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1012)
>       at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1052)
>       at org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyResourceManager.serviceStart(TestAMRMClientOnRMRestart.java:560)
>       at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>       at org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientOnAMRMTokenRollOverOnRMRestart(TestAMRMClientOnRMRestart.java:463)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>       at java.lang.reflect.Method.invoke(Unknown Source)
>       at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>       at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>       at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: java.net.BindException: Problem binding to [0.0.0.0:9030] 
> java.net.BindException: Address already in use: bind; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
>       at java.lang.reflect.Constructor.newInstance(Unknown Source)
>       at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>       at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
>       at org.apache.hadoop.ipc.Server.bind(Server.java:486)
>       at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:646)
>       at org.apache.hadoop.ipc.Server.<init>(Server.java:2399)
>       at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:946)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:537)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:510)
>       at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:787)
>       at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.createServer(RpcServerFactoryPBImpl.java:169)
>       at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:132)
>       ... 27 more
> Caused by: java.net.BindException: Address already in use: bind
>       at sun.nio.ch.Net.bind0(Native Method)
>       at sun.nio.ch.Net.bind(Unknown Source)
>       at sun.nio.ch.Net.bind(Unknown Source)
>       at sun.nio.ch.ServerSocketChannelImpl.bind(Unknown Source)
>       at sun.nio.ch.ServerSocketAdaptor.bind(Unknown Source)
>       at org.apache.hadoop.ipc.Server.bind(Server.java:469)
>       ... 35 more
> {noformat}
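In the trace above, the restarted ResourceManager's ApplicationMasterService cannot bind 0.0.0.0:9030 because something still holds that port, possibly an earlier RM instance in the same test JVM or another process on the build machine. One common way to make tests less sensitive to this is to avoid hard-coding the port. The sketch below is a hypothetical helper (`TestPorts`, `freePort` and `preferredOrFree` are illustrative names; nothing in this thread says the YARN-4251 patch works this way) that picks a port the OS reports as free.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

/** Hypothetical test helper: choose a currently free port instead of a fixed one. */
public class TestPorts {

    /** Asks the OS for an unused ephemeral port by binding port 0, then releases it. */
    public static int freePort() throws IOException {
        try (ServerSocket probe = new ServerSocket(0)) {
            return probe.getLocalPort();
        }
    }

    /** Returns preferred if it can be bound at this moment, otherwise a free port. */
    public static int preferredOrFree(int preferred) throws IOException {
        try (ServerSocket probe = new ServerSocket(preferred)) {
            return preferred;
        } catch (BindException e) {
            // Port already in use: fall back to whatever the OS hands out.
            return freePort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("9030 or a substitute: " + preferredOrFree(9030));
    }
}
```

This has an unavoidable race: the port is released before the server under test binds it, so another process can grab it in between. Where the code under test allows it, having the server itself bind port 0 and then reading back the port it got avoids that window.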



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
