[ https://issues.apache.org/jira/browse/YARN-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863616#comment-16863616 ]

caozhiqiang commented on YARN-9619:
-----------------------------------

Thanks for your comments, [~eyang]. Launching an application in a Docker container 
allows several kinds of networks. The documentation states that both the host 
network and the bridge network can be allowed: [launch with 
docker|https://hadoop.apache.org/docs/r3.1.2/hadoop-yarn/hadoop-yarn-site/DockerContainers.html]
{code:xml}
  <property>
    <name>yarn.nodemanager.runtime.linux.docker.allowed-container-networks</name>
    <value>host,none,bridge</value>
    <description>
      Optional. A comma-separated set of networks allowed when launching
      containers. Valid values are determined by Docker networks available from
      `docker network ls`
    </description>
  </property>{code}
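To illustrate how a comma-separated value like the one above could be interpreted, here is a minimal sketch in plain Java. It is not the NodeManager's actual code; the class and method names (AllowedNetworks, parse, isAllowed) are hypothetical and only show the kind of membership check that a requested network (for example, one requested through YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK) would need to pass.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of Hadoop: interprets an
// allowed-container-networks style value.
class AllowedNetworks {
    // Split a comma-separated value such as "host,none,bridge" into a set,
    // trimming whitespace and skipping empty entries.
    public static Set<String> parse(String value) {
        Set<String> result = new LinkedHashSet<>();
        for (String part : value.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    // True if the requested network name appears in the allowed list.
    public static boolean isAllowed(String allowedValue, String requested) {
        return parse(allowedValue).contains(requested);
    }

    public static void main(String[] args) {
        String allowed = "host,none,bridge";
        System.out.println(isAllowed(allowed, "bridge"));
        System.out.println(isAllowed(allowed, "overlay0"));
    }
}
```

With the value shown in the property above, "bridge" passes the check while a network name that is not listed does not.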
With the host network, an AM running in Docker works well because the AM's IP 
is the same as the NM's.

With the bridge network, I think that if the AM registers the correct host/IP 
(the real Docker container IP, not the NodeManager IP) with the RM, and all 
Hadoop components run in an overlay network (for example, one deployed with 
flannel), it should also work well. 

In an overlay network, a Docker container can communicate bidirectionally with 
any other container or node, so the RM and NMs can also communicate 
bidirectionally with an AM running in Docker. I have verified this.
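
As a rough sketch of what "registering the real container IP" could involve, the following plain Java enumerates the local network interfaces and picks the first IPv4 address that is neither loopback nor link-local. It is an illustration under assumptions, not a proposed patch: the class and method names (ContainerAddress, firstRoutable, localAddresses) are hypothetical, and which address is correct depends on how the container's networks are set up.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Hadoop.
class ContainerAddress {
    // Return the first IPv4 candidate that is neither loopback nor link-local.
    public static Optional<InetAddress> firstRoutable(List<InetAddress> candidates) {
        for (InetAddress addr : candidates) {
            if (addr instanceof Inet4Address
                    && !addr.isLoopbackAddress()
                    && !addr.isLinkLocalAddress()) {
                return Optional.of(addr);
            }
        }
        return Optional.empty();
    }

    // Collect the addresses of every interface that is currently up.
    public static List<InetAddress> localAddresses() throws SocketException {
        List<InetAddress> out = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return out;
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            if (nic.isUp()) {
                out.addAll(Collections.list(nic.getInetAddresses()));
            }
        }
        return out;
    }

    public static void main(String[] args) throws SocketException {
        // Inside a bridge-network container this would typically print the
        // container's own address rather than the NodeManager's.
        System.out.println(firstRoutable(localAddresses())
                .map(InetAddress::getHostAddress)
                .orElse("no routable IPv4 address found"));
    }
}
```

Separating the selection rule (firstRoutable) from the interface enumeration keeps the rule easy to check against a fixed list of addresses.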

> Transfer error AM host/ip when launching app using docker container with 
> bridge network
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-9619
>                 URL: https://issues.apache.org/jira/browse/YARN-9619
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.3.0
>            Reporter: caozhiqiang
>            Priority: Major
>
> When launching an application using a Docker container with the bridge 
> network in an overlay network, the client polls the application's progress 
> from the ApplicationMaster using the wrong host/IP. The client polls the 
> NodeManager's hostname/IP rather than the IP of the Docker container in which 
> the AM is actually running. The error message is below (the server 
> hadoop3-1/192.168.2.105 is the NM's address, not the AM's Docker IP, so it 
> cannot be reached):
> 2019-05-11 08:28:46,361 INFO ipc.Client: Retrying connect to server: 
> hadoop3-1/192.168.2.105:37963. Already tried 0 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
> 2019-05-11 08:28:47,363 INFO ipc.Client: Retrying connect to server: 
> hadoop3-1/192.168.2.105:37963. Already tried 1 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
> 2019-05-11 08:28:48,365 INFO ipc.Client: Retrying connect to server: 
> hadoop3-1/192.168.2.105:37963. Already tried 2 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
> 2019-05-10 08:34:40,235 INFO mapred.ClientServiceDelegate: Application state 
> is completed. FinalApplicationStatus=FAILED. Redirecting to job history server
> 2019-05-10 08:35:00,408 INFO ipc.Client: Retrying connect to server: 
> 0.0.0.0/0.0.0.0:12020. Already tried 8 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2019-05-10 08:35:00,408 INFO ipc.Client: Retrying connect to server: 
> 0.0.0.0/0.0.0.0:12020. Already tried 9 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> java.io.IOException: java.net.ConnectException: Your endpoint configuration 
> is wrong; For more details see: 
> http://wiki.apache.org/hadoop/UnsetHostnameOrPort
>  at 
> org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:345)
>  at 
> org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:430)
>  at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:871)
>  at org.apache.hadoop.mapreduce.Job$1.run(Job.java:331)
>  at org.apache.hadoop.mapreduce.Job$1.run(Job.java:328)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>  at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:328)
>  at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:612)
>  at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1629)
>  at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1591)
>  at 
> org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:307)
>  at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:360)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>  at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:368)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>  at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>  at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
>  at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
> Caused by: java.net.ConnectException: Your endpoint configuration is wrong; 
> For more details see: http://wiki.apache.org/hadoop/UnsetHostnameOrPort
>  at sun.reflect.GeneratedConstructorAccessor20.newInstance(Unknown Source)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
>  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:751)
>  at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
>  at org.apache.hadoop.ipc.Client.call(Client.java:1457)
>  at org.apache.hadoop.ipc.Client.call(Client.java:1367)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>  at com.sun.proxy.$Proxy14.getJobReport(Unknown Source)
>  at 
> org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getJobReport(MRClientProtocolPBClientImpl.java:133)
>  at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:326)
>  ... 28 more
> Caused by: java.net.ConnectException: Connection refused
>  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>  at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>  at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690)
>  at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794)
>  at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411)
>  at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572)
>  at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>  ... 37 more
>  
> In the code where the AM registers with the RM, RMCommunicator::register(), I 
> tried using "request.setHost(InetAddress.getLocalHost().getHostAddress());" 
> to get the Docker container's IP, but that also did not work. 


