[ 
https://issues.apache.org/jira/browse/YARN-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863509#comment-16863509
 ] 

Eric Yang commented on YARN-9619:
---------------------------------

[~caozhiqiang] Sorry, I am not entirely sure that I understand the description 
of this problem.  It seems to say that a MapReduce workload does not work when 
its containers use a bridge network in an overlay network.  The YARN framework 
requires the application master to run in the same flat network as the 
resource manager and node manager.  This ensures that bi-directional 
communication between the application master and the YARN framework is not 
blocked.  

An overlay network implies some level of isolation from the host network.  
Overlay networks often allow only outbound network access.  When the 
application master runs in an overlay network, the resource manager and node 
manager cannot have bi-directional communication with it.

I don't think it is possible to run the AM in Docker with the current 
implementation of YARN.
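
To confirm whether this is what you are seeing, you could test from the client 
host whether the address the AM advertised actually accepts connections.  The 
following is only a rough sketch, not part of YARN: the class and method names 
are made up for illustration, and the default host and port are simply the ones 
from the log in the description (the AM port changes on every run).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not a YARN class: probes a TCP endpoint from the
// machine where the job client runs.
final class AmReachabilityCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or the host name did not resolve.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are taken from the retry messages in the issue description.
        String host = args.length > 0 ? args[0] : "hadoop3-1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 37963;
        System.out.println(host + ":" + port + " reachable: "
                + isReachable(host, port, 1000));
    }
}
```

If the probe fails for the NM address but succeeds for the container's address 
from inside the overlay network, that would be consistent with the AM being 
reachable only from within the overlay.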

> Transfer error AM host/ip when launching app using docker container with 
> bridge network
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-9619
>                 URL: https://issues.apache.org/jira/browse/YARN-9619
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.3.0
>            Reporter: caozhiqiang
>            Priority: Major
>
> When launching an application in Docker containers with a bridge network in 
> overlay networks, the client polls the ApplicationMaster for application 
> progress using the wrong host/IP. The client polls the NodeManager's 
> hostname/IP instead of the IP of the Docker container in which the AM is 
> actually running. The error message is below (the server 
> hadoop3-1/192.168.2.105 is the NM's, not the AM's Docker IP, so it cannot be 
> reached):
> 2019-05-11 08:28:46,361 INFO ipc.Client: Retrying connect to server: 
> hadoop3-1/192.168.2.105:37963. Already tried 0 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
> 2019-05-11 08:28:47,363 INFO ipc.Client: Retrying connect to server: 
> hadoop3-1/192.168.2.105:37963. Already tried 1 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
> 2019-05-11 08:28:48,365 INFO ipc.Client: Retrying connect to server: 
> hadoop3-1/192.168.2.105:37963. Already tried 2 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
> 2019-05-10 08:34:40,235 INFO mapred.ClientServiceDelegate: Application state 
> is completed. FinalApplicationStatus=FAILED. Redirecting to job history server
> 2019-05-10 08:35:00,408 INFO ipc.Client: Retrying connect to server: 
> 0.0.0.0/0.0.0.0:12020. Already tried 8 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2019-05-10 08:35:00,408 INFO ipc.Client: Retrying connect to server: 
> 0.0.0.0/0.0.0.0:12020. Already tried 9 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> java.io.IOException: java.net.ConnectException: Your endpoint configuration 
> is wrong; For more details see: 
> http://wiki.apache.org/hadoop/UnsetHostnameOrPort
>  at 
> org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:345)
>  at 
> org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:430)
>  at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:871)
>  at org.apache.hadoop.mapreduce.Job$1.run(Job.java:331)
>  at org.apache.hadoop.mapreduce.Job$1.run(Job.java:328)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>  at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:328)
>  at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:612)
>  at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1629)
>  at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1591)
>  at 
> org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:307)
>  at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:360)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>  at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:368)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>  at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>  at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
>  at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
> Caused by: java.net.ConnectException: Your endpoint configuration is wrong; 
> For more details see: http://wiki.apache.org/hadoop/UnsetHostnameOrPort
>  at sun.reflect.GeneratedConstructorAccessor20.newInstance(Unknown Source)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
>  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:751)
>  at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
>  at org.apache.hadoop.ipc.Client.call(Client.java:1457)
>  at org.apache.hadoop.ipc.Client.call(Client.java:1367)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>  at com.sun.proxy.$Proxy14.getJobReport(Unknown Source)
>  at 
> org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getJobReport(MRClientProtocolPBClientImpl.java:133)
>  at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:326)
>  ... 28 more
> Caused by: java.net.ConnectException: Connection refused
>  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>  at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>  at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690)
>  at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794)
>  at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411)
>  at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572)
>  at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>  ... 37 more
>  
> In the code where the AM registers with the RM, RMCommunicator::register(), 
> I tried using "request.setHost(InetAddress.getLocalHost().getHostAddress());" 
> to get the Docker container's IP, but that does not work either. 



