[
https://issues.apache.org/jira/browse/YARN-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863509#comment-16863509
]
Eric Yang commented on YARN-9619:
---------------------------------
[~caozhiqiang] Sorry, I am not entirely sure that I understand the description
of this problem. It seems to indicate that a MapReduce workload doesn't work
with a Docker bridge network in an overlay network. The YARN framework
requires the application master to run in the same flat network as the
resource manager and node manager. This ensures that bi-directional
communication between the application master and the YARN framework is not
blocked.
An overlay network implies some level of privacy from the host network.
Overlay networks often allow only outbound network access. When the
application master runs in an overlay network, the resource manager and node
manager cannot communicate bi-directionally with it.
I don't think it is possible to run the AM in Docker with the current
implementation of YARN.
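
To check the connectivity requirement described above from a given machine, a minimal sketch of a TCP probe follows. The class name AmEndpointProbe and the method canConnect are hypothetical and are not part of Hadoop. The host and port are supplied by the caller, for example the address the client was retrying in the quoted log. It tests one direction only, from the machine it runs on to the target, so it would need to be run from each side to say anything about bi-directional access.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper, not part of Hadoop: probes a TCP endpoint. */
public class AmEndpointProbe {

    /**
     * Returns true if a TCP connection to host:port can be opened within
     * timeoutMs milliseconds, false on refusal, timeout, or unknown host.
     */
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // ConnectException (refused), SocketTimeoutException and
            // UnknownHostException are all IOExceptions.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: AmEndpointProbe <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(host + ":" + port + " reachable: "
                + canConnect(host, port, 1000));
    }
}
```

For example, running it with the arguments hadoop3-1 37963 from the client machine would test the endpoint that appears in the retry messages of the quoted report.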
> Transfer error AM host/ip when launching app using docker container with
> bridge network
> ---------------------------------------------------------------------------------------
>
> Key: YARN-9619
> URL: https://issues.apache.org/jira/browse/YARN-9619
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.3.0
> Reporter: caozhiqiang
> Priority: Major
>
> When launching an application in a Docker container with a bridge network
> in overlay networks, the client polls the ApplicationMaster for application
> progress using the wrong host/IP: it polls the NodeManager's hostname/IP
> rather than the IP of the Docker container in which the AM is actually
> running. The error messages are below (the server hadoop3-1/192.168.2.105 is
> the NM's address, not the AM container's IP, so it cannot be reached):
> 2019-05-11 08:28:46,361 INFO ipc.Client: Retrying connect to server: hadoop3-1/192.168.2.105:37963. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
> 2019-05-11 08:28:47,363 INFO ipc.Client: Retrying connect to server: hadoop3-1/192.168.2.105:37963. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
> 2019-05-11 08:28:48,365 INFO ipc.Client: Retrying connect to server: hadoop3-1/192.168.2.105:37963. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
> 2019-05-10 08:34:40,235 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=FAILED. Redirecting to job history server
> 2019-05-10 08:35:00,408 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:12020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2019-05-10 08:35:00,408 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:12020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> java.io.IOException: java.net.ConnectException: Your endpoint configuration is wrong; For more details see: http://wiki.apache.org/hadoop/UnsetHostnameOrPort
> at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:345)
> at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:430)
> at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:871)
> at org.apache.hadoop.mapreduce.Job$1.run(Job.java:331)
> at org.apache.hadoop.mapreduce.Job$1.run(Job.java:328)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:328)
> at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:612)
> at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1629)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1591)
> at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:307)
> at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:360)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:368)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
> at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
> at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
> Caused by: java.net.ConnectException: Your endpoint configuration is wrong; For more details see: http://wiki.apache.org/hadoop/UnsetHostnameOrPort
> at sun.reflect.GeneratedConstructorAccessor20.newInstance(Unknown Source)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:751)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
> at org.apache.hadoop.ipc.Client.call(Client.java:1457)
> at org.apache.hadoop.ipc.Client.call(Client.java:1367)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy14.getJobReport(Unknown Source)
> at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getJobReport(MRClientProtocolPBClientImpl.java:133)
> at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:326)
> ... 28 more
> Caused by: java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
> at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690)
> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794)
> at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572)
> at org.apache.hadoop.ipc.Client.call(Client.java:1403)
> ... 37 more
>
> In the code where the AM registers with the RM, RMCommunicator::register(),
> I tried using "request.setHost(InetAddress.getLocalHost().getHostAddress());"
> to report the Docker container's IP, but that doesn't work either.
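
To see which address the InetAddress.getLocalHost() call mentioned above actually yields inside a container, a small diagnostic along the following lines may help. This is only a sketch: the class name LocalAddressDump and the method nonLoopbackAddresses are hypothetical and not part of Hadoop, and it uses only standard java.net calls. It prints the getLocalHost() result next to every address bound to an interface that is up and is not loopback, so the two can be compared with the address the client is trying to reach.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

/** Hypothetical diagnostic, not part of Hadoop. */
public class LocalAddressDump {

    /** Lists "interface -> address" for every interface that is up and not loopback. */
    public static List<String> nonLoopbackAddresses() throws SocketException {
        List<String> result = new ArrayList<>();
        Enumeration<NetworkInterface> nifs = NetworkInterface.getNetworkInterfaces();
        if (nifs == null) {
            return result; // no interfaces reported on this machine
        }
        for (NetworkInterface nif : Collections.list(nifs)) {
            if (!nif.isUp() || nif.isLoopback()) {
                continue;
            }
            for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                result.add(nif.getName() + " -> " + addr.getHostAddress());
            }
        }
        return result;
    }

    public static void main(String[] args) throws SocketException {
        try {
            // The same call as in the setHost(...) attempt quoted above.
            System.out.println("getLocalHost(): "
                    + InetAddress.getLocalHost().getHostAddress());
        } catch (UnknownHostException e) {
            System.out.println("getLocalHost() failed: " + e.getMessage());
        }
        for (String line : nonLoopbackAddresses()) {
            System.out.println(line);
        }
    }
}
```

Running it inside the AM container and again on the NodeManager host shows what each JVM considers its own address.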