Hi all,
We're trying to submit a Python file (pi.py in this case) to YARN from Java
code, but the submission keeps failing (Spark 1.6.0).
It seems the AM uses the argument we passed to pi.py as the driver's
address.
Could someone help me figure out how to get this working? Thanks in
advance.
The Java code looks like this:
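import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.deploy.yarn.Client;
import org.apache.spark.deploy.yarn.ClientArguments;

public class SubmitPyYARNJobFromJava {
    // Assumption: SPARK_HOME is taken from the environment.
    private static final String SPARK_HOME = System.getenv("SPARK_HOME");

    public static void main(String[] args) {
        // Options for the YARN client; named yarnArgs so the array does
        // not clash with main's own args parameter.
        String[] yarnArgs = new String[]{
            "--name", "Test Submit Python To Yarn From Java",
            "--primary-py-file", SPARK_HOME + "/examples/src/main/python/pi.py",
            "--num-executors", "5",
            "--driver-memory", "512m",
            "--executor-memory", "512m",
            "--executor-cores", "1",
            // args[0] is the value forwarded to pi.py (10 in the run below).
            "--arg", args[0]
        };
        Configuration config = new Configuration();
        SparkConf sparkConf = new SparkConf();
        ClientArguments clientArgs = new ClientArguments(yarnArgs, sparkConf);
        Client client = new Client(clientArgs, config, sparkConf);
        client.run();
    }
}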
The jar is submitted with spark-submit:
./bin/spark-submit --class SubmitPyYARNJobFromJava --master yarn-client TestSubmitPythonFromJava.jar 10
The job submitted to YARN just stays in the ACCEPTED state until it fails.
What I can't figure out is why the YARN log shows that the AM couldn't reach
the driver at 10:0, where 10 is the argument I passed to pi.py:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/1/yarn/local/usercache/root/filecache/2084/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/03/15 17:54:44 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
16/03/15 17:54:45 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1458023046377_0499_000001
16/03/15 17:54:45 INFO spark.SecurityManager: Changing view acls to: yarn,root
16/03/15 17:54:45 INFO spark.SecurityManager: Changing modify acls to: yarn,root
16/03/15 17:54:45 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, root); users with modify permissions: Set(yarn, root)
16/03/15 17:54:45 INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable.
16/03/15 17:54:45 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
16/03/15 17:54:46 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
16/03/15 17:54:46 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
.........
16/03/15 17:56:25 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
16/03/15 17:56:26 ERROR yarn.ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Failed to connect to driver!
    at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:484)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:345)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:187)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:653)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:651)
    at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:674)
    at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
16/03/15 17:56:26 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
16/03/15 17:56:26 INFO util.ShutdownHookManager: Shutdown hook called
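To illustrate how an address like 10:0 could come out of the bare argument 10, here is a minimal sketch. It assumes (we have not confirmed this against the Spark source) that the AM treats its first user argument as a host:port string and falls back to port 0 when there is no colon. HostPortSketch and parse are hypothetical names made up for this example; this is not Spark's code.

```java
// Hypothetical helper, not Spark's code: splits "host:port" and falls
// back to port 0 when no colon is present (an assumption for illustration).
final class HostPortSketch {

    /** Simple immutable holder for a parsed address. */
    static final class HostPort {
        final String host;
        final int port;

        HostPort(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return host + ":" + port;
        }
    }

    /** Parses "host:port"; a string without a colon becomes host with port 0. */
    static HostPort parse(String hostPort) {
        int idx = hostPort.lastIndexOf(':');
        if (idx == -1) {
            // No port given: the whole string is taken as the host name.
            return new HostPort(hostPort, 0);
        }
        String host = hostPort.substring(0, idx);
        int port = Integer.parseInt(hostPort.substring(idx + 1));
        return new HostPort(host, port);
    }

    public static void main(String[] args) {
        // The argument meant for pi.py, read as if it were a driver address.
        System.out.println(parse("10"));               // prints 10:0
        // A real-looking driver address for comparison.
        System.out.println(parse("192.168.0.5:7077")); // prints 192.168.0.5:7077
    }
}
```

If that assumption holds, the AM we launched is expecting a driver address as its first argument, while our code passes the pi.py argument in that position.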
Best regards,
S.Y. Chung 鍾學毅
F14MITD
Taiwan Semiconductor Manufacturing Company, Ltd.
Tel: 06-5056688 Ext: 734-6325