I am pretty new to Spark, and I am trying to run the Spark shell on a YARN
cluster from the CLI (in yarn-client mode). I am able to start the shell with
the following command:

SPARK_JAR=../spark-0.9.0-incubating/jars/spark-assembly-0.9.0-incubating-hadoop2.2.0.jar \
SPARK_YARN_APP_JAR=emptyfile \
SPARK_WORKER_INSTANCES=10 \
SPARK_YARN_QUEUE=hdmi-others \
MASTER=yarn-client \
ADD_JARS="lib/avro-mapred-1.7.6-hadoop2.jar,schemas-java.jar" \
SPARK_CLASSPATH="lib/avro-mapred-1.7.6-hadoop2.jar:schemas-java.jar" \
SPARK_WORKER_MEMORY=512M \
SPARK_MASTER_MEMORY=512M \
../spark-0.9.0-incubating/bin/spark-shell

However, as soon as I run an action that needs workers on the cluster's
machines, I get this WARN message in the Spark shell:

14/04/08 17:55:59 WARN YarnClientClusterScheduler: Initial job has not accepted 
any resources; check your cluster UI to ensure that workers are registered and 
have sufficient memory

When I look at the log of my application master in the web UI, I see this:

14/04/08 17:55:09 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/04/08 17:55:09 INFO Remoting: Starting remoting
14/04/08 17:55:10 INFO Remoting: Remoting started; listening on addresses 
:[akka.tcp://sparkYarnAM@machine:35023]
14/04/08 17:55:10 INFO Remoting: Remoting now listens on addresses: 
[akka.tcp://sparkYarnAM@machine:35023]
14/04/08 17:55:11 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
14/04/08 17:55:11 INFO client.RMProxy: Connecting to ResourceManager at 
cluster:8030
14/04/08 17:55:11 INFO yarn.WorkerLauncher: ApplicationAttemptId: 
appattempt_1394582929977_173315_000001
14/04/08 17:55:11 INFO yarn.WorkerLauncher: Registering the ApplicationMaster
14/04/08 17:55:11 INFO yarn.WorkerLauncher: Waiting for Spark driver to be 
reachable.
14/04/08 17:56:14 ERROR yarn.WorkerLauncher: Failed to connect to driver at 
machine:59281, retrying ...
14/04/08 17:57:17 ERROR yarn.WorkerLauncher: Failed to connect to driver at 
machine:59281, retrying ...
14/04/08 17:58:20 ERROR yarn.WorkerLauncher: Failed to connect to driver at 
machine:59281, retrying ...
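What fails here appears to be the application master opening a TCP connection to the driver address in the log (machine:59281). A minimal sketch for testing that connection by hand from a cluster node, assuming bash (for its /dev/tcp redirection) and GNU coreutils `timeout` are available there; `check_port` is a hypothetical helper name, not part of Spark or YARN:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark/YARN): report whether host:port
# accepts a TCP connection. Relies on bash's /dev/tcp redirection;
# `timeout` keeps it from hanging on a port a firewall silently drops.
check_port() {
  local host=$1 port=$2
  if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}
```

Run it on one of the NodeManager hosts while the shell is still up, substituting the real driver hostname (redacted as "machine" above) and the port from the current log, e.g. `check_port <driver-host> 59281`; the port may differ between shell sessions. An "unreachable" result there would suggest a name-resolution or firewall problem between the cluster and the machine running the shell.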

It looks like the application master (yarn.WorkerLauncher) cannot connect to
the driver running in my shell, but I have no idea why this is happening.
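One thing that may be worth ruling out: if the redacted "machine" in the log is a hostname, it has to resolve, from the cluster nodes, to a routable address of the machine running the shell; if it resolves to a loopback address (or not at all), the application master cannot reach the driver. A small sketch for checking what a name resolves to, assuming glibc's `getent`; `resolves_to_loopback` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the first address that NAME
# resolves to is a loopback address; fail otherwise (including no match).
resolves_to_loopback() {
  local name=$1 addr
  addr=$(getent hosts "$name" | awk '{ print $1; exit }')
  case $addr in
    127.*|::1) return 0 ;;   # IPv4 127.0.0.0/8 or IPv6 ::1
    *) return 1 ;;           # some other address, or the name did not resolve
  esac
}
```

For example, `resolves_to_loopback "$(hostname -f)" && echo "loopback"` on the machine running the shell, together with `getent hosts <driver-host>` on a cluster node, shows whether both sides agree on the driver's address.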
I didn't find an answer to this issue in the forum. Any idea what the problem
might be?

Thanks,
Marco
