I am pretty new to Spark, and I am trying to run the Spark shell on a YARN cluster from the CLI (in yarn-client mode). I am able to start the shell with the following command:
```
SPARK_JAR=../spark-0.9.0-incubating/jars/spark-assembly-0.9.0-incubating-hadoop2.2.0.jar \
SPARK_YARN_APP_JAR=emptyfile \
SPARK_WORKER_INSTANCES=10 \
SPARK_YARN_QUEUE=hdmi-others \
MASTER=yarn-client \
ADD_JARS="lib/avro-mapred-1.7.6-hadoop2.jar,schemas-java.jar" \
SPARK_CLASSPATH="lib/avro-mapred-1.7.6-hadoop2.jar:schemas-java.jar" \
SPARK_WORKER_MEMORY=512M \
SPARK_MASTER_MEMORY=512M \
../spark-0.9.0-incubating/bin/spark-shell
```

However, as soon as I try to execute an action that requires workers to run on the cluster's machines, I get this WARN message in the Spark shell:

```
14/04/08 17:55:59 WARN YarnClientClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
```

When I look at the log of my application master in the web UI, I see this (note the repeated ERROR lines at the end):

```
14/04/08 17:55:09 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/04/08 17:55:09 INFO Remoting: Starting remoting
14/04/08 17:55:10 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkYarnAM@<machine>:35023]
14/04/08 17:55:10 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkYarnAM@<machine>:35023]
14/04/08 17:55:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/04/08 17:55:11 INFO client.RMProxy: Connecting to ResourceManager at <cluster>:8030
14/04/08 17:55:11 INFO yarn.WorkerLauncher: ApplicationAttemptId: appattempt_1394582929977_173315_000001
14/04/08 17:55:11 INFO yarn.WorkerLauncher: Registering the ApplicationMaster
14/04/08 17:55:11 INFO yarn.WorkerLauncher: Waiting for Spark driver to be reachable.
14/04/08 17:56:14 ERROR yarn.WorkerLauncher: Failed to connect to driver at <machine>:59281, retrying ...
14/04/08 17:57:17 ERROR yarn.WorkerLauncher: Failed to connect to driver at <machine>:59281, retrying ...
14/04/08 17:58:20 ERROR yarn.WorkerLauncher: Failed to connect to driver at <machine>:59281, retrying ...
```

It looks like the application master cannot connect to the driver, but I have no idea why this is happening. I couldn't find an answer to this issue in the forum. Any idea what the problem might be?

Thanks,
Marco