Hello everybody,

I am trying to figure out how to submit a Spark application to a standalone Spark cluster from a separate physical machine. I have an application written in Python that works when I am on the single-node Spark server itself: from that Spark installation I run bin/spark-submit successfully with either (1) MASTER=local[*] or (2) MASTER=spark://localhost:7077.
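Concretely, the two invocations that work on the server itself look like this (a sketch only; "my_app.py" is a placeholder for my actual script):

```shell
# Both of these succeed when run from the Spark installation on the
# server itself. "my_app.py" stands in for the real Python script.
MASTER='local[*]' bin/spark-submit my_app.py
MASTER=spark://localhost:7077 bin/spark-submit my_app.py
```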
However, I want to submit the job from a separate machine. Am I doing something wrong here? I suspect the problem is that I am working with two different Spark "installations": on the big server I have one installation, where I run sbin/start-all.sh to launch the standalone cluster (and that works), while on a separate laptop I have a different installation of spark-1.0.0, and I use the laptop's bin/spark-submit script to submit to the remote Spark server (using MASTER=spark://<remote-spark-master>:7077). This submit-to-remote-cluster setup does not work, even for the Scala examples like SparkPi.

Concrete example: I want to submit the SparkPi example to the cluster from my laptop. The server is 10.20.10.152, running both master and slave, and I can view the Master web UI at http://10.20.10.152:8080. Great.

From the laptop (10.20.10.154), I try the following, using a locally built version of spark-1.0.0 (so that I have the spark-submit script):

bin/spark-submit --verbose --class org.apache.spark.examples.SparkPi --master spark://10.20.10.152:7077 examples/target/scala-2.10/spark-examples-1.0.0-hadoop1.0.4.jar

This fails with the errors at the bottom of this email. Am I doing something wrong? How can I submit to a remote cluster? I get the same problem with bin/run-example.
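In case it helps anyone scripting this from another machine, the shape of the invocation can be assembled programmatically. This is only an illustration in plain Python (no Spark dependency; the host, port, and jar path are just the values from my setup):

```python
import shlex

def build_submit_cmd(spark_home, master_host, jar, main_class, port=7077):
    """Assemble a spark-submit argument list for a standalone master.

    Illustration only: it does not talk to the cluster, it just shows
    the argument order spark-submit expects for a remote master URL.
    """
    return [
        f"{spark_home}/bin/spark-submit",
        "--verbose",
        "--class", main_class,
        "--master", f"spark://{master_host}:{port}",
        jar,
    ]

cmd = build_submit_cmd(
    spark_home="/Users/aris.vlasakakis/Documents/spark-1.0.0",
    master_host="10.20.10.152",
    jar="examples/target/scala-2.10/spark-examples-1.0.0-hadoop1.0.4.jar",
    main_class="org.apache.spark.examples.SparkPi",
)
print(shlex.join(cmd))
```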
bin/spark-submit --verbose --class org.apache.spark.examples.SparkPi --master spark://10.20.10.152:7077 examples/target/scala-2.10/spark-examples-1.0.0-hadoop1.0.4.jar

Using properties file: null
Using properties file: null
Parsed arguments:
  master                  spark://10.20.10.152:7077
  deployMode              null
  executorMemory          null
  executorCores           null
  totalExecutorCores      null
  propertiesFile          null
  driverMemory            null
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               org.apache.spark.examples.SparkPi
  primaryResource         file:/Users/aris.vlasakakis/Documents/spark-1.0.0/examples/target/scala-2.10/spark-examples-1.0.0-hadoop1.0.4.jar
  name                    org.apache.spark.examples.SparkPi
  childArgs               []
  jars                    null
  verbose                 true

Default properties from null:

Using properties file: null
Main class:
org.apache.spark.examples.SparkPi
Arguments:

System properties:
SPARK_SUBMIT -> true
spark.app.name -> org.apache.spark.examples.SparkPi
spark.jars -> file:/Users/aris.vlasakakis/Documents/spark-1.0.0/examples/target/scala-2.10/spark-examples-1.0.0-hadoop1.0.4.jar
spark.master -> spark://10.20.10.152:7077
Classpath elements:
file:/Users/aris.vlasakakis/Documents/spark-1.0.0/examples/target/scala-2.10/spark-examples-1.0.0-hadoop1.0.4.jar

14/07/09 16:16:08 INFO SecurityManager: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/07/09 16:16:08 INFO SecurityManager: Changing view acls to: aris.vlasakakis
14/07/09 16:16:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(aris.vlasakakis)
14/07/09 16:16:08 INFO Slf4jLogger: Slf4jLogger started
14/07/09 16:16:08 INFO Remoting: Starting remoting
14/07/09 16:16:08 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@10.20.10.154:50478]
14/07/09 16:16:08 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://spark@10.20.10.154:50478]
14/07/09 16:16:08 INFO SparkEnv: Registering MapOutputTracker
14/07/09 16:16:08 INFO SparkEnv: Registering BlockManagerMaster
14/07/09 16:16:08 INFO DiskBlockManager: Created local directory at /var/folders/ch/yfyhs7px5h90505g4n21n8d5k3svt3/T/spark-local-20140709161608-0531
14/07/09 16:16:08 INFO MemoryStore: MemoryStore started with capacity 5.8 GB.
14/07/09 16:16:08 INFO ConnectionManager: Bound socket to port 50479 with id = ConnectionManagerId(10.20.10.154,50479)
14/07/09 16:16:08 INFO BlockManagerMaster: Trying to register BlockManager
14/07/09 16:16:08 INFO BlockManagerInfo: Registering block manager 10.20.10.154:50479 with 5.8 GB RAM
14/07/09 16:16:08 INFO BlockManagerMaster: Registered BlockManager
14/07/09 16:16:08 INFO HttpServer: Starting HTTP Server
14/07/09 16:16:09 INFO HttpBroadcast: Broadcast server started at http://10.20.10.154:50480
14/07/09 16:16:09 INFO HttpFileServer: HTTP File server directory is /var/folders/ch/yfyhs7px5h90505g4n21n8d5k3svt3/T/spark-edd787f4-f606-473c-965b-9f3b131cfb43
14/07/09 16:16:09 INFO HttpServer: Starting HTTP Server
14/07/09 16:16:09 INFO SparkUI: Started SparkUI at http://10.20.10.154:4040
2014-07-09 16:16:09.439 java[37388:d17] Unable to load realm mapping info from SCDynamicStore
14/07/09 16:16:09 INFO SparkContext: Added JAR file:/Users/aris.vlasakakis/Documents/spark-1.0.0/examples/target/scala-2.10/spark-examples-1.0.0-hadoop1.0.4.jar at http://10.20.10.154:50481/jars/spark-examples-1.0.0-hadoop1.0.4.jar with timestamp 1404947769853
14/07/09 16:16:09 INFO AppClient$ClientActor: Connecting to master spark://10.20.10.152:7077...
14/07/09 16:16:09 INFO SparkContext: Starting job: reduce at SparkPi.scala:35
14/07/09 16:16:09 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:35) with 2 output partitions (allowLocal=false)
14/07/09 16:16:09 INFO DAGScheduler: Final stage: Stage 0(reduce at SparkPi.scala:35)
14/07/09 16:16:10 INFO DAGScheduler: Parents of final stage: List()
14/07/09 16:16:10 INFO DAGScheduler: Missing parents: List()
14/07/09 16:16:10 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map at SparkPi.scala:31), which has no missing parents
14/07/09 16:16:10 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[1] at map at SparkPi.scala:31)
14/07/09 16:16:10 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/07/09 16:16:10 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20140709161541-0001
14/07/09 16:16:10 INFO AppClient$ClientActor: Executor added: app-20140709161541-0001/0 on worker-20140709160420-10.20.10.152-57674 (10.20.10.152:57674) with 4 cores
14/07/09 16:16:10 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140709161541-0001/0 on hostPort 10.20.10.152:57674 with 4 cores, 4.0 GB RAM
14/07/09 16:16:10 INFO AppClient$ClientActor: Executor updated: app-20140709161541-0001/0 is now RUNNING
14/07/09 16:16:10 INFO AppClient$ClientActor: Executor updated: app-20140709161541-0001/0 is now FAILED (class java.io.IOException: Cannot run program "/Users/aris.vlasakakis/Documents/spark-1.0.0/bin/compute-classpath.sh" (in directory "."): error=2, No such file or directory)
14/07/09 16:16:10 INFO SparkDeploySchedulerBackend: Executor app-20140709161541-0001/0 removed: class java.io.IOException: Cannot run program "/Users/aris.vlasakakis/Documents/spark-1.0.0/bin/compute-classpath.sh" (in directory "."): error=2, No such file or directory

[... the identical Executor added / RUNNING / FAILED / removed sequence, with the same IOException, repeats for executors 1 through 9 ...]

14/07/09 16:16:10 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: Master removed our application: FAILED
14/07/09 16:16:10 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/07/09 16:16:10 INFO DAGScheduler: Failed to run reduce at SparkPi.scala:35
14/07/09 16:16:10 INFO TaskSchedulerImpl: Cancelling stage 0
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Master removed our application: FAILED
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

--
Άρης Βλασακάκης
Aris Vlasakakis