And for more clarification on this: for non-YARN installs, this bug has been filed to make the Spark driver upload jars: <https://issues.apache.org/jira/browse/SPARK-12559>
The point of confusion, which I along with other newcomers commonly suffer from, is this. In non-YARN installs: *the **driver** does NOT push your jars to the cluster. The **master** in the cluster DOES push your jars to the **workers**. In theory.*

Thanks to an email response on the mailing list from Greg Hill for this clarification; I hope he doesn't mind me copying the relevant part here, since I can't link to it:

> spark-submit does not pass the JAR along to the Driver, but the Driver will pass it to the executors. I ended up putting the JAR in HDFS and passing an hdfs:// path to spark-submit. This is a subtle difference from Spark on YARN, which does pass the JAR along to the Driver automatically, and IMO should probably be fixed in spark-submit. It's really confusing for newcomers.

That's funny, I didn't delete that answer! I think I have two accounts crossing; here was the answer:

I don't know if this is going to help, but I agree that some of the docs would lead one to believe that the Spark driver or master is going to spread your jars around for you. But there are other docs that seem to contradict this, especially related to EC2 clusters. I wrote a Stack Overflow answer dealing with a similar situation; see if it helps: <http://stackoverflow.com/questions/23687081/spark-workers-unable-to-find-jar-on-ec2-cluster/34502774#34502774>

Pay attention to this section about the spark-submit docs:

I must admit, as a limitation on this, it confuses me that the Spark docs say, for spark.executor.extraClassPath:

> Users typically should not need to set this option

I assume they mean most people will get the classpath out through a driver config option. I know most of the docs for spark-submit make it sound like the script handles moving your code around the cluster, but I think it only moves the classpath around for you. For example, this line from Launching Applications with spark-submit <http://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-submit> explicitly says you have to move the jars yourself or make them "globally visible":

> application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes.
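To make Greg's workaround concrete, here is a minimal sketch of staging the jar in HDFS and handing spark-submit an hdfs:// URL. The paths, class name, and host names (my-app.jar, com.example.MyApp, master-host, namenode) are made up for illustration:

```
# Copy the application jar into HDFS, where every executor can reach it.
hdfs dfs -put target/my-app.jar /user/me/jars/my-app.jar

# Hand spark-submit the hdfs:// URL instead of a local path, so the
# executors fetch the jar themselves rather than relying on spark-submit
# to ship it around the cluster.
spark-submit \
  --class com.example.MyApp \
  --master spark://master-host:7077 \
  hdfs://namenode:8020/user/me/jars/my-app.jar
```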
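And for the file:// case from the docs quote above, plus spark.executor.extraClassPath, another hedged sketch with hypothetical paths. Both variants assume you have already copied the jars to the same location on every node yourself:

```
# file:// only works if the jar already sits at this exact path on ALL
# nodes; nothing copies it there for you.
spark-submit \
  --class com.example.MyApp \
  --master spark://master-host:7077 \
  file:///opt/jars/my-app.jar

# Likewise, spark.executor.extraClassPath just prepends an existing
# on-node path to the executor classpath; it does not ship any files.
spark-submit \
  --class com.example.MyApp \
  --master spark://master-host:7077 \
  --conf spark.executor.extraClassPath=/opt/jars/my-dep.jar \
  target/my-app.jar
```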