And for more clarification on this:

For non-YARN installs, this bug has been filed to make the Spark driver
upload jars: <https://issues.apache.org/jira/browse/SPARK-12559>

The point of confusion that I, along with other newcomers, commonly suffer
from is this. In non-YARN installs:

*The **spark-submit** script does NOT push your jar to the **driver**. The
**driver** DOES push your jar to the **executors**. In theory.*

Thanks to an email response on the list from Greg Hill for this
clarification; I hope he doesn't mind me copying the relevant part here,
since I can't link to it:

" spark-submit does not pass the JAR along to the Driver, but the
Driver will pass it to the executors.  I ended up putting the JAR in HDFS
and passing an hdfs:// path to spark-submit.  This is a subtle difference
from Spark on YARN which does pass the JAR along to the Driver
automatically, and IMO should probably be fixed in spark-submit.  It's
really confusing for newcomers."
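To make Greg's workaround concrete, it would look something like this (a
minimal sketch; the class name, paths, and master URL are made up):

    # Put the application jar somewhere every node can read it
    hdfs dfs -put my-app-assembly.jar /user/me/jars/

    # Then hand spark-submit the hdfs:// path instead of a local one
    spark-submit \
      --class com.example.MyApp \
      --master spark://master-host:7077 \
      hdfs:///user/me/jars/my-app-assembly.jar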
That's funny, I didn't delete that answer!

I think I have two accounts crossing; here was the answer:

I don't know if this is going to help, but I agree that some of the docs
would lead one to believe that the Spark driver or master is going to
spread your jars around for you. But there are other docs that seem to
contradict this, especially related to EC2 clusters.

I wrote a Stack Overflow answer dealing with a similar situation; see if it
helps:

http://stackoverflow.com/questions/23687081/spark-workers-unable-to-find-jar-on-ec2-cluster/34502774#34502774

Pay attention to this section about the spark-submit docs:

I must admit, as a limitation on this, it confuses me that the Spark docs
for spark.executor.extraClassPath say:

    Users typically should not need to set this option
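As far as I understand it, that option only prepends entries to the executor
classpath; it doesn't copy any files. So if you did set it by hand
(hypothetical paths, just to illustrate), the jar would already have to
exist at that path on every worker:

    # /opt/libs/extra.jar must ALREADY be at this path on every worker;
    # this setting only edits the classpath, it moves nothing.
    spark-submit \
      --conf spark.executor.extraClassPath=/opt/libs/extra.jar \
      --class com.example.MyApp \
      --master spark://master-host:7077 \
      my-app-assembly.jar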

I assume they mean most people will get the classpath out through a driver
config option. I know most of the docs for spark-submit make it sound like
the script handles moving your code around the cluster, but I think it only
moves the classpath around for you. For example, this line from Launching
Applications with spark-submit
<http://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-submit>
explicitly says you have to move the jars yourself or make them "globally
available":

    application-jar: Path to a bundled jar including your application and
all dependencies. The URL must be globally visible inside of your cluster,
for instance, an hdfs:// path or a file:// path that is present on all
nodes.
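In other words, the file:// form means you distribute the jar yourself
before submitting, e.g. (hypothetical host names and paths):

    # Copy the jar to the same path on each worker first
    for host in worker1 worker2 worker3; do
      scp my-app-assembly.jar "$host":/opt/jobs/
    done

    # Now the file:// URL resolves locally on every node
    spark-submit \
      --class com.example.MyApp \
      --master spark://master-host:7077 \
      file:///opt/jobs/my-app-assembly.jar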




