Thanks for your answer, you are correct, it's just a different approach
than the one I am asking for :)
Building an uber- or assembly-jar goes against the idea of placing the
jars on all workers: uber-jars increase network traffic, while using
local:/ in the classpath reduces it. Depending on uber-jars can also
eventually run into various problems.
Really the question is narrowly geared toward understanding what
arguments can set up the classpath via --jars. Using an uber-jar is a
workaround, true, but one with downsides.
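To illustrate the local:/ point (a sketch with made-up paths and class
names, not taken from the thread): a jar referenced with the local:/
scheme is expected to already exist at that path on every node, so only
the URI strings travel with the job, not the jar bytes.

```shell
# Hypothetical paths/classes for illustration only: the jars under
# /opt/libs must already be present on every worker for local:/ to work.
CMD="spark-submit --class com.example.Main \
  --jars local:/opt/libs/dep1.jar,local:/opt/libs/dep2.jar \
  /opt/app/app.jar"
echo "$CMD"
```

With file:// or hdfs:// URIs instead, Spark would fetch the jars over
the network, which is exactly the traffic local:/ avoids.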
Thanks!
On 01/12/2016 12:06 AM, UMESH CHAUDHARY wrote:
Could you build a fat jar by including all your dependencies along
with your application? See here
<http://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management> and
here
<http://spark.apache.org/docs/latest/submitting-applications.html#bundling-your-applications-dependencies> .
Also:
"So this application-jar can point to a directory and will be
expanded? Or needs to be a path to a single specific jar?"

This will be a path to a single specific JAR.
On Tue, Jan 12, 2016 at 12:04 PM, jiml <j...@megalearningllc.com> wrote:
The question is: what are all the ways to specify a set of jars using
--jars on spark-submit?
I know this is old but I am about to submit a proposed docs change on
--jars, and I had an issue with --jars today
When this user submitted the following command line, is that a proper
way to reference a jar?

hdfs://master:8000/srcdata/kmeans

(Is that a directory, or a jar that doesn't end with .jar? I have not
gotten into the machine learning libs yet to recognize this.)
I know the docs say, "Path to a bundled jar including your
application and
all dependencies. The URL must be globally visible inside of your
cluster,
for instance, an hdfs:// path or a file:// path that is present on all
nodes."
*So this application-jar can point to a directory and will be
expanded? Or
needs to be a path to a single specific jar?*
I ask because when I was testing --jars today, we had to explicitly
provide a path to each jar:

/usr/local/spark/bin/spark-submit --class jpsgcs.thold.PipeLinkageData
--jars=local:/usr/local/spark/jars/groovy-all-2.3.3.jar,local:/usr/local/spark/jars/guava-14.0.1.jar,local:/usr/local/spark/jars/jopt-simple-4.6.jar,local:/usr/local/spark/jars/jpsgcs-core-1.0.8-2.jar,local:/usr/local/spark/jars/jpsgcs-pipe-1.0.6-7.jar
/usr/local/spark/jars/thold-0.0.1-1.jar
(The only way I figured out to use the commas was a StackOverflow
answer that led me to look beyond the docs to the command line:

spark-submit --help

results in:

  --jars JARS        Comma-separated list of local jars to include on the
                     driver and executor classpaths.
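Since --jars wants a comma-separated list rather than a directory, one
way to build that list in the shell (a sketch using stand-in temp paths,
not anything from the Spark docs) is:

```shell
# Sketch: assemble a comma-separated --jars value from every *.jar in a
# directory, since --jars does not expand a bare directory itself.
JAR_DIR=$(mktemp -d)                     # stand-in for /usr/local/spark/jars
touch "$JAR_DIR/a.jar" "$JAR_DIR/b.jar"  # stand-in jar files

JARS=$(printf '%s,' "$JAR_DIR"/*.jar)    # join the paths with commas...
JARS=${JARS%,}                           # ...and drop the trailing comma

echo "$JARS"
# The result can then be passed as:  spark-submit --jars "$JARS" ...
```

This avoids typing each jar path by hand when everything in one
directory should go on the classpath.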
And it seems that we do not need to put the main jar in the --jars
argument. I have not tested yet whether other classes in the
application-jar (/usr/local/spark/jars/thold-0.0.1-1.jar) are shipped
to workers, or if I need to put the application-jar in the --jars path
to get classes not named after --class to be seen.
Thanks for any ideas
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-submit-multiple-jar-files-when-using-spark-submit-script-in-shell-tp16662p25942.html
Sent from the Apache Spark User List mailing list archive at
Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org