SPARK 1.1.0 on yarn-cluster and external JARs

2014-09-25 Thread rzykov
We build some SPARK jobs that depend on external jars. Currently I compile each job into a single assembly that includes them, but I am looking for an approach that keeps all the external jars in HDFS instead. We have already put the Spark assembly jar in an HDFS folder and set the SPARK_JAR variable to point at it. What is the best way to do the same for the other external jars?
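For context, the setup described above might look roughly like this for Spark 1.1.0 on YARN; the paths, class name, and jar names here are placeholders, not taken from the original post:

    export SPARK_JAR=hdfs:///user/spark/share/lib/spark-assembly-1.1.0-hadoop2.4.0.jar

    # current approach: one fat assembly that bundles every dependency
    spark-submit --master yarn-cluster --class com.example.MyJob my-job-assembly.jar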

Re: SPARK 1.1.0 on yarn-cluster and external JARs

2014-09-25 Thread Egor Pahomov
SparkContext.addJar()? Why didn't you like the fat jar approach?
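For reference, using SparkContext.addJar() from inside the job might look roughly like the sketch below. The HDFS path is a placeholder; addJar accepts local paths as well as hdfs:// and http:// URIs, and note that the jar is shipped to the executors but not added to the driver's own classpath:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("MyJob")
    val sc = new SparkContext(conf)

    // Ship an extra jar that already lives in HDFS to the executors.
    // Placeholder path -- adjust to wherever the jars were uploaded.
    sc.addJar("hdfs:///user/rzykov/libs/some-dependency.jar")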

Re: SPARK 1.1.0 on yarn-cluster and external JARs

2014-09-25 Thread Marcelo Vanzin
You can pass the HDFS location of those extra jars in the spark-submit --jars argument. Spark will take care of using Yarn's distributed cache to make them available to the executors. Note that you may need to provide the full hdfs URL (not just the path, since that will be interpreted as a local path).
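A rough sketch of what such an invocation might look like; the jar names, paths, and class name are placeholders:

    spark-submit \
      --master yarn-cluster \
      --class com.example.MyJob \
      --jars hdfs:///user/rzykov/libs/dep1.jar,hdfs:///user/rzykov/libs/dep2.jar \
      my-job.jar

--jars takes a comma-separated list, and with full hdfs:// URLs the jars are pulled from the distributed cache on the cluster rather than uploaded from the submitting machine.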