Hi Gerhard,

I just stumbled upon some documentation on EMR - link below. It seems there is
a -u option to add jars in S3 to your classpath; have you tried that?

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-spark-configure.html


Best Regards,
Sonal
Founder, Nube Technologies <http://www.nubetech.co>
Reifier at Strata Hadoop World <https://www.youtube.com/watch?v=eD3LkpPQIgM>
Reifier at Spark Summit 2015
<https://spark-summit.org/2015/events/real-time-fuzzy-matching-with-spark-and-elastic-search/>

<http://in.linkedin.com/in/sonalgoyal>



On Wed, Mar 9, 2016 at 11:50 AM, Wang, Daoyuan <[email protected]>
wrote:

> Hi Gerhard,
>
>
>
> How does EMR set its conf for Spark? I think if you set both SPARK_CLASSPATH
> and spark.driver.extraClassPath, Spark will ignore SPARK_CLASSPATH.
>
> I think you can do this by reading the configuration from SparkConf, adding
> your custom settings to the corresponding key, and then using the updated
> SparkConf to instantiate your SparkContext.
>
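> For example, something along these lines in PySpark (an untested sketch;
> the jar path is just a placeholder for yours):
>
> from pyspark import SparkConf, SparkContext
>
> conf = SparkConf()
> # Keep whatever is already on the driver classpath and append the custom jar.
> existing = conf.get("spark.driver.extraClassPath", "")
> custom = "/path/to/our/custom/jar/*"
> conf.set("spark.driver.extraClassPath",
>          existing + ":" + custom if existing else custom)
> sc = SparkContext(conf=conf)
>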
>
>
> Thanks,
>
> Daoyuan
>
>
>
> *From:* Gerhard Fiedler [mailto:[email protected]]
> *Sent:* Wednesday, March 09, 2016 5:41 AM
> *To:* [email protected]
> *Subject:* How to add a custom jar file to the Spark driver?
>
>
>
> We’re running Spark 1.6.0 on EMR, in YARN client mode. We run Python code,
> but we want to add a custom jar file to the driver.
>
>
>
> When running on a local one-node standalone cluster, we just use
> spark.driver.extraClassPath and everything works:
>
>
>
> spark-submit --conf spark.driver.extraClassPath=/path/to/our/custom/jar/*
>  our-python-script.py
>
>
>
> But on EMR, this value is already set to something that their installation of
> Spark needs. Setting it to point to our custom jar overwrites the original
> setting rather than appending to it, and that breaks Spark.
>
>
>
> Our current workaround is to capture, once, whatever EMR sets
> spark.driver.extraClassPath to, then take that path and append our jar file
> to it. Of course this breaks when EMR changes that path in their cluster
> settings, and we wouldn’t necessarily notice it easily. This is how it looks:
>
>
>
> spark-submit --conf
> spark.driver.extraClassPath=/path/to/our/custom/jar/*:/etc/hadoop/conf:/etc/hive/conf:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
>  our-python-script.py
>
>
>
> We prefer not to do this…
>
>
>
> We tried the spark-submit argument --jars, but it didn’t seem to have any
> effect. We called it like this:
>
>
>
> spark-submit --jars /path/to/our/custom/jar/file.jar  our-python-script.py
>
>
>
> We also tried to set CLASSPATH, but it didn’t seem to have any impact:
>
>
>
> export CLASSPATH=/path/to/our/custom/jar/*
>
> spark-submit  our-python-script.py
>
>
>
> When using SPARK_CLASSPATH, we got warnings that it is deprecated, and the
> messages also seemed to imply that it affects the same configuration that
> is set by spark.driver.extraClassPath.
>
>
>
>
>
> So, my question is: Is there a clean way to add a custom jar file to a
> Spark configuration?
>
>
>
> Thanks,
>
> Gerhard
>
>
>
