Zeppelin is now supported on EMR as of release emr-4.1.0, without the need for a bootstrap action like the ones discussed in this thread. See https://aws.amazon.com/blogs/aws/amazon-emr-release-4-1-0-spark-1-5-0-hue-3-7-1-hdfs-encryption-presto-oozie-zeppelin-improved-resizing/ for the emr-4.1.0 announcement.
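For anyone starting fresh, that means Zeppelin can simply be requested at cluster creation. A minimal sketch with the AWS CLI (the cluster name, key pair, and instance settings below are placeholders; the application was listed as "Zeppelin-Sandbox" in the 4.1.0 release, so verify the exact name against your CLI version):

    # Launch an emr-4.1.0 cluster with Spark and Zeppelin preinstalled;
    # no bootstrap action is needed anymore.
    aws emr create-cluster \
      --name "zeppelin-test" \
      --release-label emr-4.1.0 \
      --applications Name=Spark Name=Zeppelin-Sandbox \
      --ec2-attributes KeyName=my-key-pair \
      --instance-type m3.xlarge \
      --instance-count 3 \
      --use-default-roles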
BTW, the version of Zeppelin bundled with emr-4.1.0 is a SNAPSHOT version of 0.6.0, built from commit a345f768471e9b8c89f4eb4d3aba6b684bff75b3.

~ Jonathan

On Wed, Sep 30, 2015 at 2:27 AM, Ophir Cohen <oph...@gmail.com> wrote:

> Did anyone else encounter this problem?
>
> I removed the *--driver-class-path "${CLASSPATH}"* option from the
> bin/interpreter.sh script, and now it starts the SparkContext as expected.
> The problem is that it no longer picks up my local hive-site.xml, which
> points to an external metastore, and tries to use the local one :(
>
> On Fri, Sep 18, 2015 at 4:14 PM, Eugene <blackorange...@gmail.com> wrote:
>
>> Hi Anders,
>>
>> I also had the error you mention, and overcame it with:
>>
>> 1. using the Spark installation bundled with Zeppelin
>> 2. altering conf/interpreter.json with properties such as
>> "spark.executor.instances", "spark.executor.cores", and
>> "spark.default.parallelism" taken from spark-defaults.conf; I parsed
>> that file using parts of your gist.
>>
>> The code looks like this:
>>
>> cd ~/zeppelin/conf/
>> SPARK_DEFAULTS=~/emr-spark-defaults.conf
>> SPARK_EXECUTOR_INSTANCES=$(grep spark.executor.instances $SPARK_DEFAULTS | awk '{print $2}')
>> SPARK_EXECUTOR_CORES=$(grep spark.executor.cores $SPARK_DEFAULTS | awk '{print $2}')
>> SPARK_EXECUTOR_MEMORY=$(grep spark.executor.memory $SPARK_DEFAULTS | awk '{print $2}')
>> SPARK_DEFAULT_PARALLELISM=$(grep spark.default.parallelism $SPARK_DEFAULTS | awk '{print $2}')
>> # Rewrite the Spark interpreter settings ("2B188AQ5T" is the interpreter
>> # ID in my interpreter.json; yours may differ):
>> cat interpreter.json | jq "
>>   .interpreterSettings.\"2B188AQ5T\".properties.\"spark.executor.instances\" = \"${SPARK_EXECUTOR_INSTANCES}\" |
>>   .interpreterSettings.\"2B188AQ5T\".properties.\"spark.executor.cores\" = \"${SPARK_EXECUTOR_CORES}\" |
>>   .interpreterSettings.\"2B188AQ5T\".properties.\"spark.executor.memory\" = \"${SPARK_EXECUTOR_MEMORY}\" |
>>   .interpreterSettings.\"2B188AQ5T\".properties.\"spark.default.parallelism\" = \"${SPARK_DEFAULT_PARALLELISM}\"
>> " > interpreter.json_
>> cat interpreter.json_ > interpreter.json
>> rm interpreter.json_
>>
>> 2015-09-18 17:05 GMT+04:00 Anders Hammar <anders.ham...@gmail.com>:
>>
>>> Hi,
>>>
>>> Thank you Phil for updating my script to support the latest version of
>>> EMR. I have edited my gist so that it includes some of your updates,
>>> plus some other changes.
>>>
>>> https://gist.github.com/andershammar/224e1077021d0ea376dd
>>>
>>> While on the subject, has anyone been able to get Zeppelin to work
>>> with Amazon's Spark installation on Amazon EMR 4.x (by exporting
>>> SPARK_HOME and HADOOP_HOME instead)? When I try this, I get the
>>> following exception:
>>>
>>> org.apache.spark.SparkException: Found both spark.driver.extraClassPath
>>> and SPARK_CLASSPATH. Use only the former.
>>> at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$8.apply(SparkConf.scala:444)
>>> at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$8.apply(SparkConf.scala:442)
>>> at scala.collection.immutable.List.foreach(List.scala:318)
>>> at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:442)
>>> at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:430)
>>> at scala.Option.foreach(Option.scala:236)
>>> at org.apache.spark.SparkConf.validateSettings(SparkConf.scala:430)
>>> ...
>>>
>>> From a quick look at it, the problem seems to be that the Amazon
>>> installation of Spark uses SPARK_CLASSPATH to add additional libraries
>>> (/etc/spark/conf/spark-env.sh), while Zeppelin uses "spark-submit
>>> --driver-class-path" (zeppelin/bin/interpreter.sh).
>>>
>>> Any ideas?
>>>
>>> Best regards,
>>> Anders
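One possible workaround for the exception Anders describes, as an untested sketch: fold the value that EMR exports via SPARK_CLASSPATH into spark.driver.extraClassPath, so that only the latter is set. This assumes /etc/spark/conf/spark-env.sh exports SPARK_CLASSPATH the way his message reports; if spark.driver.extraClassPath is already present in spark-defaults.conf, merge the value into the existing line instead of appending a second one.

    # Sketch: move EMR's SPARK_CLASSPATH into spark-defaults.conf so that
    # Zeppelin's "spark-submit --driver-class-path" no longer collides with it.
    source /etc/spark/conf/spark-env.sh   # defines SPARK_CLASSPATH on EMR 4.x
    echo "spark.driver.extraClassPath ${SPARK_CLASSPATH}" \
      | sudo tee -a /etc/spark/conf/spark-defaults.conf
    # Comment out the export so Spark no longer sees both settings at once:
    sudo sed -i 's/^export SPARK_CLASSPATH/#&/' /etc/spark/conf/spark-env.sh

Whether the interpreter then finds everything it needs is another question; Ophir's report above suggests that the --driver-class-path handling in interpreter.sh plays a part as well.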
>>>
>>> On Wed, Sep 9, 2015 at 5:09 PM, Eugene <blackorange...@gmail.com> wrote:
>>>
>>>> Here's a somewhat shorter alternative, too:
>>>>
>>>> https://gist.github.com/snowindy/008f3e8b878a23c00679
>>>>
>>>> 2015-09-09 18:58 GMT+04:00 shahab <shahab.mok...@gmail.com>:
>>>>
>>>>> Thanks Phil, it works. Great job and well done!
>>>>>
>>>>> best,
>>>>> /Shahab
>>>>>
>>>>> On Mon, Sep 7, 2015 at 6:32 PM, Phil Wills <otherp...@gmail.com> wrote:
>>>>>
>>>>>> Anders' script is a bit out of date if you're using the latest
>>>>>> version of EMR. Here's my fork:
>>>>>>
>>>>>> https://gist.github.com/philwills/71539f833f57338236b5
>>>>>>
>>>>>> which worked OK for me fairly recently.
>>>>>>
>>>>>> Phil
>>>>>>
>>>>>> On Mon, 7 Sep 2015 at 10:01 shahab <shahab.mok...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am trying to use Zeppelin with Spark on Amazon EMR. I used the
>>>>>>> script provided by Anders
>>>>>>> (https://gist.github.com/andershammar/224e1077021d0ea376dd) to set
>>>>>>> up Zeppelin. Zeppelin can connect to Spark, but I get the following
>>>>>>> error when I run the tutorials:
>>>>>>>
>>>>>>> ...FileNotFoundException: File
>>>>>>> file:/home/hadoop/zeppelin/interpreter/spark/dep/zeppelin-spark-dependencies-0.6.0-incubating-SNAPSHOT.jar
>>>>>>> does not exist
>>>>>>>
>>>>>>> However, the above file does exist at that path on the master node.
>>>>>>>
>>>>>>> I would appreciate it if anyone has experience to share on how to
>>>>>>> set up Zeppelin with EMR.
>>>>>>>
>>>>>>> best,
>>>>>>> /Shahab
>>>>
>>>> --
>>>> Best regards,
>>>> Eugene.
>>
>> --
>> Best regards,
>> Eugene.
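For completeness, the Zeppelin-side settings this thread keeps circling around live under zeppelin/conf. Below is a sketch of pointing a manually installed Zeppelin at the cluster's Spark and at an external Hive metastore. The paths are assumptions based on EMR 4.x defaults and the locations mentioned in this thread (~/zeppelin, /usr/lib/spark, /etc/hadoop/conf, /etc/hive/conf), so adjust them to your layout:

    # Point Zeppelin at the cluster's Spark and run against YARN (sketch):
    echo 'export SPARK_HOME=/usr/lib/spark'        >> ~/zeppelin/conf/zeppelin-env.sh
    echo 'export HADOOP_CONF_DIR=/etc/hadoop/conf' >> ~/zeppelin/conf/zeppelin-env.sh
    echo 'export MASTER=yarn-client'               >> ~/zeppelin/conf/zeppelin-env.sh
    # Copy hive-site.xml so the Spark interpreter talks to the external
    # metastore instead of creating a local Derby one (Ophir's problem above):
    cp /etc/hive/conf/hive-site.xml ~/zeppelin/conf/

After changing either file, restart Zeppelin (bin/zeppelin-daemon.sh restart) so the new settings take effect.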