Zeppelin is now supported on EMR as of release emr-4.1.0, without the need for a bootstrap action like the ones discussed in this thread. See https://aws.amazon.com/blogs/aws/amazon-emr-release-4-1-0-spark-1-5-0-hue-3-7-1-hdfs-encryption-presto-oozie-zeppelin-improved-resizing/ for the emr-4.1.0 announcement.
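For anyone starting fresh, that means Zeppelin can simply be requested at cluster creation. A minimal sketch with the AWS CLI (the cluster name, key pair, and instance settings below are placeholders; the application was listed as "Zeppelin-Sandbox" in the 4.1.0 release, so verify the exact name against your CLI version):

    # Launch an emr-4.1.0 cluster with Spark and Zeppelin preinstalled;
    # no bootstrap action is needed anymore.
    aws emr create-cluster \
      --name "zeppelin-test" \
      --release-label emr-4.1.0 \
      --applications Name=Spark Name=Zeppelin-Sandbox \
      --ec2-attributes KeyName=my-key-pair \
      --instance-type m3.xlarge \
      --instance-count 3 \
      --use-default-roles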
BTW, the version of Zeppelin bundled with emr-4.1.0 is a SNAPSHOT version of 0.6.0, built from commit a345f768471e9b8c89f4eb4d3aba6b684bff75b3.

~ Jonathan

On Wed, Sep 30, 2015 at 2:27 AM, Ophir Cohen <oph...@gmail.com> wrote:

> Did anyone else encounter this problem?
>
> I removed the *--driver-class-path "${CLASSPATH}"* option from the
> bin/interpreter.sh script, and now it starts the SparkContext as expected.
> The problem is that it no longer picks up my local hive-site.xml, which
> points to an external metastore, and tries to use the local one :(
>
> On Fri, Sep 18, 2015 at 4:14 PM, Eugene <blackorange...@gmail.com> wrote:
>
>> Hi Anders,
>>
>> I also had the error you mention, and overcame it with:
>>
>> 1. using the Spark installation bundled with Zeppelin
>> 2. altering conf/interpreter.json with properties such as
>> "spark.executor.instances", "spark.executor.cores", and
>> "spark.default.parallelism" taken from spark-defaults.conf; I parsed
>> that file using parts of your gist.
>>
>> The code looks like this:
>>
>> cd ~/zeppelin/conf/
>> SPARK_DEFAULTS=~/emr-spark-defaults.conf
>> SPARK_EXECUTOR_INSTANCES=$(grep spark.executor.instances $SPARK_DEFAULTS | awk '{print $2}')
>> SPARK_EXECUTOR_CORES=$(grep spark.executor.cores $SPARK_DEFAULTS | awk '{print $2}')
>> SPARK_EXECUTOR_MEMORY=$(grep spark.executor.memory $SPARK_DEFAULTS | awk '{print $2}')
>> SPARK_DEFAULT_PARALLELISM=$(grep spark.default.parallelism $SPARK_DEFAULTS | awk '{print $2}')
>> # Rewrite the Spark interpreter settings ("2B188AQ5T" is the interpreter
>> # ID in my interpreter.json; yours may differ):
>> cat interpreter.json | jq "
>>   .interpreterSettings.\"2B188AQ5T\".properties.\"spark.executor.instances\" = \"${SPARK_EXECUTOR_INSTANCES}\" |
>>   .interpreterSettings.\"2B188AQ5T\".properties.\"spark.executor.cores\" = \"${SPARK_EXECUTOR_CORES}\" |
>>   .interpreterSettings.\"2B188AQ5T\".properties.\"spark.executor.memory\" = \"${SPARK_EXECUTOR_MEMORY}\" |
>>   .interpreterSettings.\"2B188AQ5T\".properties.\"spark.default.parallelism\" = \"${SPARK_DEFAULT_PARALLELISM}\"
>> " > interpreter.json_
>> cat interpreter.json_ > interpreter.json
>> rm interpreter.json_
>>
>> 2015-09-18 17:05 GMT+04:00 Anders Hammar <anders.ham...@gmail.com>:
>>
>>> Hi,
>>>
>>> Thank you Phil for updating my script to support the latest version of
>>> EMR. I have edited my gist so that it includes some of your updates,
>>> plus some other changes.
>>>
>>> https://gist.github.com/andershammar/224e1077021d0ea376dd
>>>
>>> While on the subject, has anyone been able to get Zeppelin to work
>>> with Amazon's Spark installation on Amazon EMR 4.x (by exporting
>>> SPARK_HOME and HADOOP_HOME instead)? When I try this, I get the
>>> following exception:
>>>
>>> org.apache.spark.SparkException: Found both spark.driver.extraClassPath
>>> and SPARK_CLASSPATH. Use only the former.
>>> at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$8.apply(SparkConf.scala:444)
>>> at org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$8.apply(SparkConf.scala:442)
>>> at scala.collection.immutable.List.foreach(List.scala:318)
>>> at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:442)
>>> at org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:430)
>>> at scala.Option.foreach(Option.scala:236)
>>> at org.apache.spark.SparkConf.validateSettings(SparkConf.scala:430)
>>> ...
>>>
>>> From a quick look at it, the problem seems to be that the Amazon
>>> installation of Spark uses SPARK_CLASSPATH to add additional libraries
>>> (/etc/spark/conf/spark-env.sh), while Zeppelin uses "spark-submit
>>> --driver-class-path" (zeppelin/bin/interpreter.sh).
>>>
>>> Any ideas?
>>>
>>> Best regards,
>>> Anders
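One possible workaround for the exception Anders describes, as an untested sketch: fold the value that EMR exports via SPARK_CLASSPATH into spark.driver.extraClassPath, so that only the latter is set. This assumes /etc/spark/conf/spark-env.sh exports SPARK_CLASSPATH the way his message reports; if spark.driver.extraClassPath is already present in spark-defaults.conf, merge the value into the existing line instead of appending a second one.

    # Sketch: move EMR's SPARK_CLASSPATH into spark-defaults.conf so that
    # Zeppelin's "spark-submit --driver-class-path" no longer collides with it.
    source /etc/spark/conf/spark-env.sh   # defines SPARK_CLASSPATH on EMR 4.x
    echo "spark.driver.extraClassPath ${SPARK_CLASSPATH}" \
      | sudo tee -a /etc/spark/conf/spark-defaults.conf
    # Comment out the export so Spark no longer sees both settings at once:
    sudo sed -i 's/^export SPARK_CLASSPATH/#&/' /etc/spark/conf/spark-env.sh

Whether the interpreter then finds everything it needs is another question; Ophir's report above suggests that the --driver-class-path handling in interpreter.sh plays a part as well.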
>>>
>>> On Wed, Sep 9, 2015 at 5:09 PM, Eugene <blackorange...@gmail.com> wrote:
>>>
>>>> Here's a somewhat shorter alternative, too:
>>>>
>>>> https://gist.github.com/snowindy/008f3e8b878a23c00679
>>>>
>>>> 2015-09-09 18:58 GMT+04:00 shahab <shahab.mok...@gmail.com>:
>>>>
>>>>> Thanks Phil, it works. Great job and well done!
>>>>>
>>>>> best,
>>>>> /Shahab
>>>>>
>>>>> On Mon, Sep 7, 2015 at 6:32 PM, Phil Wills <otherp...@gmail.com> wrote:
>>>>>
>>>>>> Anders' script is a bit out of date if you're using the latest
>>>>>> version of EMR. Here's my fork:
>>>>>>
>>>>>> https://gist.github.com/philwills/71539f833f57338236b5
>>>>>>
>>>>>> which worked OK for me fairly recently.
>>>>>>
>>>>>> Phil
>>>>>>
>>>>>> On Mon, 7 Sep 2015 at 10:01 shahab <shahab.mok...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am trying to use Zeppelin with Spark on Amazon EMR. I used the
>>>>>>> script provided by Anders
>>>>>>> (https://gist.github.com/andershammar/224e1077021d0ea376dd) to set
>>>>>>> up Zeppelin. Zeppelin can connect to Spark, but I get the following
>>>>>>> error when I run the tutorials:
>>>>>>>
>>>>>>> ...FileNotFoundException: File
>>>>>>> file:/home/hadoop/zeppelin/interpreter/spark/dep/zeppelin-spark-dependencies-0.6.0-incubating-SNAPSHOT.jar
>>>>>>> does not exist
>>>>>>>
>>>>>>> However, the above file does exist at that path on the master node.
>>>>>>>
>>>>>>> I would appreciate it if anyone has experience to share on how to
>>>>>>> set up Zeppelin with EMR.
>>>>>>>
>>>>>>> best,
>>>>>>> /Shahab
>>>>
>>>> --
>>>> Best regards,
>>>> Eugene.
>>
>> --
>> Best regards,
>> Eugene.
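For completeness, the Zeppelin-side settings this thread keeps circling around live under zeppelin/conf. Below is a sketch of pointing a manually installed Zeppelin at the cluster's Spark and at an external Hive metastore. The paths are assumptions based on EMR 4.x defaults and the locations mentioned in this thread (~/zeppelin, /usr/lib/spark, /etc/hadoop/conf, /etc/hive/conf), so adjust them to your layout:

    # Point Zeppelin at the cluster's Spark and run against YARN (sketch):
    echo 'export SPARK_HOME=/usr/lib/spark'        >> ~/zeppelin/conf/zeppelin-env.sh
    echo 'export HADOOP_CONF_DIR=/etc/hadoop/conf' >> ~/zeppelin/conf/zeppelin-env.sh
    echo 'export MASTER=yarn-client'               >> ~/zeppelin/conf/zeppelin-env.sh
    # Copy hive-site.xml so the Spark interpreter talks to the external
    # metastore instead of creating a local Derby one (Ophir's problem above):
    cp /etc/hive/conf/hive-site.xml ~/zeppelin/conf/

After changing either file, restart Zeppelin (bin/zeppelin-daemon.sh restart) so the new settings take effect.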