Thanks, Jeff! I'll look into this solution.
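Based on the Livy docs you linked, this is roughly what I plan to try. Sketch only; the URL, memory values, and port are placeholders for our environment, not tested settings:

```properties
# Zeppelin side (livy interpreter settings in the Zeppelin UI):
# point Zeppelin at the Livy server; host/port below are placeholders.
zeppelin.livy.url = http://localhost:8998

# Per-session Spark resources passed through to Livy (example values):
livy.spark.driver.memory   = 2g
livy.spark.executor.memory = 4g

# Livy server side (conf/livy.conf), so sessions run in yarn-cluster mode
# and the drivers move off the Zeppelin/EMR master node:
livy.spark.master      = yarn
livy.spark.deploy-mode = cluster
```

If I understand the docs correctly, with this setup each user gets an isolated Spark application on YARN, so one user's driver can no longer exhaust memory on the master node.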
On Wed, May 3, 2017 at 5:32 PM, Jeff Zhang <zjf...@gmail.com> wrote:

> Regarding the interpreter memory issue: this happens because Zeppelin's
> Spark interpreter only supports yarn-client mode, which means the driver
> runs on the same host as the Zeppelin server. So it is pretty easy to run
> out of memory when many users share the same driver (as in the scoped mode
> you use). You can try the Livy interpreter, which supports yarn-cluster
> mode, so that the driver runs on a remote host and each user gets an
> isolated Spark application.
> https://zeppelin.apache.org/docs/0.8.0-SNAPSHOT/interpreter/livy.html
>
> Shanmukha Sreenivas Potti <shanmu...@utexas.edu> wrote on Thursday,
> May 4, 2017 at 6:54 AM:
>
>> Hello Zeppelin users,
>>
>> I'm reaching out to you for some guidance on best practices. We currently
>> use Zeppelin 0.7.0 on EMR, and I have a few questions about making this
>> setup more efficient. I would really appreciate it if any of you could
>> help me with these issues or point me to the right person/team.
>>
>> *1. Interpreter Settings*
>>
>> I understand that newer versions (we are currently on Zeppelin 0.7) offer
>> different interpreter modes, such as scoped, isolated, and shared.
>>
>> Multiple users on our team use Zeppelin in separate notebooks. Sometimes
>> jobs run endlessly, fail to execute, or time out after maxing out memory.
>> We tend to restart the interpreter, and are sometimes forced to restart
>> the Zeppelin application on the EMR master node to resume operations. Is
>> this the best way to deal with such issues?
>>
>> We currently use the "scoped" interpreter setting, i.e. one interpreter
>> instance per note.
>>
>> Would you recommend that we continue with this setting, or do you think
>> we would be better served by one of the other available settings?
>> I did take a look at the Zeppelin documentation for information on these
>> settings, but anything additional would be greatly helpful.
>>
>> Also, is there a way to accurately determine how much of the available
>> memory is being used by the various jobs in Zeppelin? The "Job" tab shows
>> which jobs are running in the various notebooks, but gives no insight
>> into the memory/compute power being used.
>>
>> Ideally, I would like to find the root cause of why my queries are not
>> running: is memory maxing out in Zeppelin, HDFS, or Spark, or do we have
>> too few compute nodes?
>>
>> I would really appreciate any documentation that can guide me on these
>> aspects.
>>
>> *2. Installation Ports*
>>
>> By default, Zeppelin on EMR is installed on port 8890. However, to be
>> compliant with security policies, we needed to use other ports. We made
>> this change by editing the Zeppelin configuration file over SSH. I am
>> concerned that this approach may have cloned the application onto the
>> other ports and may also be restricting my usage of Zeppelin. Is this the
>> right way to run Zeppelin on another port?
>>
>> Appreciate any pointers you may have. Please see below for more
>> information on the cluster and the applications on it.
>>
>> *Thanks,*
>>
>> *Shan*
>>
>> *Cluster Details:*
>>
>> Release label: emr-5.4.0
>>
>> Applications: Hive 2.1.1, Pig 0.16.0, Hue 3.11.0, Spark 2.1.0,
>> HBase 1.3.0, Zeppelin 0.7.0, Oozie 4.3.0, Mahout 0.12.2

--
Shan S. Potti, 737-333-1952
https://www.linkedin.com/in/shanmukhasreenivas
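P.S. For anyone hitting the same port question: the edit we made over SSH was to the standard server-port property in zeppelin-site.xml. A sketch of the change (the path is where EMR placed the file for us, and the port value is just an example):

```xml
<!-- /etc/zeppelin/conf/zeppelin-site.xml on the EMR master node -->
<property>
  <name>zeppelin.server.port</name>
  <!-- default is 8890 on EMR; replace with the approved port -->
  <value>8891</value>
</property>
```

As I understand it, changing this property moves the server to the new port rather than cloning it onto a second port; the Zeppelin service just needs a restart for the change to take effect.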