Re: increasing Spark driver memory using Zeppelin with Spark/YARN

2016-02-03 Thread Hyung Sung Shim
Hello.

I have answered your questions inline below.

1. is the Spark driver executing in the same JVM as the Zeppelin
RemoteInterpreterServer?
-> As far as I know, it is separated: the interpreter server process is executed
by ~/bin/interpreter.sh.

2. how does one correctly size the Spark driver memory (Xmx) in this
setting?
-> You can refer to
http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/Can-not-configure-driver-memory-size-td1513.html
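For what it's worth, below is a minimal sketch of the memory knobs in
conf/zeppelin-env.sh. ZEPPELIN_INTP_MEM is the variable name from later
zeppelin-env.sh.template files and may not exist on 0.5.5, where ZEPPELIN_MEM
has been reported to govern the interpreter process as well; check what your
bin/interpreter.sh actually reads before relying on this.

# conf/zeppelin-env.sh (a sketch, not verified against 0.5.5)
export ZEPPELIN_MEM="-Xms1024m -Xmx1024m"    # heap of the Zeppelin server JVM
export ZEPPELIN_INTP_MEM="-Xms8g -Xmx8g"     # heap of the interpreter JVM, which is
                                             # the Spark driver in yarn-client mode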

3. as an alternative: is yarn-master supported with Zeppelin?
-> As far as I know, Zeppelin supports only yarn-client mode.

If I'm wrong, please correct me.
Thanks.

2016-02-03 20:31 GMT+09:00 Gerald Loeffler :

> dear Zeppelin users,
>
> we’ve been using Zeppelin (0.5.5) with Spark/YARN successfully for quite
> some time, but now we need a Spark driver with lots of memory and cannot
> achieve that through Zeppelin (it works without issues via direct
> spark-submit). Currently we’re using yarn-client mode.
>
>    1. is the Spark driver executing in the same JVM as the Zeppelin
>       RemoteInterpreterServer?
>    2. how does one correctly size the Spark driver memory (Xmx) in this
>       setting?
>    3. as an alternative: is yarn-master supported with Zeppelin?
>
>
>    thank you very much in advance for your help!
>    gerald
>
>
> --
> Gerald Loeffler
> gerald.loeff...@googlemail.com
> http://www.gerald-loeffler.net
>


Re: zeppelin multi user mode?

2016-02-03 Thread Hyung Sung Shim
Hello Yunfeng.

You can also refer to
https://github.com/NFLabs/z-manager/tree/master/multitenancy.

Thanks.

2016-02-04 3:56 GMT+09:00 Christopher Matta :

> I have had luck with a single Zeppelin installation and a config directory
> in each user's home directory. That way each user gets their own instance,
> and the instances will not interfere with each other.
>
> You can start the Zeppelin server with a config flag pointing to the config
> directory. Simply copy the config dir that comes with Zeppelin to
> ~/.zeppelin and edit the zeppelin-site.xml to change the default port for
> each user. Start like this:
> ./zeppelin.sh --config ~/.zeppelin start
>
>
> On Wednesday, February 3, 2016, Lin, Yunfeng  wrote:
>
>> Hi guys,
>>
>>
>>
>> We are planning to use Zeppelin in PROD for data scientists. One feature
>> we desperately need is multi-user mode.
>>
>>
>>
>> Currently, Zeppelin is great for single-user use. However, since the
>> Spark context is shared among all users of one Zeppelin server, it is not
>> well suited to multiple users on the same server: they will interfere
>> with each other in the one Spark context.
>>
>>
>>
>> How do you guys address this need? Thanks.
>>
>>
>>
>
>
> --
> Chris Matta
> cma...@mapr.com
> 215-701-3146
>
>


Re: zeppelin multi user mode?

2016-02-03 Thread Benjamin Kim
I forgot to mention that I don’t see Spark 1.6 in the list of versions when 
installing z-manager.

> On Feb 3, 2016, at 10:08 PM, Corneau Damien  wrote:
> 
> @Benjamin,
> We do support version 1.6 of Spark, see:
> https://github.com/apache/incubator-zeppelin#spark-interpreter
> 
> On Wed, Feb 3, 2016 at 9:47 PM, Benjamin Kim wrote:
> I see that the latest version of Spark supported is 1.4.1. When will the 
> latest versions of Spark be supported?
> 
> Thanks,
> Ben
> 
> 
>> On Feb 3, 2016, at 7:54 PM, Hyung Sung Shim wrote:
>> 
>> Hello Yunfeng.
>> 
>> You can also refer to 
>> https://github.com/NFLabs/z-manager/tree/master/multitenancy.
>> 
>> Thanks. 
>> 
>> 2016-02-04 3:56 GMT+09:00 Christopher Matta:
>> I have had luck with a single Zeppelin installation and a config directory
>> in each user's home directory. That way each user gets their own instance,
>> and the instances will not interfere with each other.
>> 
>> You can start the Zeppelin server with a config flag pointing to the config
>> directory. Simply copy the config dir that comes with Zeppelin to ~/.zeppelin
>> and edit the zeppelin-site.xml to change the default port for each user.
>> Start like this:
>> ./zeppelin.sh --config ~/.zeppelin start
>> 
>> 
>> On Wednesday, February 3, 2016, Lin, Yunfeng wrote:
>> Hi guys,
>> 
>>  
>> 
>> We are planning to use Zeppelin in PROD for data scientists. One feature
>> we desperately need is multi-user mode.
>> 
>>  
>> 
>> Currently, Zeppelin is great for single-user use. However, since the
>> Spark context is shared among all users of one Zeppelin server, it is not
>> well suited to multiple users on the same server: they will interfere
>> with each other in the one Spark context.
>> 
>>  
>> 
>> How do you guys address this need? Thanks.
>> 
>>  
>> 
>> 
>> 
>> -- 
>> Chris Matta
>> cma...@mapr.com 
>> 215-701-3146 
>> 
> 
> 



Re: HBase Interpreter

2016-02-03 Thread Rajat Venkatesh
Can you check the version of HBase? The HBase interpreter has been tested with
HBase 1.0.x and Hadoop 2.6.0. There is a good chance this error is due to a
mismatch in versions.
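For reference, the versions in play can be read off with the standard CLI
tools; these are plain HBase/Hadoop commands, nothing Zeppelin-specific:

hbase version     # prints the HBase release the client libraries belong to
hadoop version    # prints the Hadoop release, e.g. 2.6.0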

On Thu, Feb 4, 2016 at 10:20 AM Benjamin Kim  wrote:

> I got this error below trying out the new HBase Interpreter after pulling
> and compiling the latest.
>
> org.jruby.exceptions.RaiseException: (NameError) cannot load Java class
> org.apache.hadoop.hbase.quotas.ThrottleType
> at
> org.jruby.javasupport.JavaUtilities.get_proxy_or_package_under_package(org/jruby/javasupport/JavaUtilities.java:54)
> at (Anonymous).method_missing(/builtin/javasupport/java.rb:51)
> at
> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase/quotas.rb:23)
> at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:1062)
> at
> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase/quotas.rb:24)
> at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:1062)
> at
> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase/hbase.rb:90)
> at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:1062)
> at
> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase.rb:118)
>
> Is there something I’m missing? Is it because I’m using CDH 5.4.8?
>
> Thanks,
> Ben


Re: HBase Interpreter

2016-02-03 Thread Rajat Venkatesh
Oh. That should work. I've tested with 1.0.0. Hmm
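One way to narrow it down is to compare the client jars Zeppelin was packaged
with against what the cluster ships. A rough sketch, assuming a default
Zeppelin layout and the CDH parcel path visible in your stack trace:

# jars bundled with Zeppelin's HBase interpreter
ls $ZEPPELIN_HOME/interpreter/hbase/ | grep -E 'hbase|hadoop'
# jars shipped by the CDH parcel
ls /opt/cloudera/parcels/CDH/lib/hbase/lib/ | grep '^hbase'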

On Thu, Feb 4, 2016 at 10:50 AM Benjamin Kim  wrote:

> Hi Rajat,
>
> The version of HBase that comes with CDH 5.4.8 is 1.0.0. How do I check if
> they are compatible?
>
> Thanks,
> Ben
>
>
> On Feb 3, 2016, at 9:16 PM, Rajat Venkatesh  wrote:
>
> Can you check the version of HBase? The HBase interpreter has been tested
> with HBase 1.0.x and Hadoop 2.6.0. There is a good chance this error is due
> to a mismatch in versions.
>
> On Thu, Feb 4, 2016 at 10:20 AM Benjamin Kim  wrote:
>
>> I got this error below trying out the new HBase Interpreter after pulling
>> and compiling the latest.
>>
>> org.jruby.exceptions.RaiseException: (NameError) cannot load Java class
>> org.apache.hadoop.hbase.quotas.ThrottleType
>> at
>> org.jruby.javasupport.JavaUtilities.get_proxy_or_package_under_package(org/jruby/javasupport/JavaUtilities.java:54)
>> at (Anonymous).method_missing(/builtin/javasupport/java.rb:51)
>> at
>> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase/quotas.rb:23)
>> at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:1062)
>> at
>> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase/quotas.rb:24)
>> at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:1062)
>> at
>> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase/hbase.rb:90)
>> at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:1062)
>> at
>> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase.rb:118)
>>
>> Is there something I’m missing? Is it because I’m using CDH 5.4.8?
>>
>> Thanks,
>> Ben
>
>
>


Re: Upgrade spark to 1.6.0

2016-02-03 Thread Felix Cheung
I think his build command only works with Cloudera CDH 5.4.8, as you can see.
A mismatched Akka version is very common when the Hadoop distribution is
different. Which Spark version and Hadoop distribution are you running?
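For anyone debugging this, Maven can show which artifact pulls in which Akka
version. A sketch, run from the Zeppelin source root with the same profiles as
the failing build; org.spark-project.akka is the groupId Spark 1.x uses for its
re-versioned Akka, com.typesafe.akka is the stock one:

mvn dependency:tree -Dincludes=org.spark-project.akka,com.typesafe.akka \
    -Pspark-1.6 -Dspark.version=1.6.0 -Phadoop-2.6 -Pyarn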






On Tue, Feb 2, 2016 at 1:36 PM -0800, "Daniel Valdivia" wrote:





Hello,

An update on the matter: using the build command

mvn clean package -Pspark-1.6 -Dspark.version=1.6.0 
-Dhadoop.version=2.6.0-cdh5.4.8 -Phadoop-2.6 -Pyarn -Ppyspark -Pvendor-repo 
-DskipTests

I end up getting the following error stack trace upon executing a new JSON

akka.ConfigurationException: Akka JAR version [2.2.3] does not match the provided config version [2.3.11]
    at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:181)
    at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:470)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
    at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:52)
    at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1964)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1955)
    at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:55)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:266)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:288)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:457)
    at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:339)
    at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:145)
    at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:465)
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:169)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:134)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

There's some mention of this problem on Stack Overflow, but it seems like it
was fixed:

http://stackoverflow.com/questions/32294276/how-to-connect-zeppelin-to-spark-1-5-built-from-the-sources

Any idea on how to deal with this Akka library problem?

> On Feb 2, 2016, at 12:02 PM, Daniel Valdivia  wrote:
>
> Hi,
>
> Thanks for the suggestion, I'm running maven with Ben's command
>
> Cheers!
>
>> On Feb 1, 2016, at 7:47 PM, Benjamin Kim wrote:
>>
>> Hi Felix,
>>
>> After installing Spark 1.6, I built Zeppelin using:
>>
>> mvn clean package -Pspark-1.6 -Dspark.version=1.6.0 
>> -Dhadoop.version=2.6.0-cdh5.4.8 -Phadoop-2.6 -Pyarn -Ppyspark -Pvendor-repo 
>> -DskipTests
>>
>> This worked for me.
>>
>> Cheers,
>> Ben
>>
>>
>>> On Feb 1, 2016, at 7:44 PM, Felix Cheung wrote:
>>>
>>> Hi
>>>
>>> You can see the build command line example here for spark 1.6 profile
>>>
>>> https://github.com/apache/incubator-zeppelin/blob/master/README.md
>>> 
>>> On Mon, Feb 1, 2016 at 3:59 PM -0800, "Daniel Valdivia" wrote:
>>>
>>> Hi,
>>>
>>> I'd like to ask if there's an easy way to upgrade Spark to 1.6.0 from the
>>> 1.4.x that's bundled with the current release of Zeppelin. Would updating
>>> the pom.xml and compiling suffice?
>>>
>>> Cheers
>>
>



Re: Upgrade spark to 1.6.0

2016-02-03 Thread Daniel Valdivia
I don't need any specific version of Hadoop; I actually removed it from the
build command and still get the error. I just need Spark 1.6.


> On Feb 3, 2016, at 9:05 AM, Felix Cheung  wrote:
> 
> I think his build command only works with Cloudera CDH 5.4.8, as you can see.
> A mismatched Akka version is very common when the Hadoop distribution is
> different. Which Spark version and Hadoop distribution are you running?
> 
> 
> 
> 
> 
> On Tue, Feb 2, 2016 at 1:36 PM -0800, "Daniel Valdivia" wrote:
> 
> Hello,
> 
> An update on the matter: using the build command
> 
> mvn clean package -Pspark-1.6 -Dspark.version=1.6.0 
> -Dhadoop.version=2.6.0-cdh5.4.8 -Phadoop-2.6 -Pyarn -Ppyspark -Pvendor-repo 
> -DskipTests
> 
> I end up getting the following error stack trace upon executing a new JSON
> 
> akka.ConfigurationException: Akka JAR version [2.2.3] does not match the
> provided config version [2.3.11]
> [stack trace trimmed; it is quoted in full in the previous message]
> 
> There's some mention of this problem on Stack Overflow, but it seems like it
> was fixed:
> 
> http://stackoverflow.com/questions/32294276/how-to-connect-zeppelin-to-spark-1-5-built-from-the-sources
> 
> Any idea on how to deal with this Akka library problem?
> 
>> On Feb 2, 2016, at 12:02 PM, Daniel Valdivia wrote:
>> 
>> Hi,
>> 
>> Thanks for the suggestion, I'm running maven with Ben's command
>> 
>> Cheers!
>> 
>>> On Feb 1, 2016, at 7:47 PM, Benjamin Kim wrote:
>>> 
>>> Hi Felix,
>>> 
>>> After installing Spark 1.6, I built Zeppelin using:
>>> 
>>> mvn clean package -Pspark-1.6 -Dspark.version=1.6.0 
>>> -Dhadoop.version=2.6.0-cdh5.4.8 -Phadoop-2.6 -Pyarn -Ppyspark -Pvendor-repo 
>>> -DskipTests
>>> 
>>> This worked for me.
>>> 
>>> Cheers,
>>> Ben
>>> 
>>> 
>>>> On Feb 1, 2016, at 7:44 PM, Felix Cheung wrote:
>>>> 
>>>> Hi
>>>> 
>>>> You can see the build command line example here for spark 1.6 profile
>>>> 
>>>> https://github.com/apache/incubator-zeppelin/blob/master/README.md
>>>> 
>>>> On Mon, Feb 1, 2016 at 3:59 PM -0800, "Daniel Valdivia"
 

Re: zeppelin multi user mode?

2016-02-03 Thread Christopher Matta
I have had luck with a single Zeppelin installation and a config directory
in each user's home directory. That way each user gets their own instance,
and the instances will not interfere with each other.

You can start the Zeppelin server with a config flag pointing to the config
directory. Simply copy the config dir that comes with Zeppelin to
~/.zeppelin and edit the zeppelin-site.xml to change the default port for
each user. Start like this:
./zeppelin.sh --config ~/.zeppelin start
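A minimal sketch of that per-user setup, assuming $ZEPPELIN_HOME points at the
shared installation and each user picks a free port of their own:

# each user copies the stock config that ships with Zeppelin
cp -r $ZEPPELIN_HOME/conf ~/.zeppelin

# then edits ~/.zeppelin/zeppelin-site.xml so that zeppelin.server.port
# (and, on releases that have one, the websocket port) is unique per user

# and finally starts a personal instance against that config
$ZEPPELIN_HOME/bin/zeppelin.sh --config ~/.zeppelin start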


On Wednesday, February 3, 2016, Lin, Yunfeng  wrote:

> Hi guys,
>
>
>
> We are planning to use Zeppelin in PROD for data scientists. One feature
> we desperately need is multi-user mode.
>
>
>
> Currently, Zeppelin is great for single-user use. However, since the
> Spark context is shared among all users of one Zeppelin server, it is not
> well suited to multiple users on the same server: they will interfere
> with each other in the one Spark context.
>
>
>
> How do you guys address this need? Thanks.
>
>
>


-- 
Chris Matta
cma...@mapr.com
215-701-3146