Re: Interpreter behavior

2018-11-08 Thread Jeff Zhang
There's no standard way to calculate the memory requirement for the driver. It
depends on your app, e.g., if you want to fetch large data into the driver, then
you had better set a large value for driver memory.

Regarding running paragraphs simultaneously: for scala/python/r code,
execution is sequential, one paragraph at a time. The reason is that there may be
dependencies between paragraphs, e.g. paragraph 2 may use a variable
defined in paragraph 1.
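
To make the dependency point concrete, here is a toy Python model. It is only a sketch, not Zeppelin's scheduler code: `run_note` and the two paragraph strings are made up for illustration. Paragraphs that share interpreter state are run one at a time, in order, so the second can see what the first defined.

```python
# Toy model of one interpreter session: paragraphs share a namespace
# and are executed in FIFO order by a single worker thread.
from concurrent.futures import ThreadPoolExecutor


def run_note(paragraphs):
    """Run paragraph sources one at a time, in order, against shared state."""
    namespace = {}  # variables shared by all paragraphs in the session
    # A pool with one worker executes submitted tasks in submission order.
    with ThreadPoolExecutor(max_workers=1) as fifo:
        futures = [fifo.submit(exec, src, namespace) for src in paragraphs]
        for future in futures:
            future.result()  # re-raise any error from a paragraph
    return namespace


paragraphs = [
    "rows = list(range(5))",  # paragraph 1 defines a variable
    "total = sum(rows)",      # paragraph 2 depends on paragraph 1
]
state = run_note(paragraphs)
print(state["total"])  # prints 10
```

If the two paragraphs ran concurrently, the second could start before `rows` exists and fail with a NameError, which is the kind of problem sequential execution avoids.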





Re: Interpreter behavior

2018-11-08 Thread Ajay Viswanathan
I use version 0.7.3. I have been trying to investigate the reasons for the
timeouts. I was trying to tune the number of cores available. Now that
you've mentioned the driver memory issue, I'll try it out again and let you
know if that solves the problem. What would be a back-of-the-envelope
calculation for the resource requirements per notebook in a scoped + per
note configuration?

Also, is there a provision for executing multiple Spark paragraphs in a
notebook simultaneously? I wasn't able to achieve that, hence the
workaround of running each paragraph in its own notebook.

Thanks
Ajay Viswanathan
Sr. Software Engineer, MiQ



-- 
Ajay Viswanathan
*Sr. Software Engineer, CAPS - Processing*
*A: *5th & 6th Floor | Skav 909 | 9/1 | Lavelle Road | Bangalore | 560001
*E: *ajayviswanat...@miqdigital.com
*M: *+00 (0)0  
*W: *wearemiq.com
[image: MiQ] 
*Disclaimer: *This email and its attachments are confidential and are
intended solely for the use of the individual to whom it is addressed. If
you are not the intended recipient of this email and its attachments, you
must take no action based upon them, nor must you copy or show them to
anyone. No contracts or official orders shall be concluded by means of this
email. Please contact the sender if you believe you have received this
email in error.


Re: Interpreter behavior

2018-11-08 Thread Jeff Zhang
Hi Ajay,

Thanks for reporting this. Which version do you use? One known issue with
Spark scoped mode is that each Spark REPL occupies a large amount of memory
that is never released. One workaround is to increase the driver memory.

https://jira.apache.org/jira/browse/ZEPPELIN-3389




Re: Interpreter behavior

2018-11-08 Thread Ajay Viswanathan
I am facing this issue in my project as well. By running the
Spark interpreter in Scoped + Per Note mode, I do manage to execute
paragraphs in parallel, but it becomes very resource intensive and times
out if I run more than 3-4 jobs in parallel on a 4-core cloud instance.
Typically a Thrift Transport exception is thrown after a prolonged period
of inactivity.





[jira] [Created] (ZEPPELIN-3857) Spark 2.4.0 and Scala 2.12

2018-11-08 Thread antonkulaga (JIRA)
antonkulaga created ZEPPELIN-3857:
-------------------------------------

         Summary: Spark 2.4.0 and Scala 2.12
             Key: ZEPPELIN-3857
             URL: https://issues.apache.org/jira/browse/ZEPPELIN-3857
         Project: Zeppelin
      Issue Type: Improvement
        Reporter: antonkulaga


Spark 2.4.0 has been released. Along with many new features and bug fixes, it
brings long-anticipated Scala 2.12 support. Many great Scala libraries have
already dropped Scala 2.11 support, and people have been suffering from the
inability to use them together with Spark inside Zeppelin. With Spark 2.4.0
support it will finally be possible to solve this problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zeppelin issue #3113: [ZEPPELIN-3616] fix editor sections auto-collapse

2018-11-08 Thread prabhjyotsingh
Github user prabhjyotsingh commented on the issue:

https://github.com/apache/zeppelin/pull/3113
  
Will merge this if there is no further discussion.


---


[GitHub] zeppelin issue #3206: [ZEPPELIN-3810] Support Spark 2.4

2018-11-08 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/zeppelin/pull/3206
  
Thanks, everyone. The only remaining thing is to update `.travis.yml` to
make sure the tests pass against Spark 2.4.


---


Re: Interpreter behavior

2018-11-08 Thread Jeff Zhang
Which version do you use? This seems to be a bug. Each note should have its own
scheduler in scoped per-note mode.
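
The idea of one scheduler per note can be sketched in plain Python. This is a rough model under stated assumptions, not Zeppelin's implementation: `PerNoteSchedulers`, the note ids and the paragraph functions are all hypothetical. Each note gets its own single-worker FIFO queue, so paragraphs within a note stay ordered while a slow note does not block another note.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PerNoteSchedulers:
    """One single-worker (FIFO) executor per note id. Hypothetical class."""

    def __init__(self):
        self._executors = {}
        self._lock = threading.Lock()

    def submit(self, note_id, paragraph, *args):
        # Create the note's queue on first use, then enqueue the paragraph.
        with self._lock:
            if note_id not in self._executors:
                self._executors[note_id] = ThreadPoolExecutor(max_workers=1)
            executor = self._executors[note_id]
        return executor.submit(paragraph, *args)

    def shutdown(self):
        for executor in self._executors.values():
            executor.shutdown(wait=True)


schedulers = PerNoteSchedulers()
note_b_started = threading.Event()
log = []


def slow_paragraph():
    # Finishes only once note B has run; on a single shared FIFO queue
    # it would sit in front of note B and hold it up.
    finished = note_b_started.wait(timeout=5)
    log.append("A1")
    return finished


def quick_paragraph(name):
    log.append(name)
    note_b_started.set()


a1 = schedulers.submit("note-A", slow_paragraph)
a2 = schedulers.submit("note-A", log.append, "A2")       # queued behind A1
b1 = schedulers.submit("note-B", quick_paragraph, "B1")  # runs while A1 waits

ran_in_parallel = a1.result()
a2.result()
b1.result()
schedulers.shutdown()
print(ran_in_parallel, log)
```

Here `B1` completes while `A1` is still waiting, and `A2` still runs after `A1`. With a single queue shared by all notes, `B1` would have to wait behind note A's paragraphs, which matches the serial behavior described in the original report.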



[jira] [Created] (ZEPPELIN-3856) Zeppelin add Hadoop Submarine(machine learning) interpreter

2018-11-08 Thread Xun Liu (JIRA)
Xun Liu created ZEPPELIN-3856:
---------------------------------

             Summary: Zeppelin add Hadoop Submarine(machine learning) interpreter
                 Key: ZEPPELIN-3856
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3856
             Project: Zeppelin
          Issue Type: New Feature
    Affects Versions: 0.9.0
            Reporter: Xun Liu
            Assignee: Xun Liu
         Attachments: zepl-submarine-interactive-dev.png

Hadoop Submarine is the latest machine learning framework subproject in the
Hadoop 3.2 release. It allows Hadoop to support TensorFlow, MXNet, Caffe, Spark,
and other deep learning frameworks. It provides a full-featured system
framework for machine learning algorithm development, distributed model
training, model management, and model publishing, combined with Hadoop's
intrinsic data storage and data processing capabilities, so that data
scientists can better mine the value of their data.

I was involved in the development of the Hadoop Submarine project, so I plan
to add a Hadoop Submarine interpreter module to Zeppelin. This will let
Zeppelin support deep learning development. This is my design document. If
you have any comments, you can add them directly in the document. Thank you!
https://docs.google.com/document/d/16YN8Kjmxt1Ym3clx5pDnGNXGajUT36hzQxjaik1cP4A/edit?ts=5bc6bfdd





[GitHub] zeppelin issue #3113: [ZEPPELIN-3616] fix editor sections auto-collapse

2018-11-08 Thread Savalek
Github user Savalek commented on the issue:

https://github.com/apache/zeppelin/pull/3113
  
What about merging this?


---


[GitHub] zeppelin issue #3215: [ZEPPELIN-3167]

2018-11-08 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/zeppelin/pull/3215
  
The unit test in `SparkRInterpreterTest` did not cover the error case. We need
to add this kind of unit test.


---


Re: Zeppelin add hadoop submarine(machine learning) interpreter

2018-11-08 Thread liuxun
Hi, community,

I updated the design documentation:
https://docs.google.com/document/d/16YN8Kjmxt1Ym3clx5pDnGNXGajUT36hzQxjaik1cP4A/edit#

I added the interactive development of machine learning algorithms and the
overall design for submitting Note packages to YARN for job execution.

I have completed the preliminary (pre-research) part of the code and
created a JIRA issue: https://issues.apache.org/jira/browse/ZEPPELIN-3856

Comments are welcome. Thank you!

> On Oct 30, 2018, at 10:11 PM, liuxun wrote:
> 
> Hi, community,
> 
> I updated the design document and submitted it to the Hadoop community.
> https://docs.google.com/document/d/16YN8Kjmxt1Ym3clx5pDnGNXGajUT36hzQxjaik1cP4A/edit#
> 
> If you have any questions, you can ask them by email or directly in the 
> document. Thank you!
> 
> I developed the submarine interpreter module in Zeppelin. Below is the 
> interface the system presents when running.
> 
> 1. Zeppelin submarine interpreter properties
> 
> 1.1 System administrators can configure multiple interpreters, and different 
> interpreters can have different resource configurations.
> 1.2 Different users can use different interpreters for resource 
> allocation and management.
> 
> 2. Zeppelin submarine TensorFlow interpreter
> 
> 2.1 Users use their own interpreter. First, in a %submarine.tensorflow 
> paragraph, they write the Python code for TensorFlow.
> 2.2 After writing the TensorFlow Python code, the user clicks the [RUN] 
> button and Zeppelin uploads the Python code to the specified HDFS 
> directory. The submarine is loaded into the Docker container when it is 
> running.
> 
> 3. Zeppelin submarine interpreter
> 
> 3.1 The user sets the call parameter values for the TensorFlow Python code, 
> then enters the job run command.
> 3.2 The Zeppelin submarine interpreter first checks that all parameters of 
> the submarine run are set. After the check passes, the job is executed on 
> YARN via submarine.jar.
> 3.3 The progress and log information of the submarine job are displayed in 
> Zeppelin's note.
> 3.4 You can enter other submarine commands in the submarine interpreter, 
> for example to view all the jobs in submarine.
> 
> 
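
Step 3.2 in the quoted message (check that every run parameter is set, then submit) can be sketched as follows. This is a minimal illustration under assumptions, not the interpreter's actual code: the parameter names in `REQUIRED_PARAMS`, the `run_job` helper and the `fake_submit` stand-in are hypothetical, since the thread does not list the real parameters or the submission call.

```python
# Hypothetical parameter names; the real submarine run parameters
# are not listed in the thread.
REQUIRED_PARAMS = ("job_name", "docker_image", "input_path", "checkpoint_path")


def missing_params(params):
    """Return the required parameter names that are unset or blank."""
    missing = []
    for name in REQUIRED_PARAMS:
        value = params.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


def run_job(params, submit):
    """Validate first, then hand the parameters to the submit callable."""
    missing = missing_params(params)
    if missing:
        raise ValueError("unset parameters: " + ", ".join(missing))
    return submit(params)


def fake_submit(params):
    # Stand-in for the real submission step (via submarine.jar in the message).
    return "submitted " + params["job_name"]


complete = {
    "job_name": "mnist-demo",
    "docker_image": "example/tf:latest",
    "input_path": "hdfs:///tmp/mnist/input",
    "checkpoint_path": "hdfs:///tmp/mnist/checkpoint",
}
print(run_job(complete, fake_submit))  # prints "submitted mnist-demo"

incomplete = dict(complete, docker_image="  ")
try:
    run_job(incomplete, fake_submit)
except ValueError as err:
    print(err)  # prints "unset parameters: docker_image"
```

Failing before submission keeps an incomplete job from reaching the cluster, which appears to be the intent of the check described in 3.2.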
>>> On Oct 21, 2018, at 6:26 AM, Felix Cheung wrote:
>>> 
>>> Very cool!
>>> 
>>> 
>>> 
>>> From: Jeff Zhang <zjf...@gmail.com>
>>> Sent: Friday, October 19, 2018 7:14 AM
>>> To: dev@zeppelin.apache.org 
>>> Subject: Re: Zeppelin add hadoop submarine(machine learning framework) 
>>> interpreter
>>> 
>>> Thanks xun. This would be a great addon for zeppelin to support deep
>>> learning. I will check t

[GitHub] zeppelin issue #3113: [ZEPPELIN-3616] fix editor sections auto-collapse

2018-11-08 Thread Tagar
Github user Tagar commented on the issue:

https://github.com/apache/zeppelin/pull/3113
  
cc @zjffdu @Leemoonsoo @prabhjyotsingh can you please merge this? 

Thank you.


---


[GitHub] zeppelin issue #3206: [ZEPPELIN-3810] Support Spark 2.4

2018-11-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/zeppelin/pull/3206
  
Hi, all. It has finally been announced.
http://spark.apache.org/news/spark-2-4-0-released.html


---


Re: Interpreter behavior

2018-11-08 Thread tecgie88
Hi,

We use Zeppelin in a multi-user environment. The interpreter's scoped mode seems
to allow notebook execution in serial only. If multiple users are running their
notebooks concurrently, these notebooks are queued for serial execution. If one
notebook takes a long time to complete, it basically blocks other notebooks
from execution. To enable parallel notebook execution, it seems we need to use
the isolated mode, which creates a new interpreter instance (run on a separate
JVM) per user. But this can become expensive (compute resource intensive). What
is the suggested interpreter mode for a multi-user environment?

Thanks
Denny


[GitHub] zeppelin issue #3206: [ZEPPELIN-3810] Support Spark 2.4

2018-11-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/zeppelin/pull/3206
  
Hey all ~ could this get in by any chance maybe?


---