Re: Parallel Execution of Spark Jobs

2018-07-24 Thread Jeff Zhang
1. ZEPPELIN-3563 forces FAIR scheduling and only lets the user specify the pool.
2. The scheduler cannot figure out the dependencies between paragraphs.
That's why SparkInterpreter uses FIFOScheduler.
If you use per-user scoped mode, the SparkContext is shared between users but
the SparkInterpreter is not. That means there are multiple SparkInterpreter
instances that share the same SparkContext but do not share the same
FIFOScheduler; each SparkInterpreter uses its own FIFOScheduler.
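
A rough illustration of the arrangement described above, written with plain java.util.concurrent executors rather than Zeppelin's own scheduler classes. Everything here is an assumption made for the sketch: PerUserFifoSketch and its methods are hypothetical names, the shared list only stands in for the shared SparkContext, and each single-threaded executor plays the role of one user's FIFOScheduler.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for scoped per-user mode: one shared resource
// (playing the role of the SparkContext) and one FIFO queue per user.
class PerUserFifoSketch {
    // Shared by all users, like the single SparkContext.
    private final List<String> sharedLog =
        Collections.synchronizedList(new ArrayList<>());

    // One single-threaded executor per user: strict FIFO within a user,
    // while different users never wait on each other's queue.
    private final Map<String, ExecutorService> queues = new ConcurrentHashMap<>();

    void submit(String user, String paragraph) {
        queues.computeIfAbsent(user, u -> Executors.newSingleThreadExecutor())
              .execute(() -> sharedLog.add(user + ":" + paragraph));
    }

    // Drains every queue, then returns the entries one user produced,
    // in the order they were executed.
    List<String> finishAndGet(String user) {
        try {
            for (ExecutorService queue : queues.values()) {
                queue.shutdown();
                queue.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        List<String> result = new ArrayList<>();
        synchronized (sharedLog) {
            for (String entry : sharedLog) {
                if (entry.startsWith(user + ":")) {
                    result.add(entry);
                }
            }
        }
        return result;
    }
}
```

In this sketch a long paragraph submitted by one user delays only that user's later paragraphs; whether jobs from different users then share cluster resources fairly is still decided by Spark's own scheduling mode.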



Re: Parallel Execution of Spark Jobs

2018-07-24 Thread Ankit Jain
Thanks for the quick feedback Jeff.

Re:1 - I did see ZEPPELIN-3563, but we are not on 0.8 yet, and we may also
want to force FAIR execution instead of letting the user control it.

Re:2 - Is there an architecture issue here, or do we just need better thread
safety? Ideally the scheduler should be able to figure out the dependencies
and run whatever can be run in parallel.

Re: Interpreter mode - I may not have been clear, but we are running per-user
scoped mode, so the Spark context is shared among all users.

Doesn't that mean all jobs from different users go to one FIFOScheduler,
forcing all small jobs to block on a big one? That is specifically what we
are trying to avoid.

Thanks
Ankit



-- 
Thanks & Regards,
Ankit.


[GitHub] zeppelin pull request #3095: ZEPPELIN-3652. Remove reflection in SparkInterp...

2018-07-24 Thread zjffdu
GitHub user zjffdu opened a pull request:

https://github.com/apache/zeppelin/pull/3095

ZEPPELIN-3652. Remove reflection in SparkInterpreter



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zjffdu/zeppelin ZEPPELIN-3652

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zeppelin/pull/3095.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3095


commit 3653e9e254ccf4164d6d8d00232cbf4d476ae885
Author: Jeff Zhang 
Date:   2018-07-24T05:13:47Z

ZEPPELIN-3652. Remove reflection in SparkInterpreter




---


Re: Parallel Execution of Spark Jobs

2018-07-24 Thread Jeff Zhang
Regarding 1.  ZEPPELIN-3563 should be helpful. See
https://github.com/apache/zeppelin/blob/master/docs/interpreter/spark.md#running-spark-sql-concurrently
for more details.
https://issues.apache.org/jira/browse/ZEPPELIN-3563

Regarding 2. If you use ParallelScheduler for SparkInterpreter, you may hit
weird issues if your paragraphs depend on each other, e.g. paragraph 1
uses a variable v1 that is defined in paragraph 2. Then the order of
paragraph execution matters, and ParallelScheduler cannot guarantee the
order of execution.
That's why we use FIFOScheduler for SparkInterpreter.

In your scenario where multiple users share the same SparkContext, I would
suggest using scoped per-user mode. All users then share the same
SparkContext, which saves resources, and each user gets their own
FIFOScheduler, isolated from the others.
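
The ordering problem can be sketched with plain java.util.concurrent executors. This is not Zeppelin's FIFOScheduler or ParallelScheduler; ParagraphOrderSketch and its methods are hypothetical names, a single-threaded executor stands in for FIFO execution, and a two-thread pool stands in for parallel execution.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Two "paragraphs": the first defines v1, the second reads it.
class ParagraphOrderSketch {

    // Submits both paragraphs in notebook order and returns what the
    // second paragraph observed.
    static String run(ExecutorService scheduler) {
        Map<String, Integer> variables = new ConcurrentHashMap<>();
        try {
            Future<?> define = scheduler.submit(() -> {
                variables.put("v1", 42);
            });
            Future<String> use = scheduler.submit(() -> {
                Integer v1 = variables.get("v1");
                return v1 == null ? "v1 undefined" : "v1 = " + v1;
            });
            define.get();
            return use.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdown();
        }
    }

    // One worker thread: submission order is execution order.
    static String runFifo() {
        return run(Executors.newSingleThreadExecutor());
    }

    // Two worker threads: the reader may start before the writer finishes.
    static String runParallel() {
        return run(Executors.newFixedThreadPool(2));
    }
}
```

With the single-threaded executor the second paragraph always sees v1. With the pool it usually does too, but nothing guarantees it, which is the kind of intermittent failure described above.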



Re: Parallel Execution of Spark Jobs

2018-07-24 Thread Ankit Jain
Forgot to mention this is for shared scoped mode, so same Spark application and 
context for all users on a single Zeppelin instance.

Thanks
Ankit



Parallel Execution of Spark Jobs

2018-07-24 Thread Ankit Jain
Hi,
I am playing around with the execution policy of Spark jobs (and all Zeppelin
paragraphs, actually).

It looks like there are a couple of control points:
1) Spark scheduling - FIFO vs. FAIR, as documented in
https://spark.apache.org/docs/2.1.1/job-scheduling.html#fair-scheduler-pools.

Since we are still on version 0.7 and don't have
https://issues.apache.org/jira/browse/ZEPPELIN-3563, I am forcefully doing
sc.setLocalProperty("spark.scheduler.pool", "fair");
in both SparkInterpreter.java and SparkSqlInterpreter.java.

Also, because we are exposing Zeppelin to multiple users, we may not want any
user to hog the cluster, so we may want to always use FAIR.

This may complicate our merge to 0.8 though.

2) On top of Spark scheduling, each Zeppelin interpreter itself seems to
have a scheduler queue. Each task is submitted to a FIFOScheduler, except in
SparkSqlInterpreter, which creates a ParallelScheduler if the concurrentsql
flag is turned on.

I am changing SparkInterpreter.java to use ParallelScheduler too, and that
seems to do the trick.

Now multiple notebooks are able to run in parallel.

My question is whether other people have tested SparkInterpreter with
ParallelScheduler.
Also, ideally this should be configurable: the user should be able to specify
FIFO or parallel.

Executing all paragraphs does add more complication, and maybe
https://issues.apache.org/jira/browse/ZEPPELIN-2368 will help us keep the
execution order sane.


Thoughts?

-- 
Thanks & Regards,
Ankit.
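
One way the "configurable FIFO or parallel" idea could look, as a minimal sketch only. The property name sketch.scheduler.mode, the class SchedulerChoiceSketch and its methods are all hypothetical and are not existing Zeppelin settings or APIs. The sketch uses java.util.concurrent executors in place of Zeppelin's scheduler classes.

```java
import java.util.Locale;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SchedulerChoiceSketch {
    // Hypothetical property name, chosen for this sketch only.
    static final String MODE_KEY = "sketch.scheduler.mode";

    // Maps the configured mode to a worker count: "fifo" means one worker,
    // "parallel" means up to maxParallel workers. FIFO is the default.
    static int threadsFor(Properties props, int maxParallel) {
        String mode = props.getProperty(MODE_KEY, "fifo")
                           .trim()
                           .toLowerCase(Locale.ROOT);
        switch (mode) {
            case "fifo":
                return 1;
            case "parallel":
                return Math.max(1, maxParallel);
            default:
                throw new IllegalArgumentException("Unknown scheduler mode: " + mode);
        }
    }

    // A pool with a single worker runs submitted tasks in submission order.
    static ExecutorService create(Properties props, int maxParallel) {
        return Executors.newFixedThreadPool(threadsFor(props, maxParallel));
    }
}
```

Defaulting to FIFO keeps paragraph order safe for notebooks whose paragraphs depend on each other, while parallel execution becomes an explicit opt-in.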


[jira] [Created] (ZEPPELIN-3659) 'Using Pig for querying data' tutorial is outdated

2018-07-24 Thread Alex Byrd (JIRA)
Alex Byrd created ZEPPELIN-3659:
---

 Summary: 'Using Pig for querying data' tutorial is outdated
 Key: ZEPPELIN-3659
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3659
 Project: Zeppelin
  Issue Type: Bug
  Components: documentation
Reporter: Alex Byrd


The third paragraph (first that is not a description) calls hadoop.

hadoop fs -put bank.csv .

 

This hadoop call assumes Hadoop is already installed, which is not mentioned in 
the previous paragraphs as a dependency, nor included or mentioned in the 
installation files and quickstart. When a user has a fresh install and just 
hits 'run all paragraphs', it errors out here.

 

While this is not terribly difficult to overcome, it creates friction in getting 
up and running. The next question is which version of Hadoop to use; I've 
tested with Hadoop 2.7.7 and it appears to work just fine, but I haven't fully 
vetted it yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-3658) Website 'External Resourced' URLs are not pointing anywhere

2018-07-24 Thread Alex Byrd (JIRA)
Alex Byrd created ZEPPELIN-3658:
---

 Summary: Website 'External Resourced' URLs are not pointing anywhere
 Key: ZEPPELIN-3658
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3658
 Project: Zeppelin
  Issue Type: Bug
  Components: front-end, Homepage
Affects Versions: 0.8.0
Reporter: Alex Byrd
 Attachments: ZEPPELIN-URL.PNG

If you navigate to 
[http://zeppelin.apache.org/docs/0.8.0/quickstart/install.html] and hover over 
the More dropdown, the 'External Resources' section just redirects to the 
current page (and not the correct page)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-3657) 'Using Mahout' Tutorial is outdated

2018-07-24 Thread Alex Byrd (JIRA)
Alex Byrd created ZEPPELIN-3657:
---

 Summary: 'Using Mahout' Tutorial is outdated
 Key: ZEPPELIN-3657
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3657
 Project: Zeppelin
  Issue Type: Bug
  Components: front-end, GUI, Interpreters
Affects Versions: 0.8.0
Reporter: Alex Byrd
 Attachments: ZEPPELIN-MAHOUT.PNG

The notebook instructions reference folders and files that do not exist in 
Zeppelin 0.8.0

ex. scripts/mahout/add_mahout.py

 

Also the documentation references the same:

https://zeppelin.apache.org/docs/0.8.0/interpreter/mahout.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zeppelin issue #3066: [ZEPPELIN-3618] ZeppelinContext methods z.run and z.ru...

2018-07-24 Thread egorklimov
Github user egorklimov commented on the issue:

https://github.com/apache/zeppelin/pull/3066
  
@jongyoul 
CI is green 
https://travis-ci.org/TinkoffCreditSystems/zeppelin/builds/407590408


---


[GitHub] zeppelin pull request #3078: [ZEPPELIN-3628] HTML anchor links on paragraph ...

2018-07-24 Thread egorklimov
Github user egorklimov closed the pull request at:

https://github.com/apache/zeppelin/pull/3078


---


[GitHub] zeppelin pull request #3078: [ZEPPELIN-3628] HTML anchor links on paragraph ...

2018-07-24 Thread egorklimov
GitHub user egorklimov reopened a pull request:

https://github.com/apache/zeppelin/pull/3078

[ZEPPELIN-3628] HTML anchor links on paragraph don't work in Google Chrome

### What is this PR for?
Links like [1] work well in Firefox, but don't work in Chrome.
1. http://zeppelin/#/notebook/NOTEID?paragraph=PARAGRAPHID

### What type of PR is it?
Improvement

### What is the Jira issue?
issue on Jira https://issues.apache.org/jira/browse/ZEPPELIN-3628

### How should this be tested?
* Manual checking (see screenshot below)

### Screenshots (if appropriate)
*  Before
Chrome (v67.0.3396.99):


![chrome](https://user-images.githubusercontent.com/6136993/42819776-31f951e6-89dd-11e8-9618-710f61ea550f.gif)
Firefox (v61.0.1):


![firefox](https://user-images.githubusercontent.com/6136993/42819794-39c5562c-89dd-11e8-8059-4e8f85855471.gif)
*  After
Chrome (v67.0.3396.99)


![fixed](https://user-images.githubusercontent.com/6136993/42819862-63095cd6-89dd-11e8-80a0-843e613ca867.gif)

### Questions:
* Does the licenses files need update? 
Yes, JQuery.scrollTo changed from 1.4.14 to 2.1.2 
* Is there breaking changes for older versions? No
* Does this needs documentation? No


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/TinkoffCreditSystems/zeppelin ZEPPELIN-3628

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zeppelin/pull/3078.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3078


commit 87ac4ac35c41ec09d092bf28059aa01730157002
Author: egorklimov 
Date:   2018-07-17T12:03:45Z

jQuery ScrollTo updated

commit e2ead1035c67c374e26bc7da92973908e8605b8a
Author: egorklimov 
Date:   2018-07-17T12:20:42Z

jQuery ScrollTo license updated

commit 8b521f80e21e162a9f74d0958b17002d9d0ab4af
Author: egorklimov 
Date:   2018-07-17T14:57:57Z

License github link fixed

commit 3757129dd0515b1c1db9ac42dce1c0dc142af99d
Author: egorklimov 
Date:   2018-07-20T11:23:34Z

Cursor in paragraph text fixed




---


[GitHub] zeppelin issue #3078: [ZEPPELIN-3628] HTML anchor links on paragraph don't w...

2018-07-24 Thread egorklimov
Github user egorklimov commented on the issue:

https://github.com/apache/zeppelin/pull/3078
  
@jongyoul 
CI is green 
https://travis-ci.org/TinkoffCreditSystems/zeppelin/builds/407589455


---


[GitHub] zeppelin issue #3094: [ZEPPELIN-3656] Fix for completion with Livy interpret...

2018-07-24 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/zeppelin/pull/3094
  
Please add a unit test.


---


[GitHub] zeppelin pull request #3094: [ZEPPELIN-3656] Fix for completion with Livy in...

2018-07-24 Thread alexjbush
GitHub user alexjbush opened a pull request:

https://github.com/apache/zeppelin/pull/3094

[ZEPPELIN-3656] Fix for completion with Livy interpreter

### What is this PR for?
Fix for NullPointerException when using code completion in the Livy 
Interpreter when Shared Interpreter is enabled.

### What type of PR is it?
Bug Fix

### What is the Jira issue?
[ZEPPELIN-3656](https://issues.apache.org/jira/browse/ZEPPELIN-3656)

### How should this be tested?
Run Livy Interpreter in an environment where Shared Interpreter is enabled 
and attempt to trigger code completions.

### Screenshots (if appropriate)

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alexjbush/zeppelin ZEPPELIN-3656

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zeppelin/pull/3094.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3094


commit 32aa7c5670399325308fa31575ae95963d053372
Author: Alex Bush 
Date:   2018-07-24T14:44:03Z

[ZEPPELIN-3656] Fix for completion with Livy interpreter




---


[jira] [Created] (ZEPPELIN-3656) Livy Code Completion does not work when using Shared Interpreter

2018-07-24 Thread Alex Bush (JIRA)
Alex Bush created ZEPPELIN-3656:
---

 Summary: Livy Code Completion does not work when using Shared Interpreter
 Key: ZEPPELIN-3656
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3656
 Project: Zeppelin
  Issue Type: Bug
  Components: livy-interpreter
Affects Versions: 0.8.0
 Environment: {{Zeppelin 0.8.0}}

Livy 0.5

Spark 2.1

Kerberos+SSL
Reporter: Alex Bush
 Fix For: 0.9.0, 0.8.1


When attempting to use code completion with Livy 0.5 I get a 
NullPointerException:

 
{code:java}
2018-07-24 14:05:05,504 ERROR org.apache.thrift.server.TThreadPoolServer: Error occurred during processing of message.
java.lang.NullPointerException
at org.apache.zeppelin.livy.BaseLivyInterpreter.callCompletion(BaseLivyInterpreter.java:284)
at org.apache.zeppelin.livy.BaseLivyInterpreter.completion(BaseLivyInterpreter.java:271)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.completion(LazyOpenInterpreter.java:138)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.completion(RemoteInterpreterServer.java:736)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$completion.getResult(RemoteInterpreterService.java:1940)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$completion.getResult(RemoteInterpreterService.java:1925)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2018-07-24 14:05:05,508 INFO org.apache.zeppelin.socket.NotebookServer: Fail to get completion
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:139)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.completion(RemoteInterpreter.java:358)
at org.apache.zeppelin.notebook.Paragraph.completion(Paragraph.java:270)
at org.apache.zeppelin.notebook.Note.completion(Note.java:729)
at org.apache.zeppelin.socket.NotebookServer.completion(NotebookServer.java:1397)
at org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:303)
at org.apache.zeppelin.socket.NotebookSocket.onWebSocketText(NotebookSocket.java:59)
at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextMessage(JettyListenerEventDriver.java:128)
at org.eclipse.jetty.websocket.common.message.SimpleTextMessage.messageComplete(SimpleTextMessage.java:69)
at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.appendMessage(AbstractEventDriver.java:65)
at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextFrame(JettyListenerEventDriver.java:122)
at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.incomingFrame(AbstractEventDriver.java:161)
at org.eclipse.jetty.websocket.common.WebSocketSession.incomingFrame(WebSocketSession.java:309)
at org.eclipse.jetty.websocket.common.extensions.ExtensionStack.incomingFrame(ExtensionStack.java:214)
at org.eclipse.jetty.websocket.common.Parser.notifyFrame(Parser.java:220)
at org.eclipse.jetty.websocket.common.Parser.parse(Parser.java:258)
at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.readParse(AbstractWebSocketConnection.java:632)
at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:480)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_completion(RemoteInterpreterService.java:372)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.completion(RemoteInterpreterService.java:355)
at 

[GitHub] zeppelin issue #3093: [ZEPPELIN-3655] Add missing roles information to api/n...

2018-07-24 Thread oxygen311
Github user oxygen311 commented on the issue:

https://github.com/apache/zeppelin/pull/3093
  
It's not easy to check this bug with a unit test. Even if the paragraph 
does not run because of "user has no permission for interpreter", the 
response from the REST API would still be "OK".


---


[GitHub] zeppelin issue #3093: [ZEPPELIN-3655] Add missing roles information to api/n...

2018-07-24 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/zeppelin/pull/3093
  
Can you add a unit test?


---


[jira] [Created] (ZEPPELIN-3655) Fix running a paragraph through the REST API with a restricted access interpreter

2018-07-24 Thread Alexey Zabelkin (JIRA)
Alexey Zabelkin created ZEPPELIN-3655:
-

 Summary: Fix running a paragraph through the REST API with a restricted access interpreter
 Key: ZEPPELIN-3655
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3655
 Project: Zeppelin
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Alexey Zabelkin


How to reproduce the bug:
 * Create an access group and add the user to this group;
 * In the interpreter settings, grant access to this group (do not list the 
username itself in the access settings);
 * Run the note through the REST API (runAll);
 * We get an error that the user does not have access to the interpreter.

But we don't get this error if the note is run in the browser.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-3654) New Hazelcast Jet interpreter

2018-07-24 Thread Vincenzo Selvaggio (JIRA)
Vincenzo Selvaggio created ZEPPELIN-3654:


 Summary: New Hazelcast Jet interpreter
 Key: ZEPPELIN-3654
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3654
 Project: Zeppelin
  Issue Type: New Feature
  Components: Interpreters
Reporter: Vincenzo Selvaggio


Zeppelin has interpreters for different data processing systems like Flink, 
Spark, Kylin, Ignite, Geode, Beam, etc.

Hazelcast Jet is a general-purpose distributed data processing engine for 
stream and batch processing, built on top of Hazelcast. Its performance is 
comparable to, if not better than, that of the engines Zeppelin already 
supports, which makes it a perfect candidate for a Zeppelin interpreter.


Part of the interpreter is to have a set of utility methods that print out 
Hazelcast data structures ({{IMap}} and {{ICache}}) and leverage Zeppelin's 
built in visualization (%table).

What's more, a nice addition is to have the Hazelcast Jet DAG of the pipeline 
displayed as a network graph using %network display system.
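
A utility of the kind described above might look roughly like the following. This is a sketch, not code from the proposed interpreter: TableDisplaySketch and toTable are hypothetical names, and a plain java.util.Map is used in place of a Hazelcast IMap so that the example has no external dependencies. It relies on Zeppelin's %table display format, where the output starts with %table and rows are newline-separated with tab-separated cells.

```java
import java.util.Map;

class TableDisplaySketch {

    // Renders a map as Zeppelin %table text: a header row followed by
    // one tab-separated row per entry.
    static String toTable(Map<?, ?> map) {
        StringBuilder out = new StringBuilder("%table key\tvalue\n");
        for (Map.Entry<?, ?> entry : map.entrySet()) {
            out.append(clean(entry.getKey()))
               .append('\t')
               .append(clean(entry.getValue()))
               .append('\n');
        }
        return out.toString();
    }

    // Tabs and newlines delimit cells and rows, so they are replaced
    // with spaces inside cell text.
    private static String clean(Object cell) {
        return String.valueOf(cell).replace('\t', ' ').replace('\n', ' ');
    }
}
```

Printing the returned string from a paragraph would let Zeppelin's built-in table visualization render it.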



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zeppelin pull request #3092: Java interpreter

2018-07-24 Thread selvinsource
GitHub user selvinsource opened a pull request:

https://github.com/apache/zeppelin/pull/3092

Java interpreter

### What is this PR for?
**New Java interpreter**
There are several Java libraries that could be used to leverage the Data 
Visualization & Collaboration features of Zeppelin, hence the need for a Java 
interpreter to run any Java code or library with no further dependencies.
Dependencies on any Java library can be added by end users in the 
Dependencies section of the Java interpreter settings in Zeppelin.

### What type of PR is it?
* Feature

### Todos
* Any feedback from reviewers

### What is the Jira issue?
* [ZEPPELIN-3653]

### How should this be tested?
Manually
* Start the Zeppelin server
* Create a new note with the java interpreter binding
* Write some java code as per documentation (docs/interpreter/java.md)
Unit tests
* Run unit tests (JavaInterpreterTest.java and 
JavaInterpreterUtilsTest.java)

### Screenshots (if appropriate)

### Questions:
* Does the licenses files need update?
No, the dependency on com.thoughtworks.qdox was already added as part of 
the Beam Interpreter.
* Is there breaking changes for older versions?
No.
* Does this needs documentation?
Yes, it has been added to the PR, see docs/interpreter/java.md.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/selvinsource/zeppelin java-interpreter

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zeppelin/pull/3092.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3092


commit 1cd20d5023513073d6e624aec203e78a22ddb805
Author: Vincenzo Selvaggio 
Date:   2018-07-21T14:24:33Z

Java Interpreter first version

commit 157116ec5f772e1aa8ed4e794ddf680fe799ae5a
Author: Vincenzo Selvaggio 
Date:   2018-07-23T15:10:35Z

Added Java Interpreter Utils and tests for them.

commit a2be5e2f6a8b4742b0a23b1773cdac40e998f60a
Author: Vincenzo Selvaggio 
Date:   2018-07-24T08:44:46Z

Updated documentation for Java interpreter by adding relevant examples.

commit a05921b2711c598d5f89565eeb9bc586bfb5f1fd
Author: Vincenzo Selvaggio 
Date:   2018-07-21T14:24:33Z

Java Interpreter first version

commit 0b0a3e349f98f36ee93956a36a18ed99166983ea
Author: Vincenzo Selvaggio 
Date:   2018-07-23T15:10:35Z

Added Java Interpreter Utils and tests for them.

commit 2855e8349778df93c39dbf1bc3349577dcf24047
Author: Vincenzo Selvaggio 
Date:   2018-07-24T08:44:46Z

Updated documentation for Java interpreter by adding relevant examples.

commit d94dd1086d2bf3a0b6b8fa1427169fca0b4e7d53
Author: Vincenzo Selvaggio 
Date:   2018-07-24T08:55:29Z

Merge remote-tracking branch 'origin/java-interpreter' into java-interpreter




---


[jira] [Created] (ZEPPELIN-3653) New Java interpreter

2018-07-24 Thread Vincenzo Selvaggio (JIRA)
Vincenzo Selvaggio created ZEPPELIN-3653:


     Summary: New Java interpreter
         Key: ZEPPELIN-3653
         URL: https://issues.apache.org/jira/browse/ZEPPELIN-3653
     Project: Zeppelin
  Issue Type: New Feature
  Components: Interpreters
    Reporter: Vincenzo Selvaggio


There are several Java libraries that could leverage the data 
visualization and collaboration features of Zeppelin, hence the need for a 
Java interpreter that runs any Java code or library with no further 
dependencies.

Dependencies on any Java library can be added by end users in the 
Dependencies section of the Java interpreter settings in Zeppelin.

 





Re: Zeppelin distributed architecture design

2018-07-24 Thread Jongyoul Lee
Thank you.

I fully agree with you that we need a framework to support a distributed
version. IMHO, we cannot afford to develop our own. I'll dig into atomix as
well.



On Tue, Jul 24, 2018 at 1:57 PM, liuxun  wrote:

> @Jongyoul Lee:
> Thank you for your attention.
>
> Indeed, as you said, the `Copycat` project has been closed and has been
> migrated to `https://github.com/atomix/atomix`.
>
> I also considered this issue during development.
> The main reason was that `Copycat` was enough to implement Raft at the
> time, and I did not think about it for too long.
>
> Today, I took a look at the atomix documentation,
> https://atomix.io/docs/latest/user-manual/, which covers a lot of
> features, such as broadcasting messages in the cluster and detecting
> cluster events.
> From the perspective of Zeppelin's long-term development, it is better to
> use atomix.
> So I will switch the Raft library to atomix; the change is not difficult
> to make.
>
> Struggle for zeppelin!!! :-)
>
>
> On Jul 24, 2018, at 9:35 AM, Jongyoul Lee wrote:
>
> First of all, thank you for your effort and contribution.
>
> I read it carefully today, and personally, it's a very nice feature and
> idea.
>
> Let's discuss it and improve more concretely. I also left comments on the
> doc.
>
> And I have a simple question.
>
> `Copycat`, which you used to implement it, has been deprecated by its
> owner [1] and moved under https://github.com/atomix/atomix/. That worries
> me. Do you have a particular reason to use this library? It is even a
> SNAPSHOT version.
>
> Regards,
> JL
>
> [1]: https://github.com/atomix/copycat
>
> On Sat, Jul 21, 2018 at 2:07 AM, liuxun  wrote:
>
> Hi,
>
> To show more intuitively how distributed Zeppelin clusters are used in
> practice, I updated this design document. Starting on page 16, I added 2
> GIF animations that show screen recordings of the Zeppelin cluster we are
> using now.
> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit#
>
> Distributed clustered Zeppelin is already in use at our company, and the
> screen recordings are all real.
> The first GIF shows how to create a cluster of three Zeppelin servers:
> 1. Add 234, 235, 236 to the zeppelin.cluster.addr attribute in
>    zeppelin-site.xml to create a cluster.
> 2. Start these 3 servers at the same time.
> 3. Open the web pages of these 3 servers and prepare for the notebook
>    operation.
>
>
> The second GIF shows how an interpreter process is created in the cluster:
> 1. Create a notebook on host234 and execute it. This creates an
>    interpreter process on the server with free resources in the cluster.
> 2. Continue editing this notebook on host235 and execute it. Results
>    return immediately, without waiting for an interpreter process to be
>    created.
> 3. Continue editing this notebook on host236 and execute it. Again,
>    results return immediately, without waiting for an interpreter process
>    to be created.
> The same notebook reuses the interpreter process that was created first,
> so you get the execution result immediately on any server.
> Looking at the background server processes, you will find that host234,
> host235, and host236 use the same interpreter process for the same
> notebook.
>
> Originally, I also wanted to record a case in which the interpreter
> process fails and the cluster re-creates it on an idle server, but I am
> too tired now.
> I will record it later when there is time.
>
>
> On Jul 19, 2018, at 7:36 AM, Ruslan Dautkhanov wrote:
>
> Thank you liuxun,
>
> I left a couple of comments in that google document.
>
> --
> Ruslan Dautkhanov
>
>
> On Tue, Jul 17, 2018 at 11:30 PM liuxun <neliu...@163.com> wrote:
>
> Hi Ruslan Dautkhanov,
>
> Thank you very much for your question. Following your advice, I added 3
> schematics to illustrate.
>
> 1. Distributed Zeppelin deployment architecture diagram.
> 2. Distributed Zeppelin server fault tolerance diagram.
> 3. Distributed Zeppelin server & intp process fault tolerance diagram.
>
>
> The email attachment exceeded the size limit, so I reorganized the
> document and updated it with Google Docs.
>
> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit?usp=sharing
>
>
>
> On Jul 18, 2018, at 1:03 PM, liuxun <neliu...@163.com> wrote:
>
>
> Hi Ruslan Dautkhanov,
>
> Thank you very much for your question. Following your advice, I added 3
> schematics to illustrate.
>
> 1. Zeppelin Cluster architecture diagram.
> 2. Distributed zeppelin Server fault tolerance diagram.
> 3. Distributed zeppelin Server & intp process fault tolerance diagram.
>
> Later, I will merge the schematic 

[GitHub] zeppelin issue #3090: [Zeppelin-3645] Add LSP Protocol completion support

2018-07-24 Thread oxygen311
Github user oxygen311 commented on the issue:

https://github.com/apache/zeppelin/pull/3090
  
@felixcheung 
I have added the property `zeppelin.python.useLsp`, which is disabled by 
default. It does not seem insecure to me now. We can also specify the host 
and port ourselves with the `zeppelin.python.lspHost` and 
`zeppelin.python.lspPort` properties.
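
A minimal sketch of how an interpreter might read these three properties, assuming they arrive as a java.util.Properties object. Only the property names and the disabled-by-default behaviour come from the comment above; the class name and the fallback host and port are placeholders chosen for illustration, not the defaults used in the PR.

```java
import java.util.Properties;

class LspSettingsSketch {
    final boolean useLsp;
    final String host;
    final int port;

    LspSettingsSketch(Properties props) {
        // LSP completion stays off unless the property is explicitly "true".
        this.useLsp = Boolean.parseBoolean(
                props.getProperty("zeppelin.python.useLsp", "false"));
        // Fallback host and port are placeholders for this sketch only.
        this.host = props.getProperty("zeppelin.python.lspHost", "127.0.0.1");
        this.port = Integer.parseInt(
                props.getProperty("zeppelin.python.lspPort", "2087"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("zeppelin.python.useLsp", "true");
        props.setProperty("zeppelin.python.lspPort", "9000");
        LspSettingsSketch settings = new LspSettingsSketch(props);
        System.out.println(settings.useLsp + " " + settings.host + ":" + settings.port);
    }
}
```

Keeping the feature behind an explicit opt-in means an interpreter with no LSP properties set never opens a connection to a language server.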


---


[GitHub] zeppelin issue #3090: [Zeppelin-3645] Add LSP Protocol completion support

2018-07-24 Thread oxygen311
Github user oxygen311 commented on the issue:

https://github.com/apache/zeppelin/pull/3090
  
@zjffdu 
Sorry, the correct line is `pip install python-language-server`; I will fix it.


---