Hi Moon,

I can suggest another approach to reproduce this.

1. Create a Spark interpreter with very little executor memory (say, 128 MB).

2. Use this interpreter to do something memory-intensive. For example, load
a 20 GB data set and then run a select count(*). This eventually kills the
executor process, and I generally get a "RemoteInterpreter not
found/Connection refused" error.

3. Now rerun the same paragraph that executes select count(*). You will get
the "Scheduler already terminated" error.
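
To illustrate what step 3 appears to hit, here is a minimal, self-contained
Java sketch. It assumes the scheduler simply refuses new work once it has
been terminated, which is what the stack trace quoted further down suggests.
SchedulerSketch, SketchScheduler and SketchInterpreter are hypothetical
names and not Zeppelin classes. The retry in runWithRestart only illustrates
the restart-on-exception idea raised earlier in this thread; it is not an
existing Zeppelin feature.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SchedulerSketch {

    /** Hypothetical stand-in for a scheduler bound to one interpreter process. */
    static class SketchScheduler {
        private final AtomicBoolean terminated = new AtomicBoolean(false);

        void terminate() {
            terminated.set(true);
        }

        // Assumption: a terminated scheduler rejects every new job.
        String submit(String paragraph) {
            if (terminated.get()) {
                throw new RuntimeException("Scheduler already terminated");
            }
            return "ran: " + paragraph;
        }
    }

    /** Hypothetical holder that can swap a dead scheduler for a fresh one. */
    static class SketchInterpreter {
        private SketchScheduler scheduler = new SketchScheduler();

        // Simulates the interpreter process dying, e.g. after running out of memory.
        void simulateProcessDeath() {
            scheduler.terminate();
        }

        // Plain run: fails for good once the scheduler is terminated.
        String run(String paragraph) {
            return scheduler.submit(paragraph);
        }

        // Recovery idea: on "already terminated", create a new scheduler and retry once.
        String runWithRestart(String paragraph) {
            try {
                return scheduler.submit(paragraph);
            } catch (RuntimeException e) {
                if (!"Scheduler already terminated".equals(e.getMessage())) {
                    throw e;
                }
                scheduler = new SketchScheduler();
                return scheduler.submit(paragraph);
            }
        }
    }

    public static void main(String[] args) {
        SketchInterpreter interpreter = new SketchInterpreter();
        System.out.println(interpreter.run("select count(*)"));

        interpreter.simulateProcessDeath();
        try {
            interpreter.run("select count(*)");
        } catch (RuntimeException e) {
            System.out.println("error: " + e.getMessage());
        }

        System.out.println(interpreter.runWithRestart("select count(*)"));
    }
}
```

In this sketch the second plain run fails every time until something
replaces the scheduler, which matches the behaviour that only a Zeppelin
server restart clears the error.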

Regards,
Sourav




On Thu, Sep 17, 2015 at 5:25 AM, linxi zeng <linxizeng0...@gmail.com> wrote:

> Actually, there is a way to reproduce the problem (though maybe not a very
> suitable example):
> (1) Modify dereference() in RemoteInterpreterProcess.java like this:
>
> diff --git a/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterProcess.java b/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterProcess.java
> index 534af27..e02b16a 100644
> --- a/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterProcess.java
> +++ b/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterProcess.java
> @@ -146,7 +146,8 @@ public class RemoteInterpreterProcess implements ExecuteResultHandler {
>    public int dereference() {
>      synchronized (referenceCount) {
>        int r = referenceCount.decrementAndGet();
> -      if (r == 0) {
> +      //if (r == 0) {
> +      if (false) {
>          logger.info("shutdown interpreter process");
>          remoteInterpreterEventPoller.shutdown();
>
>
> (2) Restart this interpreter in the interpreter settings:
>
> [image: inline image 1]
>
> (3) Run a Spark paragraph:
>
> [image: inline image 2]
>
>
>
> 2015-09-09 23:13 GMT+08:00 moon soo Lee <m...@apache.org>:
>
>> If there is some way to reproduce the problem, it will help a lot.
>> Let me investigate this problem further.
>>
>> I'm working on improving interpreter process restart:
>>
>> https://github.com/Leemoonsoo/incubator-zeppelin/commit/3200b9aac26d394a67d496c3b209eb3cda046c4a
>> Once I know how to reproduce the "Scheduler already terminated" exception,
>> I'll make a pull request together with this improvement.
>>
>> Thanks,
>> moon
>>
>>
>> On Mon, Sep 7, 2015 at 5:44 AM linxi zeng <linxizeng0...@gmail.com>
>> wrote:
>>
>>> Hi, moon:
>>>
>>> After changing some settings and restarting the interpreter, the scheduler
>>> of the interpreter is terminated, and the RemoteInterpreterServer process
>>> should be stopped too. But if the RemoteInterpreterServer does not shut
>>> down as expected, an exception with the message "Scheduler already
>>> terminated" is thrown when we run paragraphs using this interpreter (such
>>> as Spark). Restarting the Zeppelin server then seems to be the only way to
>>> solve the problem.
>>>
>>> This problem has already happened several times, but I still have no idea
>>> how to reproduce it reliably. I was wondering whether we could restart the
>>> RemoteInterpreterServer when we catch this exception?
>>>
>>> Do you have any idea how to solve this problem?
>>>
>>>
>>> By the way, the detailed error info looks like this:
>>>
>>>  INFO [2015-09-06 10:21:47,487] ({qtp1633200777-7462} NotebookServer.java[onMessage]:112) - RECEIVE << RUN_PARAGRAPH
>>>  INFO [2015-09-06 10:21:47,493] ({qtp1633200777-7462} NotebookServer.java[broadcast]:264) - SEND >> NOTE
>>> ERROR [2015-09-06 10:21:47,495] ({qtp1633200777-7462} NotebookServer.java[runParagraph]:640) - Exception from run
>>> java.lang.RuntimeException: Scheduler already terminated
>>>         at org.apache.zeppelin.scheduler.RemoteScheduler.submit(RemoteScheduler.java:124)
>>>         at org.apache.zeppelin.notebook.Note.run(Note.java:282)
>>>         at org.apache.zeppelin.socket.NotebookServer.runParagraph(NotebookServer.java:638)
>>>         at org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:137)
>>>         at org.apache.zeppelin.socket.NotebookSocket.onMessage(NotebookSocket.java:56)
>>>         at org.eclipse.jetty.websocket.WebSocketConnectionRFC6455$WSFrameHandler.onFrame(WebSocketConnectionRFC6455.java:835)
>>>         at org.eclipse.jetty.websocket.WebSocketParserRFC6455.parseNext(WebSocketParserRFC6455.java:349)
>>>         at org.eclipse.jetty.websocket.WebSocketConnectionRFC6455.handle(WebSocketConnectionRFC6455.java:225)
>>>         at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
>>>         at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
>>>         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>>>         at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>>>         at java.lang.Thread.run(Thread.java:745)
>>>
>>>
>
