Hi Moon, I can suggest another approach to reproduce this.

1. Create a Spark interpreter with very little executor memory (say 128 MB).
2. Use this interpreter for something memory intensive. For example, load a
   20 GB data set and then run a select count(*). This eventually kills the
   executor process, and I generally get a "RemoteInterpreter not found" or
   "Connection refused" error.
3. Rerun the same paragraph that executes select count(*). You will get the
   "Scheduler already terminated" error.

Regards,
Sourav

On Thu, Sep 17, 2015 at 5:25 AM, linxi zeng <linxizeng0...@gmail.com> wrote:

> Actually, there is a way to reproduce the problem (maybe not a very
> suitable example):
>
> (1) Modify dereference() in RemoteInterpreterProcess.java like this:
>
> diff --git a/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterProcess.java b/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterProcess.java
> index 534af27..e02b16a 100644
> --- a/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterProcess.java
> +++ b/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterProcess.java
> @@ -146,7 +146,8 @@ public class RemoteInterpreterProcess implements ExecuteResultHandler {
>    public int dereference() {
>      synchronized (referenceCount) {
>        int r = referenceCount.decrementAndGet();
> -      if (r == 0) {
> +      //if (r == 0) {
> +      if (false) {
>          logger.info("shutdown interpreter process");
>          remoteInterpreterEventPoller.shutdown();
>
> (2) Restart this interpreter in the interpreter settings:
>
> [image: inline image 1]
>
> (3) Run a Spark paragraph:
>
> [image: inline image 2]
>
> 2015-09-09 23:13 GMT+08:00 moon soo Lee <m...@apache.org>:
>
>> If there is some way to reproduce the problem, it will help a lot.
>> Let me investigate more on this problem.
>>
>> I'm working on improving interpreter process restart.
>>
>> https://github.com/Leemoonsoo/incubator-zeppelin/commit/3200b9aac26d394a67d496c3b209eb3cda046c4a
>>
>> Once I know how to reproduce the "Scheduler already terminated" exception,
>> I'll make a pull request together with this improvement.
>>
>> Thanks,
>> moon
>>
>> On Mon, Sep 7, 2015 at 5:44 AM linxi zeng <linxizeng0...@gmail.com> wrote:
>>
>>> Hi moon,
>>>
>>> After changing some settings and restarting the interpreter, the
>>> scheduler of the interpreter is terminated and the
>>> RemoteInterpreterServer process should be stopped too. But if the
>>> RemoteInterpreterServer does not shut down as expected, an exception
>>> with the message "Scheduler already terminated" is thrown when we run
>>> paragraphs using this interpreter (such as spark). Restarting the
>>> Zeppelin server then seems to be the only way to solve the problem.
>>>
>>> This problem has already happened several times, but I still have no
>>> idea how to reproduce it reliably. I was wondering whether we could
>>> restart the RemoteInterpreterServer when we catch this exception.
>>>
>>> Do you have any idea how to solve this problem?
>>>
>>> By the way, the detailed error info looks like this:
>>>
>>> INFO [2015-09-06 10:21:47,487] ({qtp1633200777-7462} NotebookServer.java[onMessage]:112) - RECEIVE << RUN_PARAGRAPH
>>> INFO [2015-09-06 10:21:47,493] ({qtp1633200777-7462} NotebookServer.java[broadcast]:264) - SEND >> NOTE
>>> ERROR [2015-09-06 10:21:47,495] ({qtp1633200777-7462} NotebookServer.java[runParagraph]:640) - Exception from run
>>> java.lang.RuntimeException: Scheduler already terminated
>>>     at org.apache.zeppelin.scheduler.RemoteScheduler.submit(RemoteScheduler.java:124)
>>>     at org.apache.zeppelin.notebook.Note.run(Note.java:282)
>>>     at org.apache.zeppelin.socket.NotebookServer.runParagraph(NotebookServer.java:638)
>>>     at org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:137)
>>>     at org.apache.zeppelin.socket.NotebookSocket.onMessage(NotebookSocket.java:56)
>>>     at org.eclipse.jetty.websocket.WebSocketConnectionRFC6455$WSFrameHandler.onFrame(WebSocketConnectionRFC6455.java:835)
>>>     at org.eclipse.jetty.websocket.WebSocketParserRFC6455.parseNext(WebSocketParserRFC6455.java:349)
>>>     at org.eclipse.jetty.websocket.WebSocketConnectionRFC6455.handle(WebSocketConnectionRFC6455.java:225)
>>>     at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
>>>     at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
>>>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>>>     at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>>>     at java.lang.Thread.run(Thread.java:745)
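
The recovery idea raised in the original message (restart when the
"Scheduler already terminated" exception is caught, instead of restarting
the whole server) could look roughly like the sketch below. This is only a
toy illustration and not Zeppelin code: ToyScheduler and its methods are
hypothetical names, and a plain java.util.concurrent executor stands in for
the interpreter's scheduler. A shut-down executor rejects new jobs, much
like the terminated scheduler in the trace; the wrapper catches that
rejection, replaces the executor, and resubmits once.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;

public class RestartOnTerminated {

    // Hypothetical stand-in for a scheduler tied to one interpreter process.
    static class ToyScheduler {
        private ExecutorService executor = Executors.newSingleThreadExecutor();
        private int restarts = 0;

        // Submit a job; if the executor was already shut down, replace it
        // with a fresh one and resubmit once.
        synchronized <T> Future<T> submit(Callable<T> job) {
            try {
                return executor.submit(job);
            } catch (RejectedExecutionException e) {
                // Analogous to catching "Scheduler already terminated".
                executor = Executors.newSingleThreadExecutor();
                restarts++;
                return executor.submit(job);
            }
        }

        // Simulates the scheduler being terminated by an interpreter restart.
        synchronized void terminate() {
            executor.shutdownNow();
        }

        synchronized int restarts() {
            return restarts;
        }
    }

    public static void main(String[] args) throws Exception {
        ToyScheduler scheduler = new ToyScheduler();
        System.out.println(scheduler.submit(() -> "first run").get());

        scheduler.terminate(); // the scheduler is now terminated

        // Without the catch block in submit(), this call would fail.
        System.out.println(scheduler.submit(() -> "second run").get());
        System.out.println("restarts=" + scheduler.restarts());

        scheduler.terminate(); // let the JVM exit
    }
}
```

In Zeppelin itself the equivalent step would have to bring the remote
interpreter process back up as well, which this sketch does not attempt to
model.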