I can finally use Zeppelin to run Spark commands. I copied the spark-assembly jar into the interpreter/spark directory.

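The copy step described above might look like the sketch below. The function name copy_spark_assembly is hypothetical, and the thread does not say exactly which jar was copied or from where; the paths in the usage note are the SPARK_YARN_JAR and ZEPPELIN_INTERPRETER_DIR values quoted later in this thread, used here as an assumption.

```shell
# Sketch only: copy a Spark assembly jar into Zeppelin's Spark interpreter directory.
# copy_spark_assembly is a hypothetical helper, not part of Zeppelin.
copy_spark_assembly() {
  jar=$1               # path to the spark-assembly jar
  interpreter_dir=$2   # Zeppelin interpreter directory (ZEPPELIN_INTERPRETER_DIR)

  # Refuse to continue if either side of the copy is missing.
  [ -f "$jar" ] || { echo "assembly jar not found: $jar" >&2; return 1; }
  [ -d "$interpreter_dir/spark" ] || { echo "no spark directory under: $interpreter_dir" >&2; return 1; }

  cp "$jar" "$interpreter_dir/spark/"
}
```

With the configuration quoted in this thread, the call would presumably be copy_spark_assembly "$SPARK_YARN_JAR" "$ZEPPELIN_INTERPRETER_DIR".
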
From: Sambit Tripathy (RBEI/EDS1)
Sent: Monday, May 04, 2015 9:51 PM
To: [email protected]
Subject: RE: Scheduler already terminated error

This is the error

ERROR [2015-05-04 21:19:17,292] ({pool-1-thread-4} ProcessFunction.java[process]:41) - Internal error processing getProgress
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.deploy.SparkHadoopUtil$
    at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:1873)
    at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:105)
    at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:180)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:308)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:159)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:240)
    at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:272)
    at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:145)
    at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:394)
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:73)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:109)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.getProgress(RemoteInterpreterServer.java:299)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:938)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:923)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)

From: moon soo Lee
Sent: Monday, May 04, 2015 9:19 PM
To: [email protected]
Subject: Re: Scheduler already terminated error

Do you have any error message in your ZEPPELIN_HOME/logs/zeppelin-interpreter-spark*.log file?

On Mon, May 4, 2015 at 9:58 PM Sambit Tripathy (RBEI/EDS1) <[email protected]> wrote:

Thanks Moon. I do not get this error anymore now. However, when I run some command using %spark interpreter, I do not get any response back. When I checked the log files, I saw the following exception happening. Does this mean the interpreter is not working correctly?

%spark
val count = ctx.sql("select v1.value from versionOne v1").count()

INFO [2015-05-04 13:50:19,511] ({Thread-63} NotebookServer.java[broadcast]:251) - SEND >> NOTE
ERROR [2015-05-04 13:50:21,775] ({pool-1-thread-7} Job.java[run]:183) - Job failed
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.TApplicationException: Internal error processing interpret
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:221)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:212)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:296)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.thrift.TApplicationException: Internal error processing interpret
    at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:190)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:175)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:204)
    ... 12 more
INFO [2015-05-04 13:50:21,776] ({Thread-63} NotebookServer.java[afterStatusChange]:571) - Job 20150504-132152_2065400849 is finished
INFO [2015-05-04 13:50:21,784] ({Thread-63} NotebookServer.java[broadcast]:251) - SEND >> NOTE
INFO [2015-05-04 13:50:21,785] ({pool-1-thread-7} SchedulerFactory.java[jobFinished]:138) - Job paragraph_1430770912781_-924929327 finished by scheduler remoteinterpreter_1530616708
ERROR [2015-05-04 13:50:27,004] ({Thread-64} JobProgressPoller.java[run]:57) - Can not get or update progress
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.TApplicationException: Internal error processing getProgress
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(RemoteInterpreter.java:286)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:110)
    at org.apache.zeppelin.notebook.Paragraph.progress(Paragraph.java:179)
    at org.apache.zeppelin.scheduler.JobProgressPoller.run(JobProgressPoller.java:54)
Caused by: org.apache.thrift.TApplicationException: Internal error processing getProgress
    at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_getProgress(RemoteInterpreterService.java:235)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.getProgress(RemoteInterpreterService.java:221)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(RemoteInterpreter.java:284)
    ... 3 more

From: moon soo Lee
Sent: Friday, May 01, 2015 3:32 PM
To: [email protected]
Subject: Re: Scheduler already terminated error

I think you need spark-1.2 profile and hadoop-2.4 profile. Please try

    mvn install -DskipTests -Pspark-1.2 -Dspark.version=1.2.1 -Phadoop-2.4 -Dhadoop.version=2.5.0

Thanks,
moon

On Fri, May 1, 2015 at 10:22 AM Sambit Tripathy (RBEI/EDS1) <[email protected]> wrote:

Moon,

This is what I have in my configuration

    export ZEPPELIN_INTERPRETERS=org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.spark.SparkSqlInterpreter,org.apache.zeppelin.spark.DepInterpreter,org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.shell.ShellInterpreter,org.apache.zeppelin.hive.HiveInterpreter
    export ZEPPELIN_INTERPRETER_DIR=/home/sambit/incubator-zeppelin/interpreter
    export ZEPPELIN_PORT=8901
    export HADOOP_CONF_DIR=/usr/lib/hadoop/etc/hadoop
    export SPARK_YARN_JAR=/usr/lib/spark/lib/spark-assembly-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar
    export ZEPPELIN_NOTEBOOK_DIR=/home/sambit/zep-notebook-dir # Where notebook saved

Used this command

    mvn install -DskipTests -Dspark.version=1.2.1 -Dhadoop.version=2.5.0

to build Zeppelin as provided in the website

That’s all. Should the -Dhadoop.version change to 2.5.0-cdh5.3.0?

Regards,
Sambit.

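The jar named in SPARK_YARN_JAR encodes its own Spark and Hadoop versions, which can be read off and compared by eye with the -Dspark.version and -Dhadoop.version build flags. A minimal sketch, assuming the spark-assembly-<spark>-hadoop<hadoop>.jar naming seen in the configuration above; assembly_versions is a hypothetical helper, and the sketch does not settle which flag values are correct for a CDH cluster.

```shell
# Sketch only: split a spark-assembly jar name into its Spark and Hadoop versions.
# Assumes the name looks like spark-assembly-<spark>-hadoop<hadoop>.jar.
assembly_versions() {
  name=$(basename "$1" .jar)      # drop the directory and the .jar suffix
  rest=${name#spark-assembly-}    # drop the fixed prefix
  echo "spark=${rest%%-hadoop*}"  # everything before "-hadoop"
  echo "hadoop=${rest#*-hadoop}"  # everything after "-hadoop"
}
```

For the jar in this thread, assembly_versions "$SPARK_YARN_JAR" prints spark=1.2.0-cdh5.3.0 and hadoop=2.5.0-cdh5.3.0, whereas the build commands quoted here use 1.2.1 and 2.5.0.
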
From: moon soo Lee
Sent: Thursday, April 30, 2015 5:25 PM
To: [email protected]
Subject: Re: Scheduler already terminated error

Hi,

That error message can be shown when Zeppelin fails to create the SparkContext. Could you check the Zeppelin configuration for your YARN cluster? How did you set up Zeppelin for your YARN cluster? For example, the Zeppelin build command for your Spark/Hadoop version, the Zeppelin interpreter settings, and the Hadoop/YARN configuration files.

Thanks,
moon

On Fri, May 1, 2015 at 8:02 AM Sambit Tripathy (RBEI/EDS1) <[email protected]> wrote:

Hi,

After installation, I tried to run this simple Spark command and got this error. Any idea what it could be?

Command:

%spark
val ctx = new org.apache.spark.sql.SQLContext(sc)

Error:

Scheduler already terminated
    org.apache.zeppelin.scheduler.RemoteScheduler.submit(RemoteScheduler.java:122)
    org.apache.zeppelin.notebook.Note.run(Note.java:271)
    org.apache.zeppelin.socket.NotebookServer.runParagraph(NotebookServer.java:531)
    org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:119)
    org.java_websocket.server.WebSocketServer.onWebsocketMessage(WebSocketServer.java:469)
    org.java_websocket.WebSocketImpl.decodeFrames(WebSocketImpl.java:368)
    org.java_websocket.WebSocketImpl.decode(WebSocketImpl.java:157)
    org.java_websocket.server.WebSocketServer$WebSocketWorker.run(WebSocketServer.java:657)
ERROR

What is the best way to verify that the Spark interpreter is working correctly? Is this a YARN error?

PS: I am using YARN.

Regards,
Sambit.
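
One way to follow the suggestion made in this thread, checking ZEPPELIN_HOME/logs/zeppelin-interpreter-spark*.log for errors, is sketched below. The function name spark_interpreter_errors is hypothetical, and the sketch assumes each log entry starts with its level (INFO, ERROR) as in the excerpts quoted above.

```shell
# Sketch only: print ERROR entries from the Spark interpreter logs,
# plus the line after each one (usually the exception class and message).
# spark_interpreter_errors is a hypothetical helper, not part of Zeppelin.
spark_interpreter_errors() {
  zeppelin_home=$1   # Zeppelin installation directory (ZEPPELIN_HOME)
  # The glob is left unquoted on purpose so the shell expands it.
  grep -h -A 1 'ERROR' "$zeppelin_home"/logs/zeppelin-interpreter-spark*.log 2>/dev/null
}
```

Calling spark_interpreter_errors "$ZEPPELIN_HOME" prints nothing and returns a non-zero status when no ERROR lines (or no matching log files) are found.
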
