Hi Sarthak Sharma,

On the Zeppelin server, you can run ./bin/spark-submit --class org.apache.spark.examples.SparkPi to check whether there is a problem with the Spark runtime environment on the Zeppelin server.
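Xun's suggested check can be scripted. This is a minimal sketch, not something from the thread. It assumes SPARK_HOME points at the same Spark 2.x distribution Zeppelin uses, with the examples jar under examples/jars/. It submits SparkPi in YARN cluster mode and defaults to the queue root.zep only because that queue appears in the log further down. The helper names build_sparkpi_cmd, find_examples_jar and run_smoke_test are hypothetical.

```python
import glob
import os
import subprocess


def build_sparkpi_cmd(spark_home, examples_jar, queue, slices=10):
    """Return a spark-submit argument list that runs SparkPi on YARN in cluster mode."""
    return [
        os.path.join(spark_home, "bin", "spark-submit"),
        "--class", "org.apache.spark.examples.SparkPi",
        "--master", "yarn",
        "--deploy-mode", "cluster",  # same deploy mode as the Zeppelin 0.8 interpreter driver in this thread
        "--queue", queue,            # the queue that spark.yarn.queue points the interpreter at
        examples_jar,
        str(slices),                 # SparkPi's optional argument: number of partitions
    ]


def find_examples_jar(spark_home):
    """Locate the bundled examples jar; assumes the Spark 2.x layout examples/jars/."""
    pattern = os.path.join(spark_home, "examples", "jars", "spark-examples_*.jar")
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError("no spark-examples jar matching " + pattern)
    return matches[0]


def run_smoke_test(queue="root.zep"):
    """Run SparkPi with the Spark install in SPARK_HOME; returns spark-submit's exit code."""
    spark_home = os.environ["SPARK_HOME"]
    cmd = build_sparkpi_cmd(spark_home, find_examples_jar(spark_home), queue)
    # In cluster mode spark-submit keeps polling the application report (by default
    # until the app finishes), so an app stuck in ACCEPTED here reproduces the
    # problem without Zeppelin being involved.
    return subprocess.call(cmd)
```

Calling run_smoke_test() on the Zeppelin server should either finish normally or hang in ACCEPTED the same way. A hang would suggest the problem lies on the Spark/YARN side rather than in Zeppelin.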
> On Nov 20, 2018, at 5:39 PM, Sarthak Sharma <sarthak...@media.net> wrote:
>
> Is it similar to an existing bug related to interpreter processes getting
> stuck? (There, the workaround is to kill the application on YARN, restart
> the interpreter from the interface, and then resubmit the query.)
> The problem in this case is that it happens intermittently, on random Spark
> interpreters. And since the driver app is not scheduled on YARN,
> there are no logs available to figure out the reason for this issue.
>
> Thanks and Regards
>
> Sarthak Sharma
> DevOps Engineer, Media.Net
> +918002228376 | sarthak...@media.net
> <http://en-gb.facebook.com/people/Sarthak-Sharma/100006006014244>
> <http://in.linkedin.com/in/sarthaksharma96>
>
>
> On Tue, Nov 20, 2018 at 2:22 PM Jeff Zhang <zjf...@gmail.com> wrote:
> If zeppelin.interpreter.connect.timeout is reached but the YARN app is still
> in ACCEPTED state, then this should be a bug. The YARN app should be killed
> if it cannot be created within the timeout threshold.
>
> Sarthak Sharma <sarthak...@media.net> wrote on Tue, Nov 20, 2018 at 4:47 PM:
> Hey,
>
> As you mentioned, I'm already using the spark.yarn.queue parameter, so I
> know which YARN queue the app is scheduled in, and this queue has resources
> available, since other apps are also getting scheduled there.
> However, even assuming the queue does NOT have the resources to schedule it
> within the given time frame, so that an exception is thrown once
> zeppelin.interpreter.connect.timeout is reached, the application should
> still get scheduled eventually, which is not the case here. The interpreter
> driver process remains stuck in ACCEPTED state. Has the implementation
> changed in this version?
> We never experienced this with the previous version (zeppelin-0.7.3), where
> drivers would eventually get scheduled in their respective queues.
>
> On Tue, Nov 20, 2018, 7:29 AM Xun Liu <neliu...@163.com> wrote:
> Hi Sarthak Sharma,
>
> The log shows that the application submitted by spark-submit has been waiting
> for execution in the YARN queue. Is the YARN queue out of resources?
> You can specify a queue that has resources in the Spark interpreter via the
> spark.yarn.queue parameter.
>
>
>> On Nov 19, 2018, at 7:41 PM, Sarthak Sharma <sarthak...@media.net> wrote:
>>
>> Hi,
>>
>> We already have a zeppelin-0.7.3 setup which runs fine and is currently in
>> use, but we are looking into the YARN cluster mode support for the Spark
>> interpreter in zeppelin-0.8. I've built it from source from branch-0.8 (as
>> of Nov 15) and am intermittently facing the following issue in some of the
>> Spark interpreters while trying to use spark-sql on it.
>>
>> 18/11/19 10:04:07 INFO yarn.Client: Submitting application application_1542587655772_35129 to ResourceManager
>> 18/11/19 10:04:07 INFO impl.YarnClientImpl: Submitted application application_1542587655772_35129
>> 18/11/19 10:04:08 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:08 INFO yarn.Client:
>>      client token: N/A
>>      diagnostics: N/A
>>      ApplicationMaster host: N/A
>>      ApplicationMaster RPC port: -1
>>      queue: root.zep
>>      start time: 1542621847537
>>      final status: UNDEFINED
>>      tracking URL: http://resource-manager-addr/proxy/application_1542587655772_35129/ <http://c8-auto-hadoop-service-1.srv.media.net:8088/proxy/application_1542587655772_35129/>
>>      user: sarthak.sh
>> 18/11/19 10:04:09 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:10 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:11 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:12 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:13 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:14 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:15 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:16 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:17 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:18 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:19 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:20 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:21 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:22 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:23 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:24 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:25 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:26 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:27 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:28 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:29 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:30 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:31 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:32 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:33 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:34 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:35 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:36 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:37 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:38 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:39 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:40 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:41 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:42 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:43 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:44 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:45 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:46 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:47 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:48 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:49 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:50 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:51 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:52 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:53 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:54 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:55 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:56 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:57 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:58 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:04:59 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:00 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:01 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:02 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:03 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:04 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:05 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:06 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:07 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:08 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:09 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:10 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:11 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:12 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:13 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:14 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:15 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:16 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:17 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:18 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:19 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:20 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:21 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:22 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:23 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:24 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:25 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:26 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:27 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:28 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:29 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:30 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:31 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:32 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:33 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:34 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:35 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:36 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:37 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:38 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:39 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:40 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:41 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:42 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:43 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:44 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:45 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:46 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:47 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:48 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:49 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:50 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:51 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:52 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:53 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:54 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:55 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:56 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:57 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:58 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>> 18/11/19 10:05:59 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
>>
>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreterManagedProcess.start(RemoteInterpreterManagedProcess.java:205)
>>     at org.apache.zeppelin.interpreter.ManagedInterpreterGroup.getOrCreateInterpreterProcess(ManagedInterpreterGroup.java:64)
>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getOrCreateInterpreterProcess(RemoteInterpreter.java:111)
>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.internal_create(RemoteInterpreter.java:164)
>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:132)
>>     at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:299)
>>     at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:407)
>>     at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
>>     at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:315)
>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>     at java.lang.Thread.run(Thread.java:748)
>>
>> Any further submission to this interpreter fails with NullPointerExceptions,
>> because there is no interpreter process.
>> It looks like the interpreter driver process, when submitted to YARN, gets
>> stuck in ACCEPTED state, so we're not able to connect to the remote
>> interpreter process. This happens even when YARN has free resources on the
>> cluster.
>> I've also tried increasing zeppelin.interpreter.connect.timeout, but that
>> didn't help, since the application stays in ACCEPTED state indefinitely
>> and there are no logs available either.
>> It would be great if you could point me to something that can help. Also,
>> please let me know if any configuration files are required for debugging
>> this.
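To put a number on "stuck in ACCEPTED", the yarn.Client report lines can be parsed. This is a minimal sketch under two assumptions: the input is the raw log without the >> mail quoting, and timestamps use Spark's default yy/MM/dd HH:mm:ss layout. seconds_in_state is a hypothetical helper, not part of Spark or Zeppelin. For the reports quoted above (10:04:08 through 10:05:59), it yields 111 seconds. That figure can then be compared with zeppelin.interpreter.connect.timeout, which is configured in milliseconds.

```python
import re
from datetime import datetime

# Matches yarn.Client "Application report" lines in a raw (unquoted) log, e.g.
# 18/11/19 10:04:08 INFO yarn.Client: Application report for application_1542587655772_35129 (state: ACCEPTED)
REPORT_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO \S*Client: "
    r"Application report for (?P<app>application_\d+_\d+) \(state: (?P<state>[A-Z_]+)\)"
)


def seconds_in_state(lines, state="ACCEPTED"):
    """Return {app_id: seconds between the first and last report showing `state`}."""
    first, last = {}, {}
    for line in lines:
        m = REPORT_RE.match(line.strip())
        if m is None or m.group("state") != state:
            continue  # skip other log lines and reports in other states
        # Spark's default log4j layout prints timestamps as yy/MM/dd HH:mm:ss.
        ts = datetime.strptime(m.group("ts"), "%y/%m/%d %H:%M:%S")
        app = m.group("app")
        first.setdefault(app, ts)
        last[app] = ts
    return {app: (last[app] - first[app]).total_seconds() for app in first}
```

Any iterable of lines works, so an open log file can be passed in directly.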
>>
>>
>> Thanks and Regards
>>
>>
>> Sarthak Sharma
>
>
>
> --
> Best Regards
>
> Jeff Zhang
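Jeff describes the expected behavior: kill the YARN app if it is still ACCEPTED when zeppelin.interpreter.connect.timeout passes. Sarthak describes a manual workaround: kill the app on YARN, then restart the interpreter. Both can be approximated outside Zeppelin with a small watchdog. This is a hypothetical sketch, not Zeppelin's implementation. It polls the YARN ResourceManager REST API's application-state resource (/ws/v1/cluster/apps/{appid}/state). If the app has not left the pending states within timeout_s, it sends a PUT with {"state": "KILLED"}, the REST counterpart of yarn application -kill. It assumes an unsecured ResourceManager; a Kerberized cluster would additionally need SPNEGO authentication, which is not handled here.

```python
import json
import time
import urllib.request

# YARN application states in which no ApplicationMaster (i.e. no interpreter driver) is running yet.
PENDING_STATES = {"NEW", "NEW_SAVING", "SUBMITTED", "ACCEPTED"}


def app_state_url(rm_base, app_id):
    """URL of the ResourceManager REST "Cluster Application State" resource."""
    return "%s/ws/v1/cluster/apps/%s/state" % (rm_base.rstrip("/"), app_id)


def get_state(rm_base, app_id):
    """GET the current application state, e.g. "ACCEPTED" or "RUNNING"."""
    with urllib.request.urlopen(app_state_url(rm_base, app_id)) as resp:
        return json.load(resp)["state"]


def kill_app(rm_base, app_id):
    """Ask the ResourceManager to move the application to KILLED; returns the HTTP status."""
    req = urllib.request.Request(
        app_state_url(rm_base, app_id),
        data=json.dumps({"state": "KILLED"}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def wait_or_kill(fetch_state, kill, timeout_s, poll_s=1.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll until the app leaves the pending states; kill it if timeout_s elapses first.

    Returns the first non-pending state observed, or "KILLED" if the
    watchdog issued the kill itself.
    """
    deadline = clock() + timeout_s
    state = fetch_state()
    while state in PENDING_STATES:
        if clock() >= deadline:
            kill()
            return "KILLED"
        sleep(poll_s)
        state = fetch_state()
    return state
```

For example: wait_or_kill(lambda: get_state(RM, APP), lambda: kill_app(RM, APP), timeout_s=120). Here RM (e.g. "http://resource-manager-addr:8088") and APP are placeholders. After a kill, the interpreter still has to be restarted from the Zeppelin UI before the query is resubmitted. Passing fetch_state, kill, clock and sleep in as functions keeps the timeout logic testable without a cluster.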