[ https://issues.apache.org/jira/browse/SPARK-27802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875945#comment-16875945 ]
liupengcheng edited comment on SPARK-27802 at 7/2/19 6:23 AM:
--------------------------------------------------------------

[~shahid] yes, but when I checked the master branch I found that this logic was removed in 3.0.0 and replaced with some JavaScript, so I'm not sure whether we can fix it only in versions prior to 3.0.0. I haven't looked into the 3.0.0 code, so I'm not sure whether this issue still exists there; that's why I haven't opened a PR for it.

You can follow these steps to reproduce the issue:
# set spark.ui.retainedDeadExecutors=0 and spark.ui.retainedStages=1000
# set spark.dynamicAllocation.enabled=true
# run a Spark app, wait for it to complete, and let the executors go idle
# check the stage UI

was (Author: liupengcheng):
[~shahid] yes, but when I checked the master branch I found that this logic was removed in 3.0.0, so I'm not sure whether we can fix it only in 2.3; that's why I haven't opened a PR for it.

You can follow these steps to reproduce the issue:
# set spark.ui.retainedDeadExecutors=0 and spark.ui.retainedStages=1000
# set spark.dynamicAllocation.enabled=true
# run a Spark app, wait for it to complete, and let the executors go idle
# check the stage UI

> SparkUI throws NoSuchElementException when inconsistency appears between
> `ExecutorStageSummaryWrapper`s and `ExecutorSummaryWrapper`s
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-27802
>                 URL: https://issues.apache.org/jira/browse/SPARK-27802
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 2.3.2
>            Reporter: liupengcheng
>            Priority: Major
>
> Recently, we hit this issue when testing Spark 2.3. It reported the following
> error message when clicking on the stage UI link.
> We added more logs to print the executorId (10 in this case) to debug, and
> finally found that it's caused by an inconsistency between the list of
> `ExecutorStageSummaryWrapper`s and the list of `ExecutorSummaryWrapper`s in the
> KVStore. When the number of dead executors exceeds the threshold, an executor
> may be removed from the list of `ExecutorSummaryWrapper`s while it is still
> kept in the list of `ExecutorStageSummaryWrapper`s in the store.
> {code:java}
> HTTP ERROR 500
> Problem accessing /stages/stage/. Reason:
> Server Error
> Caused by:
> java.util.NoSuchElementException: 10
> at org.apache.spark.util.kvstore.InMemoryStore.read(InMemoryStore.java:83)
> at org.apache.spark.status.ElementTrackingStore.read(ElementTrackingStore.scala:95)
> at org.apache.spark.status.AppStatusStore.executorSummary(AppStatusStore.scala:70)
> at org.apache.spark.ui.jobs.ExecutorTable$$anonfun$createExecutorTable$2.apply(ExecutorTable.scala:99)
> at org.apache.spark.ui.jobs.ExecutorTable$$anonfun$createExecutorTable$2.apply(ExecutorTable.scala:92)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> at org.apache.spark.ui.jobs.ExecutorTable.createExecutorTable(ExecutorTable.scala:92)
> at org.apache.spark.ui.jobs.ExecutorTable.toNodeSeq(ExecutorTable.scala:75)
> at org.apache.spark.ui.jobs.StagePage.render(StagePage.scala:478)
> at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
> at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
> at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
> at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
> at org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
> at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:166)
> at org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
> at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
> at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
> at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
> at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
> at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
> at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
> at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> at org.spark_project.jetty.server.Server.handle(Server.java:539)
> at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:333)
> at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
> at org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
> at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
> at org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
> at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
> at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
> at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
> at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
> at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
> at java.lang.Thread.run(Thread.java:748)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
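For reference, the reproduction steps in the comment above can be sketched as a spark-submit invocation. This is a sketch, not part of the original report: the application JAR, main class, and stage URL placeholders are hypothetical, and `spark.shuffle.service.enabled=true` is added only because dynamic allocation on Spark 2.x normally requires the external shuffle service.

```shell
# Hypothetical reproduction sketch for SPARK-27802 on Spark 2.3.x.
# Keep no dead executors in the UI store, but retain many stages,
# so stage-level executor rows can outlive their executor summaries.
spark-submit \
  --class com.example.ExampleApp \
  --conf spark.ui.retainedDeadExecutors=0 \
  --conf spark.ui.retainedStages=1000 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  example-app.jar

# After the stages complete and the idle executors have been released,
# open a completed stage's page in the live UI, e.g.:
#   http://<driver-host>:4040/stages/stage/?id=<stageId>&attempt=0
# With the inconsistency present, the page returns HTTP 500 with
# java.util.NoSuchElementException for the removed executor's id.
```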