Ngone51 commented on a change in pull request #25943: [WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for (SQL)AppStatusListener URL: https://github.com/apache/spark/pull/25943#discussion_r333091552
########## File path: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ##########

@@ -103,6 +104,81 @@ private[spark] class AppStatusListener(
     }
   }
 
+  // visible for tests
+  private[spark] def recoverLiveEntities(): Unit = {
+    if (!live) {
+      kvstore.view(classOf[JobDataWrapper])
+        .asScala.filter(_.info.status == JobExecutionStatus.RUNNING)
+        .map(_.toLiveJob).foreach(job => liveJobs.put(job.jobId, job))
+
+      kvstore.view(classOf[StageDataWrapper]).asScala
+        .filter { stageData =>
+          stageData.info.status == v1.StageStatus.PENDING ||
+            stageData.info.status == v1.StageStatus.ACTIVE
+        }
+        .map { stageData =>
+          val stageId = stageData.info.stageId
+          val jobs = liveJobs.values.filter(_.stageIds.contains(stageId)).toSeq
+          stageData.toLiveStage(jobs)
+        }.foreach { stage =>
+          val stageId = stage.info.stageId
+          val stageAttempt = stage.info.attemptNumber()
+          liveStages.put((stageId, stageAttempt), stage)
+
+          kvstore.view(classOf[ExecutorStageSummaryWrapper])
+            .index("stage")
+            .first(Array(stageId, stageAttempt))
+            .last(Array(stageId, stageAttempt))
+            .asScala
+            .map(_.toLiveExecutorStageSummary)
+            .foreach { esummary =>
+              stage.executorSummaries.put(esummary.executorId, esummary)
+              if (esummary.isBlacklisted) {
+                stage.blackListedExecutors += esummary.executorId
+                liveExecutors(esummary.executorId).isBlacklisted = true
+                liveExecutors(esummary.executorId).blacklistedInStages += stageId
+              }
+            }
+
+          kvstore.view(classOf[TaskDataWrapper])
+            .parent(Array(stageId, stageAttempt))
+            .index(TaskIndexNames.STATUS)
+            .first(TaskState.RUNNING.toString)
+            .last(TaskState.RUNNING.toString)
+            .closeableIterator().asScala
+            .map(_.toLiveTask)
+            .foreach { task =>
+              liveTasks.put(task.info.taskId, task)
+              stage.activeTasksPerExecutor(task.info.executorId) += 1
+            }
+          stage.savedTasks.addAndGet(kvstore.count(classOf[TaskDataWrapper]).intValue())
+        }
+      kvstore.view(classOf[ExecutorSummaryWrapper]).asScala.filter(_.info.isActive)

Review comment:
   > We may want to restore
   deadExecutors for the same. (isActive == false)

   Actually, we never write dead executors into the KVStore in a live `AppStatusListener`. In `AppStatusListener`, the only place where dead executors might be written into the KVStore is here:
   https://github.com/apache/spark/blob/d2f21b019909e66bf49ad764b851b4a65c2438f8/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala#L857-L869
   However, the `SparkListenerStageExecutorMetrics` event is only generated by `EventLoggingListener` and written into the event log file, and it is only consumed during the SHS's replay. That means `onStageExecutorMetrics` is only ever called in a non-live `AppStatusListener`. Likewise, a live `AppStatusListener` never has a chance to call `onStageExecutorMetrics`, and therefore never has a chance to write dead executors into the KVStore.

   ==================================================

   Wait, wait. I just remembered that in SPARK-28594 we'll do incremental replay on the SHS side, which makes it possible for dead executors to be written to the KVStore. Let me recover dead executors, too. I've decided to leave my original comment above, since I think this can be a tricky part and I want to make my thoughts as clear as possible.
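The recovery step being discussed can be summarized as: read the stored executor summaries back out of the KVStore and rebuild both the live and dead executor maps. Below is a minimal, self-contained sketch of that pattern. The `ExecutorSummary` case class and `recoverExecutors` method are illustrative stand-ins, not Spark's actual types; in the real listener the input would come from `kvstore.view(classOf[ExecutorSummaryWrapper]).asScala`.

```scala
import scala.collection.mutable

// Illustrative stand-in for Spark's executor summary data (hypothetical type).
case class ExecutorSummary(executorId: String, isActive: Boolean)

class RecoveringListener {
  val liveExecutors = mutable.HashMap[String, ExecutorSummary]()
  val deadExecutors = mutable.HashMap[String, ExecutorSummary]()

  // `stored` stands in for the summaries read back from the KVStore.
  def recoverExecutors(stored: Seq[ExecutorSummary]): Unit = {
    // Split stored summaries on isActive, mirroring the filter in the diff.
    val (active, inactive) = stored.partition(_.isActive)
    active.foreach(e => liveExecutors.put(e.executorId, e))
    // Per the discussion: with incremental replay (SPARK-28594), dead
    // executors can appear in the KVStore, so restore them as well.
    inactive.foreach(e => deadExecutors.put(e.executorId, e))
  }
}
```

The point of partitioning rather than filtering only on `isActive` is exactly the reviewer's note: once dead executors can reach the KVStore, the `isActive == false` entries must be restored into `deadExecutors` instead of being dropped.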