mridulm commented on code in PR #38702:
URL: https://github.com/apache/spark/pull/38702#discussion_r1035507465


##########
core/src/main/scala/org/apache/spark/status/AppStatusListener.scala:
##########
@@ -645,8 +645,15 @@ private[spark] class AppStatusListener(
   }
 
   override def onTaskEnd(event: SparkListenerTaskEnd): Unit = {
-    // TODO: can this really happen?
-    if (event.taskInfo == null) {
+    // TODO: can taskInfo null really happen?

Review Comment:
   To answer this question, no `taskInfo` cannot be `null`  IMO - can you also 
take a look at it @Ngone51 ?
   We can remove this check and the TODO once we are sure.



##########
core/src/main/scala/org/apache/spark/status/AppStatusListener.scala:
##########
@@ -645,8 +645,11 @@ private[spark] class AppStatusListener(
   }
 
   override def onTaskEnd(event: SparkListenerTaskEnd): Unit = {
-    // TODO: can this really happen?
-    if (event.taskInfo == null) {
+    // TODO: can taskInfo null really happen?
+    // For resubmitted tasks caused by ExecutorLost, the SparkListenerTaskEnd 
is useless and
+    // will make activeTask in stage to be negative, this will cause stage not 
be removed in
+    // liveStages, and finally cause executor not removed in deadExecutors
+    if (event.taskInfo == null ||  event.reason == Resubmitted) {

Review Comment:
   Couple of things to clarify (the pr description/comments confused me a bit).
   
   a) There was a pair of task start and task end event which were fired for 
the task (let us call it Tr)
   b) When executor which ran Tr was lost, while stage is still running, a 
`Resubmitted` is fired for Tr.
   c) Subsequently, a new task start and task end will be fired for the retry 
of Tr.
   
   
   The issue here is, (b) is associated with (a) - which was a success earlier, 
but has now been marked as a failure.
   
   
   We should not be ignoring this event, but rather deal with it properly.
   For example, update counters, number of failed tasks, etc.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to