Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20408#discussion_r164824439
  
    --- Diff: 
core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -594,12 +606,24 @@ private[spark] class AppStatusListener(
     
           stage.executorSummaries.values.foreach(update(_, now))
           update(stage, now, last = true)
    +
    +      val executorIdsForStage = stage.executorSummaries.keySet
    +      executorIdsForStage.foreach { executorId =>
    +        liveExecutors.get(executorId).foreach { exec =>
    +          removeBlackListedStageFrom(exec, event.stageInfo.stageId, now)
    +        }
    +      }
         }
     
         appSummary = new AppSummary(appSummary.numCompletedJobs, 
appSummary.numCompletedStages + 1)
         kvstore.write(appSummary)
       }
     
    +  private def removeBlackListedStageFrom(exec: LiveExecutor, stageId: Int, 
now: Long) = {
    +    exec.blacklistedInStages -= stageId
    +    liveUpdate(exec, now)
    --- End diff --
    
    hmm actually I just thought of something else.  It looks like you're 
calling `liveUpdate` here for *every* executor when the stage finishes.  Say 
you have 1000 execs, a very quick stage, and no blacklisting, this is an 
expensive update for no actual change.
    
    So you should at least avoid the `liveUpdate` if `exec.blacklistedInStages` 
hasn't changed at all.  But really, I think that `LiveStage` should maintain a 
set of blacklisted executors, so you avoid calling this entirely for execs 
which aren't blacklisted.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to