wypoon commented on a change in pull request #23767: [SPARK-26329][CORE] Faster
polling of executor memory metrics.
URL: https://github.com/apache/spark/pull/23767#discussion_r299316538
##########
File path: core/src/main/scala/org/apache/spark/executor/Executor.scala
##########
@@ -412,6 +432,11 @@ private[spark] class Executor(
env.mapOutputTracker.asInstanceOf[MapOutputTrackerWorker].updateEpoch(task.epoch)
}
+ val stageId = task.stageId
+ val stageAttemptId = task.stageAttemptId
+ metricsPoller.onTaskStart(taskId, stageId, stageAttemptId)
+ taskStarted = true
Review comment:
If the code here is complex and long, it is because it was already so before
my change. As a matter of fact, I added relatively few lines of code to the
method here.
The taskStarted flag actually signals two things; previously I mentioned only
one: that metricsPoller.onTaskStart has been called. It also signals that the
task has been deserialized, since deserialization must happen first. Therefore,
in the finally block, if taskStarted is true, it is safe to use task.stageId
and task.stageAttemptId, which we need in order to call
metricsPoller.onTaskCompletion. It is not safe to simply call
metricsPoller.onTaskCompletion in the finally block without any check, and the
taskStarted flag is a convenient one.
Thus I'd argue that it is necessary. Sure, we could replace it with a
different check, but that wouldn't be simpler. In my view, the taskStarted flag
is self-explanatory and simple to understand.
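To illustrate the guard pattern being discussed, here is a minimal,
self-contained sketch. FakeTask, FakeMetricsPoller, and runTask are
hypothetical stand-ins for illustration only, not Spark's actual classes;
the point is that task fields are only safe to read in the finally block
once deserialization has succeeded, which the taskStarted flag records.

```scala
// Hypothetical stand-ins for Task and ExecutorMetricsPoller, used only
// to demonstrate the taskStarted guard pattern described above.
object TaskStartedSketch {
  class FakeTask(val stageId: Int, val stageAttemptId: Int)

  class FakeMetricsPoller {
    var started = 0
    var completed = 0
    def onTaskStart(taskId: Long, stageId: Int, attempt: Int): Unit = started += 1
    def onTaskCompletion(taskId: Long, stageId: Int, attempt: Int): Unit = completed += 1
  }

  def runTask(taskId: Long, deserialize: () => FakeTask, poller: FakeMetricsPoller): Unit = {
    var taskStarted = false
    var task: FakeTask = null
    try {
      // Deserialization may throw; before it succeeds, task fields are unusable.
      task = deserialize()
      poller.onTaskStart(taskId, task.stageId, task.stageAttemptId)
      // Only now is it safe for the finally block to read task.stageId
      // and task.stageAttemptId, and to balance onTaskStart with
      // onTaskCompletion.
      taskStarted = true
      // ... run the task body ...
    } finally {
      if (taskStarted) {
        poller.onTaskCompletion(taskId, task.stageId, task.stageAttemptId)
      }
    }
  }
}
```

If deserialization throws, the flag stays false and the finally block skips
onTaskCompletion, avoiding both a NullPointerException on the task fields and
an unmatched completion event.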
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]