squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster polling of executor memory metrics.
URL: https://github.com/apache/spark/pull/23767#discussion_r268823794
##########
File path: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala
##########
@@ -267,12 +277,17 @@ private[spark] class EventLoggingListener(
override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {
if (shouldLogStageExecutorMetrics) {
- // For the active stages, record any new peak values for the memory metrics for the executor
- event.executorUpdates.foreach { executorUpdates =>
- liveStageExecutorMetrics.values.foreach { peakExecutorMetrics =>
- val peakMetrics = peakExecutorMetrics.getOrElseUpdate(
- event.execId, new ExecutorMetrics())
- peakMetrics.compareAndUpdatePeakValues(executorUpdates)
+ event.executorUpdates.foreach { case (stageKey1, peaks) =>
+ liveStageExecutorMetrics.foreach { case (stageKey2, metricsPerExecutor) =>
+ // If the update came from the driver, stageKey1 will be the dummy key (-1, -1),
+ // so record those peaks for all active stages.
+ // Otherwise, record the peaks for the matching stage.
+ val driverStageKey = (-1, -1)
+ if (stageKey1 == driverStageKey || stageKey1 == stageKey2) {
+ val metrics = metricsPerExecutor.getOrElseUpdate(
+ event.execId, new ExecutorMetrics())
+ metrics.compareAndUpdatePeakValues(peaks)
Review comment:
OK, I think I understand what is happening here now; sorry for my confusion earlier.
I'm a bit concerned about this approach, though: events can get backlogged quite a bit on the listener bus, so the stages that are active when this listener handles an update might not be the stages that were running when the driver collected the metrics. I don't have a better suggestion at the moment; I'll need to think about it more.
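To make the matching rule in the hunk concrete, here is a minimal, self-contained sketch. It is not Spark's code: `PeakMetrics` is a hypothetical, simplified stand-in for `ExecutorMetrics`, per-update peaks are plain `Array[Long]` values, and `StagePeakTracker` and `onExecutorMetricsUpdate(execId, updates)` are illustrative names. Stage keys are assumed to be `(stageId, stageAttemptId)` pairs, with `(-1, -1)` as the driver's dummy key, as in the diff.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for Spark's ExecutorMetrics:
// one running peak per metric, indexed by position.
final class PeakMetrics(numMetrics: Int) {
  val values: Array[Long] = Array.fill(numMetrics)(Long.MinValue)

  // Keep the element-wise maximum; returns true if any peak increased.
  def compareAndUpdatePeakValues(update: Array[Long]): Boolean = {
    var updated = false
    for (i <- values.indices if update(i) > values(i)) {
      values(i) = update(i)
      updated = true
    }
    updated
  }
}

object StagePeakTracker {
  // Assumed to be (stageId, stageAttemptId).
  type StageKey = (Int, Int)

  // Dummy key the diff uses for updates that originate on the driver.
  val DriverStageKey: StageKey = (-1, -1)

  // Active stage -> executor ID -> peaks seen so far for that executor.
  val liveStageExecutorMetrics: mutable.HashMap[StageKey, mutable.HashMap[String, PeakMetrics]] =
    mutable.HashMap.empty

  // Mirrors the loop in the hunk: a driver update is recorded for every live
  // stage; any other update is recorded only for the stage it names.
  def onExecutorMetricsUpdate(execId: String, updates: Map[StageKey, Array[Long]]): Unit =
    updates.foreach { case (updateStage, peaks) =>
      liveStageExecutorMetrics.foreach { case (liveStage, metricsPerExecutor) =>
        if (updateStage == DriverStageKey || updateStage == liveStage) {
          metricsPerExecutor
            .getOrElseUpdate(execId, new PeakMetrics(peaks.length))
            .compareAndUpdatePeakValues(peaks)
        }
      }
    }
}
```

For example, with stages `(0, 0)` and `(1, 0)` live, an update keyed `(0, 0)` touches only stage `(0, 0)`, while an update keyed `(-1, -1)` is recorded for both. Note that the driver branch attributes peaks to whatever stages are live when the listener processes the event, not when the driver sampled the metrics, which is the timing the review comment questions.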
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.