squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster polling of executor memory metrics.
URL: https://github.com/apache/spark/pull/23767#discussion_r269243108
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala
 ##########
 @@ -267,12 +277,17 @@ private[spark] class EventLoggingListener(
 
   override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {
     if (shouldLogStageExecutorMetrics) {
-      // For the active stages, record any new peak values for the memory metrics for the executor
-      event.executorUpdates.foreach { executorUpdates =>
-        liveStageExecutorMetrics.values.foreach { peakExecutorMetrics =>
-          val peakMetrics = peakExecutorMetrics.getOrElseUpdate(
-            event.execId, new ExecutorMetrics())
-          peakMetrics.compareAndUpdatePeakValues(executorUpdates)
+      event.executorUpdates.foreach { case (stageKey1, peaks) =>
+        liveStageExecutorMetrics.foreach { case (stageKey2, metricsPerExecutor) =>
+          // If the update came from the driver, stageKey1 will be the dummy key (-1, -1),
+          // so record those peaks for all active stages.
+          // Otherwise, record the peaks for the matching stage.
+          val driverStageKey = (-1, -1)
+          if (stageKey1 == driverStageKey || stageKey1 == stageKey2) {
+            val metrics = metricsPerExecutor.getOrElseUpdate(
+              event.execId, new ExecutorMetrics())
+            metrics.compareAndUpdatePeakValues(peaks)
 
 Review comment:
  Ok, never mind all of the above. I take back my statement about the running stages being out of sync. It is true that the DAGScheduler may be running some other stages at this point, but that doesn't matter: this code uses the running stages as seen by the EventLoggingListener, not by the DAGScheduler. There may be a small discrepancy in the order in which the metrics poller pushes the metrics vs. the DAGScheduler pushes the stage update, but we'd have to live with that no matter what.
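To make the point concrete, here is a minimal, self-contained sketch of the routing logic in the hunk above, under simplifying assumptions: `PeakMetrics` is a hypothetical stand-in for Spark's `ExecutorMetrics` (metrics keyed by name rather than by index), and `onStageSubmitted`/`onStageCompleted` are hypothetical, simplified versions of the listener callbacks that maintain `liveStageExecutorMetrics`. Driver updates, keyed by the dummy `(-1, -1)`, are recorded for every stage the listener currently tracks; executor updates are recorded only for the matching stage.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's ExecutorMetrics: keeps the peak value
// seen so far for each metric, keyed by metric name.
final class PeakMetrics {
  val peaks: mutable.Map[String, Long] = mutable.Map.empty

  // Records every value that exceeds the current peak; returns true if any peak changed.
  def compareAndUpdatePeakValues(update: Map[String, Long]): Boolean = {
    var changed = false
    update.foreach { case (name, value) =>
      if (value > peaks.getOrElse(name, Long.MinValue)) {
        peaks(name) = value
        changed = true
      }
    }
    changed
  }
}

object StagePeakTracker {
  type StageKey = (Int, Int) // (stageId, stageAttemptId)
  // Dummy key used for updates that come from the driver.
  val DriverStageKey: StageKey = (-1, -1)
}

final class StagePeakTracker {
  import StagePeakTracker._

  // Stages this listener currently considers running, each mapped to
  // executor ID -> peak metrics observed while that stage was active.
  val liveStageExecutorMetrics =
    mutable.Map.empty[StageKey, mutable.Map[String, PeakMetrics]]

  // Hypothetical, simplified stage lifecycle callbacks.
  def onStageSubmitted(key: StageKey): Unit =
    liveStageExecutorMetrics.getOrElseUpdate(key, mutable.Map.empty[String, PeakMetrics])

  def onStageCompleted(key: StageKey): Unit =
    liveStageExecutorMetrics.remove(key)

  def onExecutorMetricsUpdate(
      execId: String,
      executorUpdates: Map[StageKey, Map[String, Long]]): Unit = {
    executorUpdates.foreach { case (updateKey, peaks) =>
      liveStageExecutorMetrics.foreach { case (liveKey, metricsPerExecutor) =>
        // Driver updates apply to all live stages; executor updates only to
        // the matching stage. Keys the listener is not tracking match nothing.
        if (updateKey == DriverStageKey || updateKey == liveKey) {
          metricsPerExecutor
            .getOrElseUpdate(execId, new PeakMetrics)
            .compareAndUpdatePeakValues(peaks)
        }
      }
    }
  }
}
```

In this sketch, an update for a stage the listener has not seen yet (or has already completed) is simply not recorded. That is the small ordering discrepancy mentioned above: what gets recorded depends only on the listener's own view of the running stages, not on what the DAGScheduler is doing at that moment.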

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
