Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/20940#discussion_r181896603
--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -234,8 +244,22 @@ private[spark] class EventLoggingListener(
}
}
- // No-op because logging every update would be overkill
- override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = { }
+ /**
+  * Log if there is a new peak value for one of the memory metrics for the given executor.
+  * Metrics are cleared out when a new stage is started in onStageSubmitted, so this will
+  * log new peak memory metric values per executor per stage.
+  */
+ override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {
--- End diff ---
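(The hunk above stops at the override's signature. As context for the discussion below, here is a minimal, self-contained sketch of the per-executor, per-stage peak tracking the doc comment describes; `PeakTracker`, `reset`, and `update` are hypothetical illustration names, not the PR's actual code.)

```scala
import scala.collection.mutable

// Hypothetical illustration of the behavior the doc comment describes:
// keep running peaks per executor, reset them per stage, and only report
// when some metric sets a new peak.
class PeakTracker(numMetrics: Int) {
  // executorId -> running peak for each tracked metric
  private val peaks = mutable.Map.empty[String, Array[Long]]

  // Called when a new stage starts (e.g. from onStageSubmitted).
  def reset(): Unit = peaks.clear()

  // Returns true if this update set a new peak for any metric,
  // i.e. it is worth writing to the event log.
  def update(execId: String, metrics: Array[Long]): Boolean = {
    val current = peaks.getOrElseUpdate(execId, Array.fill(numMetrics)(Long.MinValue))
    var newPeak = false
    var i = 0
    while (i < numMetrics) {
      if (metrics(i) > current(i)) {
        current(i) = metrics(i)
        newPeak = true
      }
      i += 1
    }
    newPeak
  }
}
```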
Is your concern the size of the message from the executors to the driver? That is certainly valid, but if this is going to be a common concern, I wonder if we should think a bit harder about it, since I expect we'll want to add more metrics.
One possibility is for the executor to do the peak calculation itself, and then only send an update for the metrics that hit a new peak (see the sketch below). That would also let us send just the peaks on task-end events.
I'm just brainstorming at the moment, not saying it should be changed one way or the other... I need to think about it more.
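A rough sketch of that executor-side idea, under the assumption that the tracked metrics are non-negative counters sampled concurrently; `ExecutorPeakReporter`, `record`, and `snapshot` are hypothetical names for illustration:

```scala
import java.util.concurrent.atomic.AtomicLongArray

// Hypothetical executor-side tracker: samples are taken locally, and only
// the metric indices that set a new peak would be reported to the driver
// (or the final peaks attached to a task-end event).
class ExecutorPeakReporter(numMetrics: Int) {
  private val peaks = new AtomicLongArray(numMetrics) // assumes metrics are >= 0

  // Record one sample; returns the metric indices that hit a new peak.
  def record(sample: Array[Long]): Seq[Int] = {
    (0 until numMetrics).filter { i =>
      var setNewPeak = false
      var done = false
      while (!done) {
        val cur = peaks.get(i)
        if (sample(i) > cur) {
          setNewPeak = peaks.compareAndSet(i, cur, sample(i))
          done = setNewPeak // lost a CAS race: re-read the peak and retry
        } else {
          done = true
        }
      }
      setNewPeak
    }
  }

  // Current peaks, e.g. to piggyback on a task-end message.
  def snapshot(): Array[Long] = Array.tabulate(numMetrics)(peaks.get)
}
```

With something like this, the driver would receive far fewer updates, at the cost of the executor doing the bookkeeping and the peaks arriving only when they change or at task end.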
---