Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/20940#discussion_r180426692
--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -234,8 +244,22 @@ private[spark] class EventLoggingListener(
     }
   }

-  // No-op because logging every update would be overkill
-  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = { }
+  /**
+   * Log if there is a new peak value for one of the memory metrics for the given executor.
+   * Metrics are cleared out when a new stage is started in onStageSubmitted, so this will
+   * log new peak memory metric values per executor per stage.
+   */
+  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {
--- End diff ---
yeah, logging an event per executor at stage end seems good to me. It would
be great if we could see how much that version affects log size as well, if you
can get those metrics.

Also, these tradeoffs should go into the design doc; it's harder to find
comments from a PR after this feature has been merged. For now, it would also
be nice if you could post a version that everyone can comment on, e.g. a
Google doc.
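To make the tradeoff concrete, here's a rough sketch (hypothetical class and
helper names, not the code in this PR) of the "one event per executor at stage
end" approach: accumulate running peaks from heartbeats, emit a single record
per executor in onStageCompleted, then reset. The `readHeapUsed` helper stands
in for however the actual metrics payload gets wired through:

```scala
import scala.collection.mutable

import org.apache.spark.scheduler._

// Illustrative sketch only; the real PR plumbs metrics through the
// heartbeat / SparkListenerExecutorMetricsUpdate payload.
private class PeakMetricsSketch extends SparkListener {

  // executorId -> peak heap bytes observed during the current stage
  private val peaks = mutable.Map.empty[String, Long]

  // Hypothetical accessor for a single metric; placeholder value here.
  private def readHeapUsed(event: SparkListenerExecutorMetricsUpdate): Long = 0L

  override def onExecutorMetricsUpdate(
      event: SparkListenerExecutorMetricsUpdate): Unit = {
    // Only track the running maximum; nothing is written to the log here.
    val heap = readHeapUsed(event)
    if (heap > peaks.getOrElse(event.execId, Long.MinValue)) {
      peaks(event.execId) = heap
    }
  }

  override def onStageCompleted(event: SparkListenerStageCompleted): Unit = {
    // Emit one record per executor per stage, then reset, so log growth is
    // bounded by (#executors x #stages) rather than one event per heartbeat.
    peaks.foreach { case (execId, peak) =>
      println(s"stage=${event.stageInfo.stageId} executor=$execId peakHeap=$peak")
    }
    peaks.clear()
  }
}
```

That bound, (#executors x #stages) events versus one per heartbeat interval,
is the log-size comparison it would be useful to see numbers for.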
---