Github user edwinalu commented on a diff in the pull request:
https://github.com/apache/spark/pull/20940#discussion_r179978102
--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -234,8 +244,22 @@ private[spark] class EventLoggingListener(
     }
   }
-  // No-op because logging every update would be overkill
-  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = { }
+  /**
+   * Log if there is a new peak value for one of the memory metrics for the given executor.
+   * Metrics are cleared out when a new stage is started in onStageSubmitted, so this will
+   * log new peak memory metric values per executor per stage.
+   */
+  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {
--- End diff --
For a longer-running stage, once it ramps up, hopefully there wouldn't be a
lot of new peak values. Looking at a subset of our applications, the extra
logging overhead has mostly been between 0.25% and 1%, though it can reach 8%.
By logging each peak value at the time it occurs (and reinitializing when
a stage starts), it's possible to tell which stages were active when the peak
was reached, and it would potentially be possible to graph these changes on a
timeline. That information wouldn't be available if the metrics were only
logged at stage end, since the times would be lost.
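To make the intent concrete, here is a minimal sketch of the peak-tracking idea
(names like MetricsPeakTracker and update are illustrative only, not the actual
code in this patch):

```scala
import scala.collection.mutable

// Illustrative sketch: track per-executor peaks and report only new peaks.
class MetricsPeakTracker {
  // executorId -> (metricName -> peak value seen so far in the current stage window)
  private val peaks = mutable.Map.empty[String, mutable.Map[String, Long]]

  // Called when a stage starts: clear peaks so each stage records its own peaks.
  def onStageStart(): Unit = peaks.clear()

  // Called on each executor metrics update; returns only the metrics that set a
  // new peak, which are the only ones that would be written to the event log.
  def update(executorId: String, metrics: Map[String, Long]): Map[String, Long] = {
    val execPeaks = peaks.getOrElseUpdate(executorId, mutable.Map.empty)
    metrics.filter { case (name, value) =>
      val isNewPeak = value > execPeaks.getOrElse(name, Long.MinValue)
      if (isNewPeak) execPeaks(name) = value
      isNewPeak
    }
  }
}

object MetricsPeakTrackerExample extends App {
  val tracker = new MetricsPeakTracker
  tracker.onStageStart()
  println(tracker.update("exec-1", Map("jvmUsedMemory" -> 100L))) // new peak: would be logged
  println(tracker.update("exec-1", Map("jvmUsedMemory" -> 90L)))  // empty: below peak, not logged
  println(tracker.update("exec-1", Map("jvmUsedMemory" -> 120L))) // new peak: would be logged
}
```

Only the entries returned on a new peak would be written to the event log, which
is what keeps the extra volume bounded by the number of new peaks per stage.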
Logging at stage end would limit the amount of extra logging. If we add
more metrics (such as off-heap memory), then there could be more new peaks and
more extra logging with the current approach. Excess logging is a concern, and I
can move to logging at stage end if the overhead is too much.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]