Github user edwinalu commented on a diff in the pull request:
https://github.com/apache/spark/pull/20940#discussion_r180287725
--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -234,8 +244,22 @@ private[spark] class EventLoggingListener(
     }
   }

-  // No-op because logging every update would be overkill
-  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = { }
+  /**
+   * Log if there is a new peak value for one of the memory metrics for the given executor.
+   * Metrics are cleared out when a new stage is started in onStageSubmitted, so this will
+   * log new peak memory metric values per executor per stage.
+   */
+  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {
--- End diff --
The original analysis focused on some test applications and larger
applications. Looking at a broader sample of our applications, the logging
overhead is higher: about 14% per application on average, and 4% overall (sum of
ExecutorMetricsUpdate logging / sum of Spark history log size). The overhead
for larger Spark history logs is mostly small, but it increases
significantly for smaller ones (one is at 49%).
There's often some logging before the first stage starts; it doesn't contain
useful information and adds extra overhead, especially for smaller
applications/history logs. The overhead can also be high when a stage takes a
long time to run and memory keeps increasing rather than reaching its peak quickly
-- logging at stage end would work better for that case.
I should also note that these numbers are for the case where only 4 longs
are recorded. With more metrics, the overhead would be higher, both in the
size of each logged event and in the number of potential peaks, since a new peak
for any metric would trigger a logged event.
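
To illustrate why adding metrics multiplies the logging, here is a rough sketch of the "log on any new peak" behavior being discussed. The names (PeakLoggingSketch, onMetricsUpdate, logEvent) are placeholders for illustration, not the actual PR code:

```scala
import scala.collection.mutable

// Hypothetical sketch: an event is written whenever *any* tracked metric hits a new peak,
// so each additional metric adds more opportunities for a log write.
object PeakLoggingSketch {
  // Running peak values per executor, cleared when a new stage starts.
  private val peakMetrics = mutable.Map.empty[String, Array[Long]]

  // Stand-in for writing an ExecutorMetricsUpdate event to the event log.
  private def logEvent(executorId: String, peaks: Array[Long]): Unit =
    println(s"log peaks for $executorId: ${peaks.mkString(",")}")

  def onMetricsUpdate(executorId: String, metrics: Array[Long]): Unit = {
    val peaks = peakMetrics.getOrElseUpdate(
      executorId, Array.fill(metrics.length)(Long.MinValue))
    var newPeak = false
    for (i <- metrics.indices) {
      if (metrics(i) > peaks(i)) {  // a new peak for any one metric triggers a write
        peaks(i) = metrics(i)
        newPeak = true
      }
    }
    if (newPeak) logEvent(executorId, peaks)
  }
}
```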
Since more metrics will be added, and the cost is higher than
originally estimated, logging at stage end could be a better choice. We would lose
some information about overlapping stages, but this information wouldn't be
visible anyway with the currently planned REST APIs or web UI, which just show
the peaks per stage and per executor.
For logging at stage end, we can log an ExecutorMetricsUpdate event, just before
logging the stage end, for each executor that has sent a heartbeat during the
stage -- this event would have the peak value for each metric. This should be the
minimum amount of logging needed to capture peak values per
stage per executor. Alternatively, the information could be added to the
StageCompleted event for more compaction, but the code would be more
complicated, with two paths for reading in values. Logging an event per executor
at stage end seems like a reasonable choice, without too much extra logging or too
much extra complexity.
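
To make the proposal concrete, here is a rough sketch of the stage-end alternative, reusing the same placeholder names as above (not the actual PR code): peaks accumulate silently during the stage and one event per executor is written only when the stage completes.

```scala
import scala.collection.mutable

// Hypothetical sketch of logging one event per executor at stage end.
object StageEndLoggingSketch {
  private val peakMetrics = mutable.Map.empty[String, Array[Long]]

  // Stand-in for writing an ExecutorMetricsUpdate event with the peak values.
  private def logEvent(executorId: String, peaks: Array[Long]): Unit =
    println(s"log stage-end peaks for $executorId: ${peaks.mkString(",")}")

  // Called on each heartbeat: only track the running peaks, don't write anything yet.
  def onMetricsUpdate(executorId: String, metrics: Array[Long]): Unit = {
    val peaks = peakMetrics.getOrElseUpdate(
      executorId, Array.fill(metrics.length)(Long.MinValue))
    for (i <- metrics.indices) {
      peaks(i) = math.max(peaks(i), metrics(i))
    }
  }

  // Called just before the StageCompleted event is logged:
  // one event per executor that sent a heartbeat, then reset for the next stage.
  def onStageCompleted(): Unit = {
    peakMetrics.foreach { case (executorId, peaks) => logEvent(executorId, peaks) }
    peakMetrics.clear()
  }
}
```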
What are your thoughts?
---