Github user edwinalu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20940#discussion_r180287725

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
    @@ -234,8 +244,22 @@ private[spark] class EventLoggingListener(
         }
       }

    -  // No-op because logging every update would be overkill
    -  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = { }
    +  /**
    +   * Log if there is a new peak value for one of the memory metrics for the given executor.
    +   * Metrics are cleared out when a new stage is started in onStageSubmitted, so this will
    +   * log new peak memory metric values per executor per stage.
    +   */
    +  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {
    --- End diff --

The original analysis focused on some test applications and larger applications. Looking at a broader sample of our applications, the logging overhead is higher: about 14% per application on average, and 4% overall (sum of logging for ExecutorMetricsUpdate / sum of Spark history logs). The overhead for larger Spark history logs is mostly small, but it increases significantly for smaller ones (one is at 49%). There is often some logging before the first stage starts that doesn't contain useful information, which is extra overhead, especially for smaller applications/history logs. Overhead can also be high when a stage takes a long time to run and memory increases gradually rather than reaching its peak quickly -- logging at stage end would work better for that case.

I should also note that these numbers are for the case where only 4 longs are recorded. With more metrics, the overhead would be higher, both in the size of each logged event and in the number of potential peaks, since a new peak for any metric would be logged. Since more metrics will be added, and the cost is higher than originally estimated, logging at stage end could be a better choice.
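To make the overhead concern concrete, here is a minimal sketch (not the actual Spark implementation; `PeakMetricTracker` and its method names are hypothetical) of the per-update approach being discussed: an event is emitted whenever any tracked metric hits a new peak, so the number of logged events grows with the number of metrics and with how gradually memory climbs.

```scala
import scala.collection.mutable

// Hypothetical sketch of peak tracking as described above: one peak slot per
// metric per executor, with a boolean result indicating whether this update
// set a new peak (and would therefore be logged under the per-update scheme).
class PeakMetricTracker {
  // executorId -> array of peak values, one slot per tracked metric
  private val peaks = mutable.Map.empty[String, Array[Long]]

  /** Returns true if any metric in `values` exceeds its recorded peak. */
  def updatePeaks(executorId: String, values: Array[Long]): Boolean = {
    val current =
      peaks.getOrElseUpdate(executorId, Array.fill(values.length)(Long.MinValue))
    var newPeak = false
    for (i <- values.indices) {
      if (values(i) > current(i)) {
        current(i) = values(i)
        newPeak = true
      }
    }
    newPeak
  }

  /** Clear peaks at stage boundaries, mirroring the reset in onStageSubmitted. */
  def reset(): Unit = peaks.clear()
}
```

With monotonically rising memory, nearly every heartbeat returns true and would be logged, which is the pathological case for long-running stages noted above.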
We would lose some information about overlapping stages, but this information wouldn't be visible anyway with the currently planned REST APIs or web UI, which just show the peaks for stages and executors. For logging at stage end, we can log an ExecutorMetricsUpdate event for each executor that has sent a heartbeat during the stage, just before logging the stage-end event -- this would carry the peak value for each metric. This should be the minimum amount of logging needed to capture peak values per stage per executor. Alternatively, the information could be folded into the StageCompleted event for more compaction, but the code would be more complicated, with two paths for reading in values. Logging one event per executor at stage end seems like a reasonable middle ground: not too much extra logging, and not too much extra complexity. What are your thoughts?
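For comparison, a minimal sketch of the stage-end alternative proposed above. Everything here is hypothetical scaffolding (the real proposal would reuse SparkListenerExecutorMetricsUpdate and the listener callbacks); the point is that peaks accumulate silently during the stage and exactly one event per heartbeating executor is emitted at stage completion.

```scala
import scala.collection.mutable

// Placeholder event type standing in for the executor-metrics event that
// would actually be logged just before the stage-end event.
case class ExecutorPeakEvent(executorId: String, peaks: Array[Long])

class StageEndLogger {
  private val peaks = mutable.Map.empty[String, Array[Long]]

  /** Accumulate peaks from a heartbeat; nothing is logged here. */
  def onHeartbeat(executorId: String, values: Array[Long]): Unit = {
    val current =
      peaks.getOrElseUpdate(executorId, Array.fill(values.length)(Long.MinValue))
    for (i <- values.indices) {
      if (values(i) > current(i)) current(i) = values(i)
    }
  }

  /**
   * Emit one event per executor that sent a heartbeat during the stage,
   * then clear the peaks for the next stage.
   */
  def onStageCompleted(): Seq[ExecutorPeakEvent] = {
    val events = peaks.map { case (id, p) => ExecutorPeakEvent(id, p.clone()) }.toSeq
    peaks.clear()
    events
  }
}
```

Under this scheme the event count is bounded by (executors x stages) regardless of how memory grows within a stage, which is what keeps the overhead low for long-running stages.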