Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/20940#discussion_r179796239
--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -93,6 +94,9 @@ private[spark] class EventLoggingListener(
// Visible for tests only.
private[scheduler] val logPath = getLogPath(logBaseDir, appId,
appAttemptId, compressionCodecName)
+ // Peak metric values for each executor
+ private var peakExecutorMetrics = new mutable.HashMap[String, PeakExecutorMetrics]()
--- End diff ---
You need to handle overlapping stages. I think you actually need to key on
both executor and stage, and on stage end, clear only the metrics for that
stage.
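A minimal sketch of what I mean (names here are hypothetical, not the PR's actual classes): track peaks in a map keyed on (stageId, stageAttemptId) plus executor, so concurrent stages don't clobber each other, and drop only the finished stage's entries on stage end.

```scala
import scala.collection.mutable

// Hypothetical sketch only -- illustrates keying peaks on (stage, executor)
// rather than on executor alone, so overlapping stages stay independent.
class PeakTracker {
  // (stageId, stageAttemptId) -> executorId -> per-metric peak values
  private val peaks =
    mutable.HashMap[(Int, Int), mutable.HashMap[String, Array[Long]]]()

  // Record a metric sample; returns true if any metric hit a new peak,
  // i.e. when the listener would need to log an update.
  def updatePeaks(stage: (Int, Int), executorId: String,
                  metrics: Array[Long]): Boolean = {
    val execPeaks = peaks
      .getOrElseUpdate(stage, mutable.HashMap.empty)
      .getOrElseUpdate(executorId, Array.fill(metrics.length)(Long.MinValue))
    var newPeak = false
    var i = 0
    while (i < metrics.length) {
      if (metrics(i) > execPeaks(i)) {
        execPeaks(i) = metrics(i)
        newPeak = true
      }
      i += 1
    }
    newPeak
  }

  // On stage end, clear only that stage's peaks; other running stages
  // keep accumulating theirs.
  def onStageCompleted(stage: (Int, Int)): Unit = peaks.remove(stage)
}
```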
EDIT: OK, after I went through everything, I think I see how this works --
since you log on every new peak, you'll also get a logged message for the
earlier update. But as I mention below, this strategy seems like it'll result
in a lot of extra logging. Maybe I'm wrong, though; it would be great to see
how much the logs grow this way.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]