Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/21221#discussion_r198611581
--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -169,6 +181,28 @@ private[spark] class EventLoggingListener(
   // Events that trigger a flush
   override def onStageCompleted(event: SparkListenerStageCompleted): Unit = {
+    if (shouldLogExecutorMetricsUpdates) {
+      // clear out any previous attempts, that did not have a stage completed event
+      val prevAttemptId = event.stageInfo.attemptNumber() - 1
+      for (attemptId <- 0 to prevAttemptId) {
+        liveStageExecutorMetrics.remove((event.stageInfo.stageId, attemptId))
+      }
+
+      // log the peak executor metrics for the stage, for each live executor,
+      // whether or not the executor is running tasks for the stage
+      val executorMap = liveStageExecutorMetrics.remove(
+        (event.stageInfo.stageId, event.stageInfo.attemptNumber()))
+      executorMap.foreach {
+        executorEntry => {
+          for ((executorId, peakExecutorMetrics) <- executorEntry) {
--- End diff ---
A couple of minor things -- I wouldn't use `executorMap` for the `Option` and
then `executorEntry` for the `Map`. Also, we tend to prefer `foreach` over
Scala's `for` loop; the one exception is that a `for` comprehension can clean
up extra nesting when you've got a bunch of loops (which you actually do here).
So I'd go with
```scala
val executorOpt = liveStageExecutorMetrics.remove(
  (event.stageInfo.stageId, event.stageInfo.attemptNumber()))
executorOpt.foreach { execMap =>
  execMap.foreach { case (executorId, peakExecutorMetrics) =>
    ...
  }
}
```
or if you want to use the `for` loop, use it around everything:
```scala
val execOpt = liveStageExecutorMetrics.remove(
  (event.stageInfo.stageId, event.stageInfo.attemptNumber()))
for {
  execMap <- execOpt
  (executorId, peakExecutorMetrics) <- execMap
} {
  ...
}
```
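For reference, a `for` comprehension without a `yield` desugars into nested
`foreach` (and `withFilter`) calls, so the two shapes above behave identically.
A minimal standalone sketch of that equivalence, using made-up map contents in
place of `liveStageExecutorMetrics`:
```scala
import scala.collection.mutable

object ForVsForeachDemo {
  def main(args: Array[String]): Unit = {
    // Stand-in for liveStageExecutorMetrics:
    // (stageId, attemptId) -> (executorId -> peak value).
    // The keys and values here are hypothetical.
    val liveStageExecutorMetrics = mutable.Map(
      (1, 0) -> Map("exec-1" -> 100L, "exec-2" -> 250L))

    // remove returns an Option[Map[String, Long]]
    val execOpt = liveStageExecutorMetrics.remove((1, 0))

    // Nested-foreach shape
    execOpt.foreach { execMap =>
      execMap.foreach { case (executorId, peak) =>
        println(s"foreach: $executorId -> $peak")
      }
    }

    // Equivalent for-comprehension shape
    for {
      execMap <- execOpt
      (executorId, peak) <- execMap
    } {
      println(s"for: $executorId -> $peak")
    }
  }
}
```
Both loops print the same two entries.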
---