Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/21221#discussion_r195290278
--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -169,6 +182,31 @@ private[spark] class EventLoggingListener(
   // Events that trigger a flush
   override def onStageCompleted(event: SparkListenerStageCompleted): Unit = {
+    if (shouldLogExecutorMetricsUpdates) {
+      // clear out any previous attempts, that did not have a stage completed event
--- End diff ---
One potential issue here -- even though there is a stage completed event, you can still have tasks running from an earlier attempt of that stage (when there is a fetch failure, all existing tasks keep running). Those leftover tasks will affect the memory usage for other tasks that run on those executors.
That said, I dunno if we can do much better here. The alternative would be to track the task start & end events for each stage attempt, along the lines of the sketch below.
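
To illustrate that alternative only (this is not what the PR does; `LiveTaskTracker` and its members are hypothetical names), a minimal sketch of counting live tasks per stage attempt from task start & end events might look like:

```scala
import scala.collection.mutable

import org.apache.spark.scheduler._

// Hypothetical helper: counts tasks still running per (stageId, stageAttemptId),
// so an old attempt would only be cleared once none of its tasks are left running.
class LiveTaskTracker extends SparkListener {
  // (stageId, stageAttemptId) -> number of tasks currently running
  private val liveTasks = mutable.Map.empty[(Int, Int), Int]

  override def onTaskStart(event: SparkListenerTaskStart): Unit = synchronized {
    val key = (event.stageId, event.stageAttemptId)
    liveTasks(key) = liveTasks.getOrElse(key, 0) + 1
  }

  override def onTaskEnd(event: SparkListenerTaskEnd): Unit = synchronized {
    val key = (event.stageId, event.stageAttemptId)
    liveTasks.get(key).foreach { n =>
      if (n <= 1) liveTasks.remove(key) else liveTasks(key) = n - 1
    }
  }

  // True while any task from this attempt is still running on some executor.
  def hasLiveTasks(stageId: Int, stageAttemptId: Int): Boolean = synchronized {
    liveTasks.contains((stageId, stageAttemptId))
  }
}
```

Something like that would let onStageCompleted skip clearing an attempt whose leftover tasks are still running, at the cost of tracking every task event.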
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]