mridulm commented on code in PR #44321:
URL: https://github.com/apache/spark/pull/44321#discussion_r1430656022
##########
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:
##########
@@ -1086,7 +1133,7 @@ private[spark] class TaskSetManager(
addPendingTask(index)
// Tell the DAGScheduler that this task was resubmitted so that it
doesn't think our
// stage finishes when a total of tasks.size tasks finish.
- sched.dagScheduler.taskEnded(
+ emptyTaskInfoAccumulablesAndNotifyDagScheduler(tid,
tasks(index), Resubmitted, null, Seq.empty, Array.empty, info)
Review Comment:
Yes, what @JoshRosen [detailed for
executorLost](https://github.com/apache/spark/pull/44321#pullrequestreview-1785137821)
is what I am referring to - had missed that comment, thanks for the analysis
Josh !
Reducing memory utilization is definitely something we should aspire for,
but IMO not at the cost of breaking interfaces and existing user code - in this
case, `Resubmitted` has existed for 8+ years; and has been exposed to
developer's.
How commonly this is triggered depends on stability of the cluster infra -
which might be different for different users/deployments :-)
In this specific case, unfortunately it impacts not just end user code
(which we dont know is using it: and I have seen fairly involved usage of these
api's, including writing some of my own in the past) - but also our spark UI -
for example `AppStatusListener.onTaskEnd` when `Resubmitted` is fired.
If we want to relook at the behavior of how `Resubmitted` is handled in
Spark, we should do that decoupled from a performance discussion - it is
possible that some of this has evolved in "interesting ways" since initially
designed, and requires to be revisited (why should users care, etc are
discussions to be had there).
For this PR, which is to improve memory utilization, it should not have
unintended side effects IMO.
As I mentioned before, at a minimum, this should not become default.
Once we relook/redesign `Resubmitted`, we can flip it - if required.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]