utkarsh39 commented on PR #44321: URL: https://github.com/apache/spark/pull/44321#issuecomment-1864719908
**Proposal To Gain Consensus**

The PR alleviates memory pressure on the driver, although at the cost of introducing a breaking change, as identified by @JoshRosen in https://github.com/apache/spark/pull/44321#pullrequestreview-1785137821. I propose that we disable the feature by default and, when it is enabled, introduce a breaking change wherein `TaskInfo.accumulables()` is empty for `Resubmitted` tasks upon the loss of an executor. The behavior change would be to return **empty** accumulables, as opposed to today's behavior of returning the accumulables of an earlier successful task attempt.

When this change is enabled, it will affect the following consumers:

1. `EventLoggingListener`, where task accumulables are serialized to JSON upon task completion ([code link](https://github.com/apache/spark/blob/aa1ff3789e492545b07d84ac095fc4c39f7446c6/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala#L159)).
2. Custom Spark listeners installed by Spark users.

What do the reviewers think of the proposal? Note that the current design in the PR does not implement this proposal: at present, accessing the empty accumulables would result in a crash. I will refactor the change if we agree on this proposal.
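To make the proposed semantics concrete, here is a minimal, self-contained Scala sketch. The case classes below are simplified stand-ins, not Spark's actual `TaskInfo`/`AccumulableInfo` internals; they only illustrate the proposed contract that a `Resubmitted` task reports no accumulables rather than echoing those of an earlier successful attempt:

```scala
// Illustrative model only: simplified stand-ins for Spark's TaskInfo and
// AccumulableInfo, used to sketch the proposed behavior change.
case class AccumulableInfo(id: Long, name: String, value: Long)

case class TaskInfo(
    taskId: Long,
    resubmitted: Boolean,
    priorAttemptAccums: Seq[AccumulableInfo]) {

  // Proposed behavior: a Resubmitted task (executor lost) reports an empty
  // accumulables collection instead of the accumulables recorded by an
  // earlier successful attempt of the same task.
  def accumulables: Seq[AccumulableInfo] =
    if (resubmitted) Seq.empty else priorAttemptAccums
}

object Demo extends App {
  val accums = Seq(AccumulableInfo(1L, "records.read", 100L))

  val normalTask      = TaskInfo(1L, resubmitted = false, accums)
  val resubmittedTask = TaskInfo(2L, resubmitted = true, accums)

  // A normal task still exposes its accumulables.
  println(normalTask.accumulables.nonEmpty)
  // Under the proposal, the resubmitted task exposes none.
  println(resubmittedTask.accumulables.isEmpty)
}
```

Consumers such as custom listeners would then need to tolerate an empty collection (rather than crash), which is why this is a breaking change for code that assumes accumulables are always populated on task end.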
