attilapiros opened a new pull request #30763: URL: https://github.com/apache/spark/pull/30763
This is a copy of #28618, merged with the current master with all merge conflicts resolved. All the credit goes to @mccheah; I would just like to help out here and keep his progress from being lost.

### What changes were proposed in this pull request?

Adds a `ShuffleOutputTracker` API that can be used for managing shuffle metadata on the driver. It accepts the map output metadata returned by the map output writers. Requires #28616.

### Why are the changes needed?

Part of the design as discussed in this document, and part of the wider effort of SPARK-25299.

### Does this PR introduce any user-facing change?

Enables additional APIs for the shuffle storage plugin tree. Usage will become more apparent when the read side of the shuffle plugin tree is introduced.

### How was this patch tested?

We've added a mock implementation of the shuffle plugin tree here, to prove that a Spark job using a different implementation of the plugin can use all of the plugin points for an alternative shuffle data storage solution. It is not included in this patch, in order to minimize the diff and the code to review. See #28902.

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
