[GitHub] [spark] attilapiros opened a new pull request #30763: [SPARK-31801][API][SHUFFLE] Register map output metadata

GitBox Mon, 14 Dec 2020 05:26:40 -0800


attilapiros opened a new pull request #30763:
URL: https://github.com/apache/spark/pull/30763



   This is a copy of #28618, merged with the current master with all merge 
conflicts resolved. 
   All the credit goes to @mccheah; I would just like to help out here and keep 
his progress from being lost. 
   
   ### What changes were proposed in this pull request?
   
   Adds a `ShuffleOutputTracker` API that can be used for managing shuffle 
metadata on the driver. Accepts map output metadata returned by the map output 
writers.
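
To make the idea concrete, here is a minimal, self-contained sketch of what a driver-side tracker could look like. It is not the interface added by this PR: the method names (`registerShuffle`, `registerMapOutput`, `unregisterShuffle`, `lookup`), the `MapOutputMetadata` marker type, and the `StorageLocation` and `InMemoryShuffleOutputTracker` classes are all assumptions made for illustration. Only the name `ShuffleOutputTracker` and the notion that map output writers return metadata come from the description above.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical marker type for whatever a map output writer returns on commit. */
interface MapOutputMetadata {}

/** Hypothetical driver-side tracker; the method names are illustrative only. */
interface ShuffleOutputTracker {
    void registerShuffle(int shuffleId);
    void registerMapOutput(int shuffleId, long mapId, MapOutputMetadata metadata);
    void unregisterShuffle(int shuffleId);
}

/** Example metadata: where a plugin stored one map task's output (assumed shape). */
final class StorageLocation implements MapOutputMetadata {
    final String uri;

    StorageLocation(String uri) {
        this.uri = uri;
    }
}

/** Toy implementation that keeps all metadata in driver memory. */
final class InMemoryShuffleOutputTracker implements ShuffleOutputTracker {
    // shuffleId -> (mapId -> metadata returned by that map task's writer)
    private final Map<Integer, Map<Long, MapOutputMetadata>> outputs =
        new ConcurrentHashMap<>();

    @Override
    public void registerShuffle(int shuffleId) {
        outputs.putIfAbsent(shuffleId, new ConcurrentHashMap<>());
    }

    @Override
    public void registerMapOutput(int shuffleId, long mapId, MapOutputMetadata metadata) {
        Map<Long, MapOutputMetadata> perShuffle = outputs.get(shuffleId);
        if (perShuffle == null) {
            throw new IllegalStateException("Shuffle " + shuffleId + " is not registered");
        }
        perShuffle.put(mapId, metadata);
    }

    @Override
    public void unregisterShuffle(int shuffleId) {
        // Drops every map output recorded for this shuffle.
        outputs.remove(shuffleId);
    }

    /** Returns the metadata for one map output, if it has been registered. */
    Optional<MapOutputMetadata> lookup(int shuffleId, long mapId) {
        Map<Long, MapOutputMetadata> perShuffle = outputs.get(shuffleId);
        if (perShuffle == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(perShuffle.get(mapId));
    }
}
```

In this sketch the driver would call `registerShuffle` when a shuffle is created, `registerMapOutput` each time a map task reports the metadata its writer returned, and `unregisterShuffle` when the shuffle is cleaned up. A real plugin would likely persist or forward this state rather than hold it in a map.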
   
   Requires #28616.
   
   ### Why are the changes needed?
   Part of the design as discussed in this document, and part of the wider 
effort of SPARK-25299.
   
   ### Does this PR introduce any user-facing change?
   
   Enables additional APIs for the shuffle storage plugin tree. Usage will 
become more apparent when the read side of the shuffle plugin tree is 
introduced.
   
   ### How was this patch tested?
   
   We've added a mock implementation of the shuffle plugin tree in #28902, to 
prove that a Spark job using a different implementation of the plugin can use 
all of the plugin points for an alternative shuffle data storage solution. That 
mock implementation is not part of this patch, in order to minimize the diff 
and the code to review here.


