boy-uber opened a new pull request #30004:
URL: https://github.com/apache/spark/pull/30004


   ### What changes were proposed in this pull request?
   Add a generic metadata field to the MapStatus class to support custom shuffle managers. 
Also add a new method that retrieves all map output statuses together with their metadata. 
See Jira: https://issues.apache.org/jira/projects/SPARK/issues/SPARK-33114
   
   ### Why are the changes needed?
   The current MapStatus class is tightly coupled to the local (sort-merge) shuffle, 
which uses BlockManagerId to record the shuffle data location. It cannot 
support other, custom shuffle manager implementations. 
   
   For example, when implementing a Remote Shuffle Service, we want to put the 
remote shuffle server information into MapStatus so that a reducer can read it 
and determine where to fetch the data from. The added MapStatus.metadata 
field can store such information.
   
   People who implement other shuffle managers can likewise store their own 
information in this metadata field.
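
To illustrate the idea, here is a minimal, self-contained Scala sketch. It is not the code in this PR and does not use Spark's real classes: `BlockManagerId` and `MapStatus` are simplified stand-ins, and `RemoteShuffleServer`, `MapOutputRegistry`, `allMapStatuses`, and `remoteServers` are hypothetical names chosen for illustration. The metadata slot is modelled as `Option[Any]`, which is an assumption about its shape.

```scala
import scala.collection.mutable

// Simplified stand-in for Spark's BlockManagerId (NOT the real class).
final case class BlockManagerId(executorId: String, host: String, port: Int)

// Hypothetical payload a remote shuffle service might attach to a map status.
final case class RemoteShuffleServer(host: String, port: Int)

// Simplified map status: the usual location plus an optional, generic metadata slot.
final case class MapStatus(
    location: BlockManagerId,
    mapId: Long,
    metadata: Option[Any] = None)

// Toy registry standing in for a map output tracker (hypothetical API).
final class MapOutputRegistry {
  private val statuses = mutable.Map.empty[Int, Vector[MapStatus]]

  // Record the status reported by one map task of the given shuffle.
  def register(shuffleId: Int, status: MapStatus): Unit =
    statuses.update(shuffleId, statuses.getOrElse(shuffleId, Vector.empty) :+ status)

  // Return every registered status, including metadata, for the given shuffle.
  def allMapStatuses(shuffleId: Int): Vector[MapStatus] =
    statuses.getOrElse(shuffleId, Vector.empty)
}

object MetadataSketch {
  // Reducer side: keep only the metadata entries that describe remote shuffle servers.
  def remoteServers(registry: MapOutputRegistry, shuffleId: Int): Vector[RemoteShuffleServer] =
    registry.allMapStatuses(shuffleId)
      .flatMap(_.metadata)
      .collect { case server: RemoteShuffleServer => server }
}
```

In this sketch a custom shuffle manager would fill `metadata` on the map side, and a reducer would call something like `MetadataSketch.remoteServers(registry, shuffleId)` to learn where to fetch data. Statuses without metadata are simply skipped, so the default local shuffle path is unaffected.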
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added unit test
   

