otterc opened a new pull request #32140: URL: https://github.com/apache/spark/pull/32140
This is a work in progress, as it depends on in-progress JIRAs:
- SPARK-32921
- SPARK-33350

### What changes were proposed in this pull request?
This is the shuffle fetch side change, where executors can fetch local/remote merged shuffle data from shuffle services. This is needed for push-based shuffle - SPIP [SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602).

This change introduces new messages between clients and the external shuffle service:

1. `MergedBlockMetaRequest`: The client sends this to the external shuffle service to get the meta information for a merged block. The response is one of the following:
   - `MergedBlockMetaSuccess`: contains the request id, the number of chunks, and a `ManagedBuffer`, which is a `FileSegmentBuffer` backed by the merged block meta file.
   - `RpcFailure`: sent back to the client in case of failure. This is an existing message.
2. `FetchShuffleBlockChunks`: similar to the `FetchShuffleBlocks` message, but it fetches merged shuffle chunks instead of blocks.

### Why are the changes needed?
These changes are needed for push-based shuffle. Refer to the SPIP in [SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602).

### Does this PR introduce _any_ user-facing change?
When push-based shuffle is turned on, executors fetch merged shuffle block chunks from remote shuffle services. The client logs will indicate this.

### How was this patch tested?
Added unit tests. The reference PR with the consolidated changes covering the complete implementation is also provided in [SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602). We have already verified the functionality and the improved performance as documented in the SPIP doc.

Lead-authored-by: Chandni Singh [email protected]
Co-authored-by: Ye Zhou [email protected]
Co-authored-by: Min Shen [email protected]
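For readers unfamiliar with the exchange, the `MergedBlockMetaRequest` round trip described above can be modeled roughly as follows. This is only an illustrative sketch, not the code in this PR: the class `MergedBlockMetaSketch`, the `ToyShuffleService` stand-in, the request fields (`appId`, `shuffleId`, `reduceId`), and the use of a `byte[]` in place of a `ManagedBuffer` are all assumptions made for the example. Only the message names and the response contents (request id, number of chunks, a buffer backed by the meta file) come from the description.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative model of the meta request/response exchange; not Spark's actual classes. */
public class MergedBlockMetaSketch {

  /** Hypothetical request shape: identifies one merged block whose metadata is wanted. */
  static final class MergedBlockMetaRequest {
    final long requestId;
    final String appId;   // assumed field
    final int shuffleId;  // assumed field
    final int reduceId;   // assumed field

    MergedBlockMetaRequest(long requestId, String appId, int shuffleId, int reduceId) {
      this.requestId = requestId;
      this.appId = appId;
      this.shuffleId = shuffleId;
      this.reduceId = reduceId;
    }
  }

  /** Either of the two responses named in the description. */
  interface Response {
    long requestId();
  }

  /** Success: request id, number of chunks, and the meta file contents. */
  static final class MergedBlockMetaSuccess implements Response {
    final long requestId;
    final int numChunks;
    final byte[] metaBuffer; // stands in for the ManagedBuffer backed by the meta file

    MergedBlockMetaSuccess(long requestId, int numChunks, byte[] metaBuffer) {
      this.requestId = requestId;
      this.numChunks = numChunks;
      this.metaBuffer = metaBuffer;
    }

    @Override
    public long requestId() {
      return requestId;
    }
  }

  /** Failure: carries the request id and an error description. */
  static final class RpcFailure implements Response {
    final long requestId;
    final String errorString;

    RpcFailure(long requestId, String errorString) {
      this.requestId = requestId;
      this.errorString = errorString;
    }

    @Override
    public long requestId() {
      return requestId;
    }
  }

  /** Toy in-memory stand-in for the external shuffle service (hypothetical). */
  static final class ToyShuffleService {
    private final Map<String, Integer> chunkCounts = new HashMap<>();
    private final Map<String, byte[]> metaFiles = new HashMap<>();

    private static String key(String appId, int shuffleId, int reduceId) {
      return appId + "_" + shuffleId + "_" + reduceId;
    }

    void register(String appId, int shuffleId, int reduceId, int numChunks, byte[] meta) {
      String k = key(appId, shuffleId, reduceId);
      chunkCounts.put(k, numChunks);
      metaFiles.put(k, meta);
    }

    /** Answers a meta request with success if the merged block is known, else failure. */
    Response handle(MergedBlockMetaRequest req) {
      String k = key(req.appId, req.shuffleId, req.reduceId);
      byte[] meta = metaFiles.get(k);
      if (meta == null) {
        return new RpcFailure(req.requestId, "no merged block meta for " + k);
      }
      return new MergedBlockMetaSuccess(req.requestId, chunkCounts.get(k), meta);
    }
  }

  /** Client-side handling: branch on which response type came back. */
  static String describe(Response response) {
    if (response instanceof MergedBlockMetaSuccess) {
      MergedBlockMetaSuccess ok = (MergedBlockMetaSuccess) response;
      return "request " + ok.requestId + ": " + ok.numChunks + " chunks";
    }
    RpcFailure failure = (RpcFailure) response;
    return "request " + failure.requestId + " failed: " + failure.errorString;
  }

  public static void main(String[] args) {
    ToyShuffleService service = new ToyShuffleService();
    service.register("app-1", 0, 3, 4, new byte[] {1, 2, 3, 4});
    System.out.println(describe(service.handle(new MergedBlockMetaRequest(1L, "app-1", 0, 3))));
    System.out.println(describe(service.handle(new MergedBlockMetaRequest(2L, "app-1", 0, 9))));
  }
}
```

In this model, a client that receives `MergedBlockMetaSuccess` would presumably go on to request the individual chunks (the role `FetchShuffleBlockChunks` plays in the description), while an `RpcFailure` leaves it to the caller to decide how to recover.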
