otterc opened a new pull request #32140:
URL: https://github.com/apache/spark/pull/32140


   This is a work in progress, as it depends on these in-progress JIRAs:
   - SPARK-32921
   - SPARK-33350
   
   ### What changes were proposed in this pull request?
   This is the fetch-side change for shuffle: it lets executors fetch local or 
remote merged shuffle data from shuffle services. It is needed for push-based 
shuffle - SPIP [SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602).
   
   This change introduces new messages between clients and the external shuffle 
service:
   
   1. `MergedBlockMetaRequest`: The client sends this to the external shuffle 
service to get the meta information for a merged block. The response is one of 
the following:
     - `MergedBlockMetaSuccess`: contains the request id, the number of chunks, 
and a `ManagedBuffer`, which is a `FileSegmentManagedBuffer` backed by the 
merged block meta file.
     - `RpcFailure`: sent back to the client in case of failure. This is an 
existing message.
   
   2. `FetchShuffleBlockChunks`: This is similar to the `FetchShuffleBlocks` 
message, but it fetches merged shuffle chunks instead of blocks.
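
   The exchange described above can be sketched roughly as follows. This is a 
simplified, self-contained illustration and not the code in this PR: the class 
`MergedMetaExchangeSketch`, the `ToyShuffleService` handler, the 
`mergedBlockId` field, and the `chunksToFetch` helper are all hypothetical, a 
plain `byte[]` stands in for the `ManagedBuffer`, and addressing chunks by 
index `0..numChunks-1` is an assumption made only for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: simplified stand-ins, not Spark's actual classes.
final class MergedMetaExchangeSketch {

  /** Client -> shuffle service. The mergedBlockId field is a hypothetical key. */
  static final class MergedBlockMetaRequest {
    final long requestId;
    final String mergedBlockId;

    MergedBlockMetaRequest(long requestId, String mergedBlockId) {
      this.requestId = requestId;
      this.mergedBlockId = mergedBlockId;
    }
  }

  /** Common supertype for the two possible responses. */
  interface Response {
    long requestId();
  }

  /** Success: request id, number of chunks, and meta bytes (stand-in for a ManagedBuffer). */
  static final class MergedBlockMetaSuccess implements Response {
    final long requestId;
    final int numChunks;
    final byte[] meta;

    MergedBlockMetaSuccess(long requestId, int numChunks, byte[] meta) {
      this.requestId = requestId;
      this.numChunks = numChunks;
      this.meta = meta;
    }

    @Override public long requestId() { return requestId; }
  }

  /** Failure response carrying the request id and an error description. */
  static final class RpcFailure implements Response {
    final long requestId;
    final String error;

    RpcFailure(long requestId, String error) {
      this.requestId = requestId;
      this.error = error;
    }

    @Override public long requestId() { return requestId; }
  }

  /** Toy in-memory handler that answers meta requests for registered merged blocks. */
  static final class ToyShuffleService {
    private final Map<String, Integer> chunkCounts = new HashMap<>();
    private final Map<String, byte[]> metaFiles = new HashMap<>();

    void register(String mergedBlockId, int numChunks, byte[] meta) {
      chunkCounts.put(mergedBlockId, numChunks);
      metaFiles.put(mergedBlockId, meta);
    }

    Response handle(MergedBlockMetaRequest req) {
      Integer numChunks = chunkCounts.get(req.mergedBlockId);
      if (numChunks == null) {
        return new RpcFailure(req.requestId, "no merged meta for " + req.mergedBlockId);
      }
      return new MergedBlockMetaSuccess(req.requestId, numChunks, metaFiles.get(req.mergedBlockId));
    }
  }

  /** Assumption: the follow-up chunk fetch addresses chunks by index 0..numChunks-1. */
  static List<Integer> chunksToFetch(MergedBlockMetaSuccess meta) {
    List<Integer> ids = new ArrayList<>();
    for (int i = 0; i < meta.numChunks; i++) {
      ids.add(i);
    }
    return ids;
  }

  public static void main(String[] args) {
    ToyShuffleService service = new ToyShuffleService();
    service.register("merged-block-A", 3, new byte[] {0, 1, 2});

    // Known merged block: the client learns how many chunks it can request.
    Response ok = service.handle(new MergedBlockMetaRequest(1L, "merged-block-A"));
    if (ok instanceof MergedBlockMetaSuccess) {
      System.out.println("chunks to fetch: " + chunksToFetch((MergedBlockMetaSuccess) ok));
    }

    // Unknown merged block: the service answers with a failure instead.
    Response missing = service.handle(new MergedBlockMetaRequest(2L, "merged-block-B"));
    if (missing instanceof RpcFailure) {
      System.out.println("failure: " + ((RpcFailure) missing).error);
    }
  }
}
```

   The point of the sketch is the two-step shape: the meta response tells the 
client how many chunks exist, and only then would a chunk-level fetch such as 
`FetchShuffleBlockChunks` be issued. How the real client reacts to 
`RpcFailure` is not covered here.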
   
   ### Why are the changes needed?
   These changes are needed for push-based shuffle. Refer to the SPIP in 
[SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602).
   
   ### Does this PR introduce _any_ user-facing change?
   When push-based shuffle is turned on, the client fetches merged shuffle 
block chunks from the remote shuffle service. The client logs will indicate this.
   
   ### How was this patch tested?
   Added unit tests.
   The reference PR with the consolidated changes covering the complete 
implementation is also provided in 
[SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602).
   We have already verified the functionality and the improved performance as 
documented in the SPIP doc.
   
   Lead-authored-by: Chandni Singh [email protected]
   Co-authored-by: Ye Zhou [email protected]
   Co-authored-by: Min Shen [email protected]
   




