gczsjdy opened a new pull request #24526: [SPARK-27603][CORE] Make the 
BlockTransferService for shuffle fetch pluggable
URL: https://github.com/apache/spark/pull/24526
 
 
   ## What changes were proposed in this pull request?
   
   The shuffle manager is pluggable in Spark, but some services closely 
related to shuffle are limited to one or two implementations. One example is 
`NettyBlockTransferService`, which `BlockManager` uses both to fetch remote 
bytes and to fetch shuffle data when the external shuffle service is not in 
use. These two functions are coupled together, yet the latter, fetching 
shuffle data, should be pluggable and extensible.
   
   A custom Spark shuffle manager may need the set of services that 
`NettyBlockTransferService` has already constructed, including the RPC 
servers, clients, and context (constructing a second set of connections 
between executors would be redundant), together with a 
`NettyBlockTransferService` customized to its own needs. For example, a remote 
shuffle manager in a disaggregated compute and storage architecture may only 
need the service to transfer index files from other executors through Netty 
(for caching), while reading data files directly from the globally accessible 
storage.
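
The remote shuffle example above can be illustrated with a minimal sketch. It 
is not code from this pull request and does not use Spark's classes: 
`PeerTransport`, `SharedStorage`, and `RemoteShuffleFetcher` are hypothetical 
stand-ins for the executor-to-executor transport, the globally accessible 
storage, and the custom fetch logic. The block names and paths are also made 
up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: every type here is hypothetical, not a Spark API.
final class RemoteShuffleSketch {

    /** Stand-in for the existing executor-to-executor transport (Netty in Spark). */
    interface PeerTransport {
        byte[] fetch(String executorId, String blockId);
    }

    /** Stand-in for globally accessible storage, e.g. a distributed file system. */
    interface SharedStorage {
        byte[] read(String path);
    }

    /** Routes index reads through peers (with a cache) and data reads to storage. */
    static final class RemoteShuffleFetcher {
        private final PeerTransport peers;
        private final SharedStorage storage;
        // Not thread-safe; a real implementation would need a concurrent cache.
        private final Map<String, byte[]> indexCache = new HashMap<>();

        RemoteShuffleFetcher(PeerTransport peers, SharedStorage storage) {
            this.peers = peers;
            this.storage = storage;
        }

        /** Fetches an index file from a peer once, then serves it from the cache. */
        byte[] fetchIndex(String executorId, String indexBlockId) {
            String key = executorId + "/" + indexBlockId;
            return indexCache.computeIfAbsent(key, k -> peers.fetch(executorId, indexBlockId));
        }

        /** Reads a data file straight from shared storage, bypassing peers. */
        byte[] readData(String path) {
            return storage.read(path);
        }
    }

    public static void main(String[] args) {
        int[] peerCalls = {0};
        PeerTransport peers = (executor, block) -> {
            peerCalls[0]++;
            return new byte[] {1, 2};
        };
        SharedStorage storage = path -> new byte[] {9};

        RemoteShuffleFetcher fetcher = new RemoteShuffleFetcher(peers, storage);
        fetcher.fetchIndex("exec-1", "shuffle_0_0_0.index");
        fetcher.fetchIndex("exec-1", "shuffle_0_0_0.index"); // served from the cache
        fetcher.readData("/shuffle/shuffle_0_0_0.data");
        System.out.println("peer fetches: " + peerCalls[0]);
    }
}
```

The point of the split is that only the small index files travel over the 
existing executor connections, while the bulk data path never touches them.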
   
   We propose making the transfer service used for shuffle pluggable, and 
widening the visibility of some fields in `NettyBlockTransferService` so that 
developers can extend it.
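
One common way to make a component pluggable on the JVM is to select the 
implementation by class name from configuration, similar in spirit to how 
`spark.shuffle.manager` accepts a class name. The description above does not 
say which mechanism this patch uses, so the following is only a sketch under 
that assumption. The interface, both implementations, and the configuration 
key `sketch.shuffle.transferService.class` are hypothetical and do not exist 
in Spark.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: names and the config key are hypothetical.
final class PluggableTransferSketch {

    /** Hypothetical extension point for the shuffle-fetch transfer service. */
    interface ShuffleTransferService {
        String describe();
    }

    /** Hypothetical default: everything goes through the built-in transport. */
    static final class DefaultTransferService implements ShuffleTransferService {
        @Override
        public String describe() {
            return "default";
        }
    }

    /** Hypothetical custom implementation supplied by a shuffle manager author. */
    static final class IndexOnlyTransferService implements ShuffleTransferService {
        @Override
        public String describe() {
            return "index-only";
        }
    }

    // Hypothetical key; not a real Spark configuration entry.
    static final String CONF_KEY = "sketch.shuffle.transferService.class";

    /** Instantiates the configured implementation, or the default if unset. */
    static ShuffleTransferService load(Map<String, String> conf) {
        String className =
            conf.getOrDefault(CONF_KEY, DefaultTransferService.class.getName());
        try {
            return Class.forName(className)
                .asSubclass(ShuffleTransferService.class)
                .getDeclaredConstructor()
                .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "Cannot load transfer service " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(load(conf).describe());

        conf.put(CONF_KEY, IndexOnlyTransferService.class.getName());
        System.out.println(load(conf).describe());
    }
}
```

With this shape, the block-fetching path used by `BlockManager` can keep its 
default while a custom shuffle manager swaps in its own implementation for 
shuffle fetches only.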
   
   ## How was this patch tested?
   
   Existing tests.
