attilapiros opened a new pull request #24554: [SPARK-27622][Core] Avoiding the 
network when block manager fetches disk persisted RDD blocks from the same host
URL: https://github.com/apache/spark/pull/24554
 
 
   ## What changes were proposed in this pull request?
   
   Before this PR, fetching a disk-persisted RDD block always used the network 
to get the requested block content, even when the source executor and the 
fetching executor were running on the same host.
   
   The idea of accessing another executor's local disk files by reading the 
disk directly comes from the external shuffle service, where the local dirs 
are stored for each executor (block manager).
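   
   As a rough illustration of why knowing another executor's local dirs is 
enough to find a block file, here is a minimal sketch. It assumes a hash-based 
layout (the file name's hash picks a local dir and a two-digit hex subdirectory). 
`LocalDirLookupSketch`, `resolveBlockFile` and the `subDirsPerLocalDir` 
parameter are hypothetical names used only for this example, not code from 
this PR.

```scala
import java.io.File

object LocalDirLookupSketch {
  // Non-negative hash of the file name (assumption: the layout is hash-based).
  def nonNegativeHash(s: String): Int = {
    val h = s.hashCode
    if (h != Int.MinValue) math.abs(h) else 0
  }

  // Hypothetical helper: maps a block file name to a path under one of the
  // given local directories. The same inputs always yield the same path, so a
  // reader on the same host can locate a file written by another executor.
  def resolveBlockFile(
      localDirs: Seq[String],
      subDirsPerLocalDir: Int,
      filename: String): File = {
    require(localDirs.nonEmpty, "at least one local dir is required")
    require(subDirsPerLocalDir > 0, "subDirsPerLocalDir must be positive")
    val hash = nonNegativeHash(filename)
    val dirId = hash % localDirs.length
    val subDirId = (hash / localDirs.length) % subDirsPerLocalDir
    val subDir = new File(localDirs(dirId), "%02x".format(subDirId))
    new File(subDir, filename)
  }
}
```

   Because the mapping is deterministic, the fetching executor only needs the 
owner's list of local dirs and the block's file name to open the file directly.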
   
   To make this possible, the following changes were made:
   - The `RegisterBlockManager` message is extended with `localDirs`, which the 
block manager master stores for each block manager as a new property of 
`BlockManagerInfo`.
   - `GetLocationsAndStatus` is extended with the requester host.
   - `BlockLocationsAndStatus` (the reply to the `GetLocationsAndStatus` message) 
is extended with an option of local directories, which is filled with the 
local directories of an executor on the same host (if there is one; otherwise 
`None` is used). This is where the block content can be read from.
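   
   The message flow described in the list above can be sketched roughly as 
follows. This is a simplified model, not the code in this PR: the case classes 
are cut-down stand-ins that only borrow the names of the real types, 
`MasterSketch` is a hypothetical class, blocks are identified by plain strings, 
and block status is left out.

```scala
import scala.collection.mutable

object SameHostLookupSketch {
  // Simplified stand-ins; the real Spark types carry more fields.
  final case class BlockManagerId(executorId: String, host: String, port: Int)
  final case class BlockManagerInfo(
      id: BlockManagerId,
      localDirs: Seq[String],   // stored at registration time
      blocks: Set[String])
  final case class BlockLocationsAndStatus(
      locations: Seq[BlockManagerId],
      localDirs: Option[Seq[String]])  // set only for a same-host holder

  // Hypothetical, in-memory model of the block manager master.
  final class MasterSketch {
    private val infos =
      mutable.LinkedHashMap.empty[BlockManagerId, BlockManagerInfo]

    // Models RegisterBlockManager carrying the executor's local dirs.
    def register(id: BlockManagerId, localDirs: Seq[String], blocks: Set[String]): Unit =
      infos(id) = BlockManagerInfo(id, localDirs, blocks)

    // Models GetLocationsAndStatus extended with the requester host.
    def getLocationsAndStatus(
        blockId: String,
        requesterHost: String): Option[BlockLocationsAndStatus] = {
      val holders = infos.values.filter(_.blocks.contains(blockId)).toSeq
      if (holders.isEmpty) {
        None
      } else {
        // If some holder runs on the requester's host, return its local dirs
        // so the block can be read from disk instead of over the network.
        val sameHost = holders.find(_.id.host == requesterHost)
        Some(BlockLocationsAndStatus(holders.map(_.id), sameHost.map(_.localDirs)))
      }
    }
  }
}
```

   In this sketch a requester on `hostA` asking for a block held by an executor 
on `hostA` gets `Some(localDirs)` in the reply, while a requester on any other 
host gets `None` and would fall back to a network fetch.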
   
   Shuffle blocks are out of scope for this PR; a separate PR will be opened 
for them (under another Jira issue).
   
   ## How was this patch tested?
   
   With a new unit test in `BlockManagerSuite`. See the test whose name starts 
with "SPARK-27622: avoid the network when block requested from same host".

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
