attilapiros commented on issue #24499: [SPARK-27677][Core] Serve local disk persisted blocks by the external service after releasing executor by dynamic allocation URL: https://github.com/apache/spark/pull/24499#issuecomment-493194196

In the 74f3461 commit I have extended the external shuffle service with the capability to remove disk-persisted RDD blocks. Two new `BlockTransferMessage`s are introduced:

- `RemoveBlocks`: contains the removable block IDs and the executor ID. Because of the message encoding possibilities I chose this over an array of (executorId, blockIds) pairs, which would generate a bit less network traffic as there would be only one request for all the released executors that were running on the same host.
- `BlocksRemoved`: a reply message containing the number of removed blocks (successful deletes).

I think the critical change in this commit is definitely `org.apache.spark.network.shuffle.ExternalShuffleClient#removeBlocks`. An `RpcEndpointRef` would also provide what we need here, but the current solution seems simpler to me.

@vanzin the config you asked for is not introduced yet, but I have already thought about its name: what about `spark.shuffle.service.fetch.rdd.enabled`?
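To make the request/reply pair above concrete, here is a minimal, hypothetical sketch of what the two messages could look like as length-prefixed UTF-8 encodings. The field names and wire layout are illustrative only, not Spark's actual `BlockTransferMessage` format:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical RemoveBlocks request: one executor ID plus its removable block IDs.
final class RemoveBlocks {
    final String executorId;
    final String[] blockIds;

    RemoveBlocks(String executorId, String[] blockIds) {
        this.executorId = executorId;
        this.blockIds = blockIds;
    }

    // Layout: executorId, then a block count, then each block ID,
    // every string written as a length-prefixed UTF-8 byte run.
    ByteBuffer encode() {
        int size = 4 + utf8Len(executorId) + 4;
        for (String b : blockIds) size += 4 + utf8Len(b);
        ByteBuffer buf = ByteBuffer.allocate(size);
        putString(buf, executorId);
        buf.putInt(blockIds.length);
        for (String b : blockIds) putString(buf, b);
        buf.flip();
        return buf;
    }

    static RemoveBlocks decode(ByteBuffer buf) {
        String executorId = getString(buf);
        int n = buf.getInt();
        String[] blocks = new String[n];
        for (int i = 0; i < n; i++) blocks[i] = getString(buf);
        return new RemoveBlocks(executorId, blocks);
    }

    private static int utf8Len(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    private static void putString(ByteBuffer buf, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        buf.putInt(bytes.length);
        buf.put(bytes);
    }

    private static String getString(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getInt()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

// Hypothetical BlocksRemoved reply: how many blocks were actually deleted,
// so the caller can detect partial failures.
final class BlocksRemoved {
    final int numRemovedBlocks;

    BlocksRemoved(int numRemovedBlocks) {
        this.numRemovedBlocks = numRemovedBlocks;
    }
}
```

Note how the per-executor shape makes the encoding trivial; the alternative mentioned above (one request carrying many (executorId, blockIds) pairs for a host) would save round trips at the cost of a nested encoding.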
