Victsm commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r511148341
##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -172,7 +178,9 @@ protected void serviceInit(Configuration conf) throws Exception {
}
TransportConf transportConf = new TransportConf("shuffle", new HadoopConfigProvider(conf));
- blockHandler = new ExternalBlockHandler(transportConf, registeredExecutorFile);
+ shuffleMergeManager = new RemoteBlockPushResolver(transportConf, APP_BASE_RELATIVE_PATH);
Review comment:
The only overhead is the `appsPathInfo` map maintained in
`RemoteBlockPushResolver`, a per-application map that records the local
dir paths used to store this application's merged shuffle partition files
on a given node.
Since it is kept at the application level, its additional memory footprint is
negligible when push-based shuffle is disabled.
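To make the footprint argument concrete, here is a minimal, hypothetical sketch of per-application bookkeeping along the lines described above. Only the name `appsPathInfo` comes from the comment; `AppPathsInfo`, `registerApp`, `removeApp`, `localDirsFor`, the directory names, and the `merge_dir` relative path are illustrative assumptions, not the actual `RemoteBlockPushResolver` code. What it shows is that the map holds one small entry per application, so its size tracks the number of applications on the node rather than the number of shuffles or partitions.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the per-application map described in the
// comment; this is NOT the actual RemoteBlockPushResolver implementation.
public class AppsPathInfoSketch {

  // Local dir paths one application uses on this node for merged shuffle files.
  static final class AppPathsInfo {
    final String[] activeLocalDirs;

    AppPathsInfo(String[] localDirs, String relativePath) {
      this.activeLocalDirs = Arrays.stream(localDirs)
          .map(dir -> dir + "/" + relativePath)
          .toArray(String[]::new);
    }
  }

  // One entry per application, independent of how many shuffles or partitions it has.
  private final Map<String, AppPathsInfo> appsPathInfo = new ConcurrentHashMap<>();

  public void registerApp(String appId, String[] localDirs, String relativePath) {
    // computeIfAbsent makes repeated registration of the same app a no-op.
    appsPathInfo.computeIfAbsent(appId, id -> new AppPathsInfo(localDirs, relativePath));
  }

  public void removeApp(String appId) {
    appsPathInfo.remove(appId);
  }

  public int trackedApps() {
    return appsPathInfo.size();
  }

  // Returns a copy of the app's dirs, or an empty array if the app is unknown.
  public String[] localDirsFor(String appId) {
    AppPathsInfo info = appsPathInfo.get(appId);
    return info == null ? new String[0] : info.activeLocalDirs.clone();
  }

  public static void main(String[] args) {
    AppsPathInfoSketch resolver = new AppsPathInfoSketch();
    String[] dirs = {"/data1/yarn", "/data2/yarn"};
    resolver.registerApp("app_1", dirs, "merge_dir");
    resolver.registerApp("app_1", dirs, "merge_dir");
    resolver.registerApp("app_2", dirs, "merge_dir");
    System.out.println(resolver.trackedApps());            // 2
    System.out.println(resolver.localDirsFor("app_1")[0]); // /data1/yarn/merge_dir
    resolver.removeApp("app_1");
    System.out.println(resolver.trackedApps());            // 1
  }
}
```

Registering the same application twice leaves a single entry, and removing an application drops its entry entirely, so nothing accumulates while push-based shuffle goes unused.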
Would it be OK if we introduced another server-side config, managed by the
cluster admin, as a master switch to enable/disable push-based shuffle?
If it is disabled, we can simply use the `NoOpMergedShuffleFileManager` from #29855
here instead.
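The proposed master switch could look roughly like the following self-contained sketch. The config key `spark.shuffle.server.pushBasedShuffle.enabled`, the simplified `MergedShuffleFileManager` interface, and the two stub classes are hypothetical stand-ins for `RemoteBlockPushResolver` and `NoOpMergedShuffleFileManager`; a plain `Map` stands in for the Hadoop `Configuration` passed to `serviceInit`, and defaulting to disabled is an assumption rather than something the comment specifies.

```java
import java.util.Map;

// Hypothetical sketch of the proposed cluster-admin master switch; the config
// key and the simplified interface below are assumptions, not Spark's API.
public class MergeManagerSwitchSketch {

  // Simplified stand-in for the merged-shuffle-file-manager abstraction.
  interface MergedShuffleFileManager {
    String describe();
  }

  // Stand-in for RemoteBlockPushResolver: would accept and merge pushed blocks.
  static final class PushResolverStub implements MergedShuffleFileManager {
    public String describe() { return "RemoteBlockPushResolver"; }
  }

  // Stand-in for NoOpMergedShuffleFileManager from #29855: keeps no per-app state.
  static final class NoOpStub implements MergedShuffleFileManager {
    public String describe() { return "NoOpMergedShuffleFileManager"; }
  }

  // Hypothetical server-side key; the name is chosen for illustration only.
  static final String PUSH_SHUFFLE_ENABLED_KEY =
      "spark.shuffle.server.pushBasedShuffle.enabled";

  // Mirrors the choice serviceInit would make; off unless an admin enables it.
  static MergedShuffleFileManager create(Map<String, String> serverConf) {
    boolean enabled = Boolean.parseBoolean(
        serverConf.getOrDefault(PUSH_SHUFFLE_ENABLED_KEY, "false"));
    return enabled ? new PushResolverStub() : new NoOpStub();
  }

  public static void main(String[] args) {
    System.out.println(create(Map.of()).describe());
    System.out.println(create(Map.of(PUSH_SHUFFLE_ENABLED_KEY, "true")).describe());
  }
}
```

Running `main` prints `NoOpMergedShuffleFileManager` and then `RemoteBlockPushResolver`. Keeping the switch on the server side lets the cluster admin, rather than each application, decide whether the shuffle service carries any push-based shuffle state at all.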
----------------------------------------------------------------
This is an automated message from the Apache Git Service.