Victsm commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r511148341



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -172,7 +178,9 @@ protected void serviceInit(Configuration conf) throws Exception {
       }
 
       TransportConf transportConf = new TransportConf("shuffle", new HadoopConfigProvider(conf));
-      blockHandler = new ExternalBlockHandler(transportConf, registeredExecutorFile);
+      shuffleMergeManager = new RemoteBlockPushResolver(transportConf, APP_BASE_RELATIVE_PATH);

Review comment:
The only overhead is the `appsPathInfo` map maintained in
`RemoteBlockPushResolver`. It is a per-application map that records the local
dir paths used to store merged shuffle partition files for that application
on a given node.
Since it is kept at the application level, the additional memory footprint is
negligible when push-based shuffle is disabled.

Would it be OK to introduce another server-side config, managed by the
cluster admin, as a master switch to enable or disable push-based shuffle?
If it is disabled, we can just use the `NoOpMergedShuffleFileManager` from
#29855 here instead.
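
To make the proposal concrete, here is a rough, self-contained sketch of what such a master switch could look like. It is not the code in this PR: the config key, its default of `false`, the simplified `MergedShuffleFileManager` interface, and the `TrackingManager` / `NoOpManager` stand-ins are all hypothetical placeholders for `RemoteBlockPushResolver` and `NoOpMergedShuffleFileManager`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class MergeManagerSwitchSketch {
  // Hypothetical key for the cluster-admin managed switch; the real name
  // was not decided in this discussion.
  static final String PUSH_SHUFFLE_ENABLED_KEY = "example.shuffle.push.server.enabled";

  /** Simplified stand-in for the merged shuffle file manager interface. */
  interface MergedShuffleFileManager {
    void registerApplication(String appId, String[] localDirs);
    void applicationRemoved(String appId);
    int trackedApplications();
  }

  /** Stand-in for the no-op manager: keeps no per-application state. */
  static final class NoOpManager implements MergedShuffleFileManager {
    @Override public void registerApplication(String appId, String[] localDirs) { }
    @Override public void applicationRemoved(String appId) { }
    @Override public int trackedApplications() { return 0; }
  }

  /** Stand-in for the real resolver: one map entry per registered application. */
  static final class TrackingManager implements MergedShuffleFileManager {
    // Plays the role of the appsPathInfo map described above.
    private final Map<String, String[]> appsPathInfo = new ConcurrentHashMap<>();

    @Override public void registerApplication(String appId, String[] localDirs) {
      appsPathInfo.put(appId, localDirs.clone());
    }
    @Override public void applicationRemoved(String appId) {
      appsPathInfo.remove(appId);
    }
    @Override public int trackedApplications() { return appsPathInfo.size(); }
  }

  /** Picks the manager from the server-side config; assumed to default to disabled. */
  static MergedShuffleFileManager create(Map<String, String> conf) {
    boolean enabled =
        Boolean.parseBoolean(conf.getOrDefault(PUSH_SHUFFLE_ENABLED_KEY, "false"));
    return enabled ? new TrackingManager() : new NoOpManager();
  }
}
```

With something along these lines, `serviceInit` would call the factory once, and the rest of the shuffle service would not need to know whether push-based shuffle is enabled on the node.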






