otterc commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r510537792
##########
File path:
common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -172,7 +178,9 @@ protected void serviceInit(Configuration conf) throws
Exception {
}
TransportConf transportConf = new TransportConf("shuffle", new
HadoopConfigProvider(conf));
- blockHandler = new ExternalBlockHandler(transportConf,
registeredExecutorFile);
+ shuffleMergeManager = new RemoteBlockPushResolver(transportConf,
APP_BASE_RELATIVE_PATH);
Review comment:
On the client side, we have added a configuration,
`spark.shuffle.push.enabled`. It is a per-application setting and defaults to
`false`, so each application decides for itself whether to run with push-based
shuffle enabled.
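As a sketch of how an application would opt in (the property name is from this PR; the `spark-defaults.conf` / `--conf` mechanism is standard Spark, and the value shown is illustrative):

```
# spark-defaults.conf, or passed per job via:
#   spark-submit --conf spark.shuffle.push.enabled=true ...
# Per-application opt-in to push-based shuffle; defaults to false.
spark.shuffle.push.enabled=true
```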
If we make this configurable on the server side, how would we enforce that an
application doesn't attempt push-based shuffle when the servers don't support
it?
Also, I think changing server-side configurations is a bit of a hassle. We run
Spark on YARN, and every server-side change requires restarting all the
NodeManagers.
Do you have any concerns that the `shuffleMergeManager` interferes with
regular shuffle?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]