vanzin opened a new pull request #26670: [SPARK-30033][core] Manage shuffle IO plugins using Spark's plugin system.
URL: https://github.com/apache/spark/pull/26670

SPARK-25299 is introducing a new plugin interface for shuffle IO; currently, parts of that API provide lifecycle methods that are already covered by the plugin API that was added in SPARK-29396.

This change makes some modifications so that:

- The driver and executor components of the shuffle plugin extend their respective counterparts in the generic plugin API.
- The shuffle IO plugin is managed by the same code that manages other generic plugins.

This simplifies and reuses similar code that exists in both implementations, and also provides more functionality to shuffle plugins: not only do they have more contextual information (without having to query APIs like SparkEnv) but they also have access to other functionality in the plugin API that would otherwise require touching internal Spark APIs.

There is a small change to the generic plugin API to avoid registering an RPC endpoint and starting threads when not needed; plugins now must explicitly say they want to handle RPC messages for the endpoint to be created. This is done because the default shuffle plugin is now loaded by the plugin system, and does not need the RPC functionality. (This API hasn't been released yet so it's ok to make the change.)

The only downside is that initialization of the SortShuffleManager in executors is a bit weird, because of the order in which things are initialized: the shuffle manager is initialized by SparkEnv, and plugin initialization happens after that. In any case, all initialization is done before any tasks are allowed to run.

Currently, the shuffle plugin is always loaded, regardless of whether the sort shuffle manager is being used; this was already the case in the driver, but now is also the case in the executors. It shouldn't be hard to fix that if needed.

Tested with existing and updated unit tests.
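To illustrate the two ideas in this description (shuffle components extending the generic plugin interface, and RPC endpoints being created only for plugins that ask for them), here is a minimal, self-contained sketch. Every type and method name in it (`DriverPluginSketch`, `rpcHandler()`, `PluginContainerSketch`, and so on) is hypothetical and chosen for illustration; none of them is Spark's actual plugin API, and the real opt-in mechanism in this PR may look different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// All types in this file are hypothetical stand-ins, not Spark's real classes.

/** Small demo: register two plugins and show which one gets an endpoint. */
final class PluginSketchDemo {
    public static void main(String[] args) {
        PluginContainerSketch container = new PluginContainerSketch();
        container.register(new LocalDiskShuffleSketch()); // no RPC requested
        container.register(new EchoPluginSketch());       // opts in to RPC
        System.out.println("plugins=" + container.pluginCount()
            + " endpoints=" + container.endpointCount());
        container.shutdown();
    }
}

/** Handles messages sent to a plugin's RPC endpoint. */
interface RpcHandlerSketch {
    Object receive(Object message);
}

/** Generic driver-side plugin with shared lifecycle methods. */
interface DriverPluginSketch {
    default void init() {}
    default void shutdown() {}

    /** Empty by default: a plugin must opt in to get an RPC endpoint. */
    default Optional<RpcHandlerSketch> rpcHandler() {
        return Optional.empty();
    }
}

/** Shuffle driver component that reuses the generic lifecycle. */
interface ShuffleDriverComponentsSketch extends DriverPluginSketch {
    void removeShuffle(int shuffleId);
}

/** Default-style shuffle plugin: it needs no RPC, so it never opts in. */
final class LocalDiskShuffleSketch implements ShuffleDriverComponentsSketch {
    final List<Integer> removed = new ArrayList<>();

    @Override
    public void removeShuffle(int shuffleId) {
        removed.add(shuffleId);
    }
}

/** A plugin that does want to receive RPC messages. */
final class EchoPluginSketch implements DriverPluginSketch {
    @Override
    public Optional<RpcHandlerSketch> rpcHandler() {
        RpcHandlerSketch handler = message -> "echo: " + message;
        return Optional.of(handler);
    }
}

/** One container manages every plugin, shuffle or otherwise. */
final class PluginContainerSketch {
    private final List<DriverPluginSketch> plugins = new ArrayList<>();
    private final List<RpcHandlerSketch> endpoints = new ArrayList<>();

    void register(DriverPluginSketch plugin) {
        plugin.init();
        plugins.add(plugin);
        // Only plugins that asked for RPC handling get an endpoint.
        plugin.rpcHandler().ifPresent(endpoints::add);
    }

    int pluginCount() { return plugins.size(); }

    int endpointCount() { return endpoints.size(); }

    void shutdown() {
        plugins.forEach(DriverPluginSketch::shutdown);
    }
}
```

In this sketch the shuffle plugin is loaded through the same container as any other plugin, yet no endpoint is registered for it, which mirrors the motivation given above for making RPC handling opt-in.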
