dongjoon-hyun commented on a change in pull request #31936:
URL: https://github.com/apache/spark/pull/31936#discussion_r599271678
##########
File path: docs/running-on-yarn.md
##########

@@ -811,3 +830,52 @@ do the following:
 to the list of filters in the <code>spark.ui.filters</code> configuration.
 Be aware that the history server information may not be up-to-date with the application's state.

+# Running multiple versions of the Spark Shuffle Service
+
+In some cases it may be desirable to run multiple instances of the Spark Shuffle Service which are
+using different versions of Spark. This can be helpful, for example, when running a YARN cluster
+with a mixed workload of applications running multiple Spark versions, since a given version of
+the shuffle service is not always compatible with other versions of Spark. YARN versions since 2.9.0
+support the ability to run shuffle services within an isolated classloader
+(see [YARN-4577](https://issues.apache.org/jira/browse/YARN-4577)), meaning multiple Spark versions
+can coexist within a single NodeManager. The
+`yarn.nodemanager.aux-services.<service-name>.classpath` and, starting from YARN 2.10.2/3.1.1/3.2.0,
+`yarn.nodemanager.aux-services.<service-name>.remote-classpath` options can be used to configure
+this. In addition to setting up separate classpaths, it's necessary to ensure the two versions
+advertise to different ports. This can be achieved using the `spark-shuffle-site.xml` file described
+above. For example, you may have configuration like:
+
+```properties
+  yarn.nodemanager.aux-services = spark_shuffle_x,spark_shuffle_y
+  yarn.nodemanager.aux-services.spark_shuffle_x.classpath = /path/to/spark-x-yarn-shuffle.jar,/path/to/spark-x-config
+  yarn.nodemanager.aux-services.spark_shuffle_y.classpath = /path/to/spark-y-yarn-shuffle.jar,/path/to/spark-y-config
+```
+
+The two `spark-*-config` directories each contain one file, `spark-shuffle-site.xml`.
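As a sketch of what the two `spark-*-config` directories described in the diff above might hold (not part of the PR; the port numbers here are illustrative, though `spark.shuffle.service.port` is the real Spark configuration key and 7337 its default), each `spark-shuffle-site.xml` would set a distinct port so the two shuffle services can bind side by side on the same NodeManager:

```xml
<?xml version="1.0"?>
<!-- Illustrative /path/to/spark-x-config/spark-shuffle-site.xml -->
<configuration>
  <property>
    <name>spark.shuffle.service.port</name>
    <!-- Port the Spark X shuffle service listens on (7337 is the Spark default) -->
    <value>7337</value>
  </property>
</configuration>
```

The copy under `/path/to/spark-y-config` would be identical except for a different port value (e.g. 7338), and applications would point `spark.shuffle.service.name`/`spark.shuffle.service.port` at the matching service.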
+These are XML files in the
+[Hadoop Configuration format](https://hadoop.apache.org/docs/r3.2.0/api/org/apache/hadoop/conf/Configuration.html)

Review comment:

   Shall we reference the Apache Hadoop 3.2.2 doc instead of 3.2.0, since we are using Apache Hadoop 3.2.2?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
