dongjoon-hyun commented on a change in pull request #31936:
URL: https://github.com/apache/spark/pull/31936#discussion_r599271678
##########
File path: docs/running-on-yarn.md
##########

@@ -811,3 +830,52 @@ do the following:
 to the list of filters in the <code>spark.ui.filters</code> configuration.
 Be aware that the history server information may not be up-to-date with the application's state.

+# Running multiple versions of the Spark Shuffle Service
+
+In some cases it may be desirable to run multiple instances of the Spark Shuffle Service which are
+using different versions of Spark. This can be helpful, for example, when running a YARN cluster
+with a mixed workload of applications running multiple Spark versions, since a given version of
+the shuffle service is not always compatible with other versions of Spark. YARN versions since 2.9.0
+support the ability to run shuffle services within an isolated classloader
+(see [YARN-4577](https://issues.apache.org/jira/browse/YARN-4577)), meaning multiple Spark versions
+can coexist within a single NodeManager. The
+`yarn.nodemanager.aux-services.<service-name>.classpath` and, starting from YARN 2.10.2/3.1.1/3.2.0,
+`yarn.nodemanager.aux-services.<service-name>.remote-classpath` options can be used to configure
+this. In addition to setting up separate classpaths, it's necessary to ensure the two versions
+advertise to different ports. This can be achieved using the `spark-shuffle-site.xml` file described
+above. For example, you may have configuration like:
+
+```properties
+  yarn.nodemanager.aux-services = spark_shuffle_x,spark_shuffle_y
+  yarn.nodemanager.aux-services.spark_shuffle_x.classpath = /path/to/spark-x-yarn-shuffle.jar,/path/to/spark-x-config
+  yarn.nodemanager.aux-services.spark_shuffle_y.classpath = /path/to/spark-y-yarn-shuffle.jar,/path/to/spark-y-config
+```
+
+The two `spark-*-config` directories each contain one file, `spark-shuffle-site.xml`.
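As a sketch of what the two `spark-*-config` directories described in the diff above might hold (not part of the PR; the port numbers here are illustrative, though `spark.shuffle.service.port` is the real Spark configuration key and 7337 its default), each `spark-shuffle-site.xml` would set a distinct port so the two shuffle services can bind side by side on the same NodeManager:

```xml
<?xml version="1.0"?>
<!-- Illustrative /path/to/spark-x-config/spark-shuffle-site.xml -->
<configuration>
  <property>
    <name>spark.shuffle.service.port</name>
    <!-- Port the Spark X shuffle service listens on (7337 is the Spark default) -->
    <value>7337</value>
  </property>
</configuration>
```

The copy under `/path/to/spark-y-config` would be identical except for a different port value (e.g. 7338), and applications would point `spark.shuffle.service.name`/`spark.shuffle.service.port` at the matching service.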
+These are XML files in the
+[Hadoop Configuration format](https://hadoop.apache.org/docs/r3.2.0/api/org/apache/hadoop/conf/Configuration.html)

Review comment:

   Shall we reference the Apache Hadoop 3.2.2 doc instead of 3.2.0, since we are using Apache Hadoop 3.2.2?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
