tgravescs commented on a change in pull request #31936:
URL: https://github.com/apache/spark/pull/31936#discussion_r600477876
##########
File path:
common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -139,6 +164,13 @@
private DB db;
public YarnShuffleService() {
+ // The name of the auxiliary service configured within the NodeManager
+ // (`yarn.nodemanager.aux-services`) is treated as the source-of-truth, so
this one can be
Review comment:
for yarn 2.9+
##########
File path: docs/running-on-yarn.md
##########
@@ -811,3 +831,52 @@ do the following:
to the list of filters in the <code>spark.ui.filters</code> configuration.
Be aware that the history server information may not be up-to-date with the
application's state.
+
+# Running multiple versions of the Spark Shuffle Service
+
+In some cases it may be desirable to run multiple instances of the Spark
Shuffle Service which are
+using different versions of Spark. This can be helpful, for example, when
running a YARN cluster
+with a mixed workload of applications running multiple Spark versions, since a
given version of
+the shuffle service is not always compatible with other versions of Spark.
YARN versions since 2.9.0
+support the ability to run shuffle services within an isolated classloader
Review comment:
I think we should be more explicit here and say requires Yarn 2.9+
##########
File path: docs/running-on-yarn.md
##########
@@ -761,8 +761,27 @@ The following extra configuration options are available
when the shuffle service
NodeManagers where the Spark Shuffle Service is not running.
</td>
</tr>
+<tr>
+ <td><code>spark.yarn.shuffle.service.metrics.namespace</code></td>
+ <td><code>sparkShuffleService</code></td>
+ <td>
+ The namespace to use when emitting shuffle service metrics into Hadoop
metrics2 system of the
+ NodeManager.
Review comment:
it looks like the name referenced by the node manager works with the
Hadoop 2.9+ custom class loader, but I assume with Hadoop 2.7 it requires the
spark_shuffle name ? hence the spark.shuffle.service.name won't work unless
you have recompiled the code and manually changed it.
Perhaps we just need to be more explicit in the config
spark.shuffle.service.name that either references the section running multiple
versions of the Spark Shuffle Service or explicitly states supported in YARN
2.9+. I assume this config with metrics doesn't matter as far as Hadoop
version.
Also did we explicitly test with Hadoop 2.7 and the case @dongjoon-hyun
brings up?
##########
File path:
common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -75,6 +76,15 @@
* is because an application running on the same Yarn cluster may choose to
not use the external
* shuffle service, in which case its setting of `spark.authenticate` should
be independent of
* the service's.
+ *
+ * The shuffle service will produce metrics via the YARN NodeManager's {@code
metrics2} system
+ * under a namespace specified by the {@value
SPARK_SHUFFLE_SERVICE_METRICS_NAMESPACE_KEY} config.
+ *
+ * By default, all configurations for the shuffle service will be taken
directly from the
+ * Hadoop {@link Configuration} passed by the YARN NodeManager. It is also
possible to configure
+ * the shuffle service by placing a resource named
+ * {@value SHUFFLE_SERVICE_CONF_OVERLAY_RESOURCE_NAME} into the classpath,
which should be an
Review comment:
again add comment about with YARN 2.9+
##########
File path:
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnShuffleAlternateNameConfigSuite.scala
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import java.net.URLClassLoader
+
+import org.apache.hadoop.yarn.conf.YarnConfiguration
+
+import org.apache.spark._
+import org.apache.spark.internal.config._
+import org.apache.spark.network.yarn.{YarnShuffleService, YarnTestAccessor}
+import org.apache.spark.tags.ExtendedYarnTest
+
+/**
+ * SPARK-34828: Integration test for the external shuffle service with an
alternate name and
+ * configs (by using a configuration overlay)
+ */
+@ExtendedYarnTest
+class YarnShuffleAlternateNameConfigSuite extends YarnShuffleIntegrationSuite {
+
+ private[this] val shuffleServiceName = "custom_shuffle_service_name"
+
+ override def newYarnConfig(): YarnConfiguration = {
+ val yarnConfig = super.newYarnConfig()
+ yarnConfig.set(YarnConfiguration.NM_AUX_SERVICES, shuffleServiceName)
+
yarnConfig.set(YarnConfiguration.NM_AUX_SERVICE_FMT.format(shuffleServiceName),
+ classOf[YarnShuffleService].getCanonicalName)
+ val overlayConf = new YarnConfiguration()
+ // Enable authentication in the base NodeManager conf but not in the
client. This would break
+ // shuffle, unless the shuffle service conf overlay overrides to turn off
authentication.
+ overlayConf.setBoolean(NETWORK_AUTH_ENABLED.key, true)
+ // Add the authentication conf to a separate config object used as an
overlay rather than
+ // setting it directly. This is necessary because a config overlay will
override previous
+ // config overlays, but not configs which were set directly on the config
object.
+ yarnConfig.addResource(overlayConf)
+ yarnConfig
+ }
+
+ override protected def extraSparkConf(): Map[String, String] =
+ super.extraSparkConf() ++ Map(SHUFFLE_SERVICE_NAME.key ->
shuffleServiceName)
Review comment:
how does this work with Hadoop 2.7 tests? am I mistaken on how 2.7 is
using the name?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]