[GitHub] [spark] tgravescs commented on a change in pull request #31936: [SPARK-34828][YARN] Make shuffle service name configurable on client side and allow for classpath-based config override on server side

GitBox Wed, 24 Mar 2021 06:43:33 -0700


tgravescs commented on a change in pull request #31936:
URL: https://github.com/apache/spark/pull/31936#discussion_r600477876




##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -139,6 +164,13 @@
   private DB db;
 
   public YarnShuffleService() {
+    // The name of the auxiliary service configured within the NodeManager
+    // (`yarn.nodemanager.aux-services`) is treated as the source-of-truth, so this one can be

Review comment:
       Please note that this applies to YARN 2.9+ only.

##########
File path: docs/running-on-yarn.md
##########
@@ -811,3 +831,52 @@ do the following:
   to the list of filters in the <code>spark.ui.filters</code> configuration.
 
 Be aware that the history server information may not be up-to-date with the application's state.
+
+# Running multiple versions of the Spark Shuffle Service
+
+In some cases it may be desirable to run multiple instances of the Spark Shuffle Service which are
+using different versions of Spark. This can be helpful, for example, when running a YARN cluster
+with a mixed workload of applications running multiple Spark versions, since a given version of
+the shuffle service is not always compatible with other versions of Spark. YARN versions since 2.9.0
+support the ability to run shuffle services within an isolated classloader

Review comment:
       I think we should be more explicit here and say that this requires YARN 2.9+.

##########
File path: docs/running-on-yarn.md
##########
@@ -761,8 +761,27 @@ The following extra configuration options are available when the shuffle service
     NodeManagers where the Spark Shuffle Service is not running.
   </td>
 </tr>
+<tr>
+  <td><code>spark.yarn.shuffle.service.metrics.namespace</code></td>
+  <td><code>sparkShuffleService</code></td>
+  <td>
+    The namespace to use when emitting shuffle service metrics into Hadoop metrics2 system of the
+    NodeManager.

Review comment:
       It looks like the name referenced by the NodeManager works with the Hadoop 2.9+ custom class loader, but I assume that with Hadoop 2.7 it requires the spark_shuffle name? In that case spark.shuffle.service.name won't work unless you have recompiled the code and changed the name manually.
   Perhaps we just need to be more explicit in the description of the spark.shuffle.service.name config, so that it either references the section "Running multiple versions of the Spark Shuffle Service" or explicitly states that it is supported in YARN 2.9+. I assume the Hadoop version doesn't matter for this metrics config.
   Also, did we explicitly test with Hadoop 2.7 and the case @dongjoon-hyun brings up?
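
   For context, a minimal sketch of the name matching in question, written against plain Java maps rather than the real Hadoop and Spark classes. The keys `yarn.nodemanager.aux-services` and `yarn.nodemanager.aux-services.<name>.class` are the standard YARN ones, and `spark.shuffle.service.name` with its `spark_shuffle` default is the config discussed here. The class `AuxServiceNameCheck` and its methods are hypothetical and exist only for illustration; this does not model the Hadoop 2.7 behavior being asked about.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Spark or Hadoop: models the NodeManager
// and client configs as plain maps to show how the two names must line up.
final class AuxServiceNameCheck {
  static final String NM_AUX_SERVICES = "yarn.nodemanager.aux-services";
  static final String NM_AUX_SERVICE_FMT = "yarn.nodemanager.aux-services.%s.class";
  static final String CLIENT_NAME_KEY = "spark.shuffle.service.name";
  static final String DEFAULT_NAME = "spark_shuffle";

  // Name the client will look up, falling back to the default name.
  static String clientServiceName(Map<String, String> sparkConf) {
    return sparkConf.getOrDefault(CLIENT_NAME_KEY, DEFAULT_NAME);
  }

  // True if the NodeManager lists the service and maps it to a class.
  static boolean isRegistered(Map<String, String> nmConf, String name) {
    String services = nmConf.getOrDefault(NM_AUX_SERVICES, "");
    boolean listed = Arrays.stream(services.split(","))
        .map(String::trim)
        .anyMatch(name::equals);
    return listed && nmConf.containsKey(String.format(NM_AUX_SERVICE_FMT, name));
  }

  public static void main(String[] args) {
    Map<String, String> nmConf = new HashMap<>();
    nmConf.put(NM_AUX_SERVICES, "mapreduce_shuffle,custom_shuffle_service_name");
    nmConf.put(String.format(NM_AUX_SERVICE_FMT, "custom_shuffle_service_name"),
        "org.apache.spark.network.yarn.YarnShuffleService");

    Map<String, String> sparkConf = new HashMap<>();
    sparkConf.put(CLIENT_NAME_KEY, "custom_shuffle_service_name");

    System.out.println(isRegistered(nmConf, clientServiceName(sparkConf)));
  }
}
```

   With the client config left unset, `clientServiceName` falls back to `spark_shuffle`, which is not registered in this sample NodeManager config, so the lookup would fail.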

##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -75,6 +76,15 @@
  * is because an application running on the same Yarn cluster may choose to not use the external
  * shuffle service, in which case its setting of `spark.authenticate` should be independent of
  * the service's.
+ *
+ * The shuffle service will produce metrics via the YARN NodeManager's {@code metrics2} system
+ * under a namespace specified by the {@value SPARK_SHUFFLE_SERVICE_METRICS_NAMESPACE_KEY} config.
+ *
+ * By default, all configurations for the shuffle service will be taken directly from the
+ * Hadoop {@link Configuration} passed by the YARN NodeManager. It is also possible to configure
+ * the shuffle service by placing a resource named
+ * {@value SHUFFLE_SERVICE_CONF_OVERLAY_RESOURCE_NAME} into the classpath, which should be an

Review comment:
       Again, please add a comment noting that this works only with YARN 2.9+.

##########
File path: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnShuffleAlternateNameConfigSuite.scala
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import java.net.URLClassLoader
+
+import org.apache.hadoop.yarn.conf.YarnConfiguration
+
+import org.apache.spark._
+import org.apache.spark.internal.config._
+import org.apache.spark.network.yarn.{YarnShuffleService, YarnTestAccessor}
+import org.apache.spark.tags.ExtendedYarnTest
+
+/**
+ * SPARK-34828: Integration test for the external shuffle service with an alternate name and
+ * configs (by using a configuration overlay)
+ */
+@ExtendedYarnTest
+class YarnShuffleAlternateNameConfigSuite extends YarnShuffleIntegrationSuite {
+
+  private[this] val shuffleServiceName = "custom_shuffle_service_name"
+
+  override def newYarnConfig(): YarnConfiguration = {
+    val yarnConfig = super.newYarnConfig()
+    yarnConfig.set(YarnConfiguration.NM_AUX_SERVICES, shuffleServiceName)
+    yarnConfig.set(YarnConfiguration.NM_AUX_SERVICE_FMT.format(shuffleServiceName),
+      classOf[YarnShuffleService].getCanonicalName)
+    val overlayConf = new YarnConfiguration()
+    // Enable authentication in the base NodeManager conf but not in the client. This would break
+    // shuffle, unless the shuffle service conf overlay overrides to turn off authentication.
+    overlayConf.setBoolean(NETWORK_AUTH_ENABLED.key, true)
+    // Add the authentication conf to a separate config object used as an overlay rather than
+    // setting it directly. This is necessary because a config overlay will override previous
+    // config overlays, but not configs which were set directly on the config object.
+    yarnConfig.addResource(overlayConf)
+    yarnConfig
+  }
+
+  override protected def extraSparkConf(): Map[String, String] =
+    super.extraSparkConf() ++ Map(SHUFFLE_SERVICE_NAME.key -> shuffleServiceName)

Review comment:
       How does this work with the Hadoop 2.7 tests? Am I mistaken about how 2.7 uses the name?
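
   As a side note on the overlay precedence that the comments in `newYarnConfig()` describe, here is a simplified model of it using plain Java maps. It only captures the two rules stated in those comments (a later overlay overrides earlier overlays; a directly set value is not overridden by any overlay) and ignores everything else the real Hadoop `Configuration` does, such as default resources and final properties. The class `LayeredConf` is hypothetical and is not the Hadoop or Spark implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of overlay precedence; not Hadoop's Configuration.
final class LayeredConf {
  // Overlays in the order they were added; later entries take precedence.
  private final List<Map<String, String>> resources = new ArrayList<>();
  // Values set directly on the config object; these win over every overlay.
  private final Map<String, String> direct = new HashMap<>();

  void addResource(Map<String, String> resource) {
    resources.add(new HashMap<>(resource));
  }

  void set(String key, String value) {
    direct.put(key, value);
  }

  // Direct values first, then overlays from newest to oldest; null if unset.
  String get(String key) {
    if (direct.containsKey(key)) {
      return direct.get(key);
    }
    for (int i = resources.size() - 1; i >= 0; i--) {
      String value = resources.get(i).get(key);
      if (value != null) {
        return value;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    LayeredConf conf = new LayeredConf();
    Map<String, String> base = new HashMap<>();
    base.put("spark.authenticate", "true");
    Map<String, String> shuffleOverlay = new HashMap<>();
    shuffleOverlay.put("spark.authenticate", "false");

    conf.addResource(base);
    conf.addResource(shuffleOverlay);
    System.out.println(conf.get("spark.authenticate"));
  }
}
```

   Had the test called `set` for `spark.authenticate` directly, a later overlay could not turn it off in this model, which is the reason the test comment gives for going through `addResource`.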



