LantaoJin commented on a change in pull request #23951: 
[SPARK-27038][CORE][YARN] Re-implement RackResolver to reduce resolving time
URL: https://github.com/apache/spark/pull/23951#discussion_r264071563
 
 

 ##########
 File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/SparkRackResolver.scala
 ##########
 @@ -37,4 +46,77 @@ private[yarn] class SparkRackResolver {
     RackResolver.resolve(conf, hostName).getNetworkLocation()
   }
 
+  /**
+   * Added in SPARK-27038.
+   * This should be changed to `RackResolver.resolve(conf, hostNames)`
+   * in hadoop releases with YARN-9332.
+   */
+  def resolve(conf: Configuration, hostNames: List[String]): List[Node] = {
+    SparkRackResolver.coreResolve(conf, hostNames)
+  }
+}
+
+/**
+ * Utility to resolve the rack for hosts in an efficient manner.
+ * It will cache the rack for individual hosts to avoid
+ * repeatedly performing the same expensive lookup.
+ *
+ * Its logic is based on [[org.apache.hadoop.yarn.util.RackResolver]], with enhancements.
+ * This will be unnecessary in hadoop releases with YARN-9332.
+ * With that, we could just directly use [[org.apache.hadoop.yarn.util.RackResolver]].
+ * In the meantime, this is a re-implementation for spark's use.
+ */
+object SparkRackResolver extends Logging {
+  private var dnsToSwitchMapping: DNSToSwitchMapping = _
+  private var initCalled = false
+
+  private def init(conf: Configuration): Unit = {
+    if (!initCalled) {
+      initCalled = true
+      val dnsToSwitchMappingClass =
+        conf.getClass(CommonConfigurationKeysPublic.NET_TOPOLOGY_NODE_SWITCH_MAPPING_IMPL_KEY,
+          classOf[ScriptBasedMapping], classOf[DNSToSwitchMapping])
+      if (classOf[ScriptBasedMapping].isAssignableFrom(dnsToSwitchMappingClass)) {
+        val numArgs = conf.getInt(CommonConfigurationKeysPublic.NET_TOPOLOGY_SCRIPT_NUMBER_ARGS_KEY,
+          CommonConfigurationKeysPublic.NET_TOPOLOGY_SCRIPT_NUMBER_ARGS_DEFAULT)
+        logInfo(s"Setting spark.hadoop.net.topology.script.number.args with a higher value " +
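
The `SparkRackResolver` doc comment in this hunk describes caching each host's rack so the same expensive lookup is not repeated. A minimal sketch of that cache-then-batch idea follows, assuming a hypothetical `lookup` function that stands in for a batch `DNSToSwitchMapping` and returns one rack path per host, in input order; `CachingRackResolver` is likewise a hypothetical name, not the class added by this PR.

```scala
import scala.collection.mutable

// Hypothetical helper for illustration; not Spark's SparkRackResolver.
// `lookup` stands in for a batch rack lookup such as a topology script:
// it receives host names and must return one rack path per host, in order.
final class CachingRackResolver(lookup: Seq[String] => Seq[String]) {
  // Host name -> rack path, filled as hosts are first seen.
  private val cache = mutable.HashMap.empty[String, String]

  def resolve(hosts: Seq[String]): Seq[String] = synchronized {
    // Send only hosts that have never been resolved, each once,
    // in a single batch call instead of one call per host.
    val unknown = hosts.distinct.filterNot(cache.contains)
    if (unknown.nonEmpty) {
      val racks = lookup(unknown)
      require(racks.size == unknown.size, "lookup must return one rack per host")
      unknown.zip(racks).foreach { case (host, rack) => cache(host) = rack }
    }
    // Every requested host is cached at this point.
    hosts.map(host => cache(host))
  }
}

// Usage with a fake lookup that counts how often it is invoked.
var lookupCalls = 0
val resolver = new CachingRackResolver(hs => {
  lookupCalls += 1
  hs.map(h => "/rack-" + h.last)
})
val first = resolver.resolve(Seq("host1", "host2", "host1"))
val second = resolver.resolve(Seq("host2", "host3"))
println(s"$first $second calls=$lookupCalls")
```

Resolving only the distinct hosts missing from the cache means a task set that names the same hosts thousands of times triggers at most one lookup per new host, and those lookups arrive together in one batch.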
 
 Review comment:
   A simple test on a 1000-node cluster gives the following results:
   > bin/spark-sql --master yarn --conf spark.executor.instances=50 --conf spark.executor.cores=3 --conf spark.dynamicAllocation.enabled=false --driver-memory 10g
   > spark-sql> select count(*) from test_table;
   
   19/03/10 18:21:42 INFO YarnScheduler: Adding task set 1.0 with 19411 tasks
   19/03/10 18:22:11 INFO TaskSetManager: Adding pending tasks take **29692** ms
   
   > bin/spark-sql --master yarn --conf spark.executor.instances=50 --conf spark.executor.cores=3 --conf spark.dynamicAllocation.enabled=false --driver-memory 10g **--conf spark.hadoop.net.topology.script.number.args=10000**
   > spark-sql> select count(*) from test_table;
   
   19/03/10 18:18:56 INFO YarnScheduler: Adding task set 1.0 with 19411 tasks
   19/03/10 18:19:11 INFO TaskSetManager: Adding pending tasks take **14935** ms
   
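   Raising the limit roughly halves the time spent adding pending tasks (29692 ms vs. 14935 ms). That's why I want to prompt users to increase the maximum number of script arguments. @vanzin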
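
Why `net.topology.script.number.args` matters: when the mapping is `ScriptBasedMapping`, Hadoop runs the topology script with at most that many host arguments per invocation (the Hadoop default is 100), so a batch of uncached hosts is split into chunks and each chunk costs one process launch. The arithmetic below is a sketch of that chunking; `scriptInvocations` and `chunks` are hypothetical helpers, the 1000-host batch is an assumed example, and fewer launches is one plausible contributor to the speed-up measured above, not a full accounting of it.

```scala
// Hypothetical helpers illustrating how ScriptBasedMapping-style batching
// splits host names across topology-script runs.

// Number of script launches needed to resolve `hosts` uncached host names
// when each launch may receive at most `maxArgs` of them (ceiling division).
def scriptInvocations(hosts: Int, maxArgs: Int): Int = {
  require(hosts >= 0 && maxArgs > 0, "hosts must be >= 0 and maxArgs > 0")
  (hosts + maxArgs - 1) / maxArgs
}

// The argument lists, in order, one per script launch.
def chunks(hosts: Seq[String], maxArgs: Int): List[Seq[String]] = {
  require(maxArgs > 0, "maxArgs must be > 0")
  hosts.grouped(maxArgs).toList
}

// Assumed example: 1000 uncached hosts, at the default limit vs. 10000.
val atDefault = scriptInvocations(1000, 100)
val raised = scriptInvocations(1000, 10000)
val sizes = chunks((1 to 250).map(i => s"host$i"), 100).map(_.size)
println(s"default=$atDefault raised=$raised chunkSizes=$sizes")
```

With the limit at or above the number of hosts in a batch, the whole batch resolves in a single script run.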

