squito commented on a change in pull request #23951: [SPARK-13704][CORE][YARN]
Re-implement RackResolver to reduce resolving time
URL: https://github.com/apache/spark/pull/23951#discussion_r266926865
##########
File path:
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
##########
@@ -812,33 +814,36 @@ private[spark] class TaskSchedulerImpl(
}
// Add an on-off switch to save time on rack resolving
-  lazy val skipRackResolving: Boolean = sc.conf.get(LOCALITY_WAIT_RACK) == 0L
+  private def skipRackResolving: Boolean = sc.conf.get(LOCALITY_WAIT_RACK) == 0L
   /**
-   * Rack is unknown by default.
+   * Rack is `unknown` by default.
    * It can be overridden in different TaskSchedulers, like YARN's.
    */
-  def defaultRackValue: Option[String] = None
+  protected val defaultRackValue: String = "unknown"
-  def doGetRacksForHosts(preferredLocation: List[String]): List[Option[String]] = Nil
+  /**
+   * Get rack info for a list of hosts. This is the internal method behind [[getRacksForHosts]].
+   * It should be overridden in different TaskSchedulers. Returns [[Nil]] by default.
+   */
+  protected def doGetRacksForHosts(hosts: Seq[String]): Seq[String] = Nil
Review comment:
Doesn't the returned list have to be the same length as the passed-in hosts?
You take the result of this and `zip` it with other collections, so I'd assume
it does. You should both fix the default implementation *and* update the doc to
mention this requirement.
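The reviewer's point can be sketched as follows. This is a hedged illustration, not Spark's actual code: the object name `RackDefaults` and the sample hosts are hypothetical. It shows a default implementation whose result always has the same length as its input, so that zipping the result against other per-host collections is safe.

```scala
// Hypothetical sketch of the reviewer's suggested fix: the default must
// return one rack entry per input host, because callers zip the result
// with other per-host collections.
object RackDefaults {
  val defaultRackValue: String = "unknown"

  // Length-safe default: one defaultRackValue per host, instead of Nil.
  def doGetRacksForHosts(hosts: Seq[String]): Seq[String] =
    Seq.fill(hosts.length)(defaultRackValue)
}
```

With this default, `hosts.zip(doGetRacksForHosts(hosts))` pairs every host with a rack value; a bare `Nil` would produce an empty zip and silently drop all hosts.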