squito commented on a change in pull request #23951: [SPARK-13704][CORE][YARN]
Re-implement RackResolver to reduce resolving time
URL: https://github.com/apache/spark/pull/23951#discussion_r268472376
##########
File path:
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
##########
@@ -811,8 +818,38 @@ private[spark] class TaskSchedulerImpl(
blacklistTrackerOpt.map(_.nodeBlacklist()).getOrElse(scala.collection.immutable.Set())
}
+ // Add an on-off switch to save time on rack resolving
+ private def skipRackResolving: Boolean = sc.conf.get(LOCALITY_WAIT_RACK) == 0L
Review comment:
I was just reviewing another patch related to delay scheduling, and I
realized this optimization is a bit too aggressive. That configuration only
controls how long to *wait* for a resource that is rack-local. Even when the
wait is 0, Spark still tries to find a rack-local task for a given resource
offer; it will just fall back to scheduling a non-rack-local task if it can't
find a rack-local one. But it won't be able to do that fallback if it doesn't
know which racks the resource offers are on.
So I think you either need to:
a) change this to use a new conf, with an extra check so that you only turn
off rack resolution entirely if it's *also* true that
`sc.conf.get(LOCALITY_WAIT_RACK) == 0L`
b) is this optimization even needed, considering how much time the rest of
this change should save? Maybe we should always do the rack resolution,
since it should be pretty fast after the rest of your change.
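To make option (a) concrete, here is a minimal, self-contained sketch of the
guard it describes. The conf name `skipRackResolvingOptIn` and the `Conf`
case class are hypothetical stand-ins for a real `SparkConf` entry, used only
to illustrate the two-condition check:

```scala
// Sketch of option (a): skip rack resolution only when an explicit opt-in
// conf is set AND the rack-locality wait is already zero. The conf names
// here are hypothetical, not real Spark configuration keys.
object RackResolvingSketch {
  // Stand-in for the relevant SparkConf values.
  case class Conf(localityWaitRackMs: Long, skipRackResolvingOptIn: Boolean)

  // Requiring both conditions means a zero rack wait alone does not disable
  // rack resolution, so the scheduler can still fall back from rack-local
  // to non-rack-local tasks when it knows which racks offers are on.
  def skipRackResolving(conf: Conf): Boolean =
    conf.skipRackResolvingOptIn && conf.localityWaitRackMs == 0L

  def main(args: Array[String]): Unit = {
    // Wait is 0 but no opt-in: racks are still resolved.
    println(skipRackResolving(Conf(0L, skipRackResolvingOptIn = false)))
    // Opt-in plus zero wait: resolution may be skipped.
    println(skipRackResolving(Conf(0L, skipRackResolvingOptIn = true)))
    // Opt-in but a nonzero wait: racks must still be resolved.
    println(skipRackResolving(Conf(3000L, skipRackResolvingOptIn = true)))
  }
}
```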