GitHub user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13500107
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -54,8 +54,15 @@ private[spark] class TaskSetManager(
     clock: Clock = SystemClock)
   extends Schedulable with Logging
 {
+  // Remember when this TaskSetManager is created
+  val creationTime = clock.getTime()
   val conf = sched.sc.conf
+  // The period we wait for new executors to come up.
+  // After this period, tasks in pendingTasksWithNoPrefs will be
+  // considered as PROCESS_LOCAL.
+  private val WAIT_NEW_EXEC_TIMEOUT =
+    conf.getLong("spark.scheduler.waitNewExecutorTime", 3000L)
--- End diff --
It doesn't make sense to put this here because it will apply to every
TaskSet, no matter how late into the application it was submitted, so you'll
get a 3-second latency on every TaskSet that is missing one of its preferred
nodes. Can we not add this as part of this patch, and simply make the change to
put tasks in the node- and rack-local lists even if no nodes are available in
those right now? Then later we can update the code that calls resourceOffer to
treat tasks that have preferred locations but are missing executors for them
specially.
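
To make that concrete, here is a rough sketch of the data-structure side of the suggestion (hypothetical, simplified names; the real TaskSetManager also tracks pendingTasksForExecutor and allPendingTasks, and its addPendingTask takes only a task index):

// Sketch only, not the actual patch: file a task under its preferred hosts
// and racks unconditionally, instead of demoting it to
// pendingTasksWithNoPrefs when no executor is alive on those hosts yet.
import scala.collection.mutable

case class TaskLocation(host: String)

class PendingTaskLists(rackForHost: String => Option[String]) {
  val pendingTasksForHost = new mutable.HashMap[String, mutable.ArrayBuffer[Int]]
  val pendingTasksForRack = new mutable.HashMap[String, mutable.ArrayBuffer[Int]]
  val pendingTasksWithNoPrefs = new mutable.ArrayBuffer[Int]

  private def listFor(
      map: mutable.HashMap[String, mutable.ArrayBuffer[Int]],
      key: String): mutable.ArrayBuffer[Int] =
    map.getOrElseUpdate(key, new mutable.ArrayBuffer[Int])

  def addPendingTask(index: Int, prefs: Seq[TaskLocation]): Unit = {
    if (prefs.isEmpty) {
      // Only tasks with no preferences at all go to the no-prefs list.
      pendingTasksWithNoPrefs += index
    } else {
      // Key point: no hasExecutorsAliveOnHost(loc.host) guard here, so the
      // task keeps its node- and rack-local entries even if the matching
      // executors have not come up yet.
      for (loc <- prefs) {
        listFor(pendingTasksForHost, loc.host) += index
        rackForHost(loc.host).foreach { rack =>
          listFor(pendingTasksForRack, rack) += index
        }
      }
    }
  }
}

// Example: "host2" has no live executor yet, but the task stays tracked as
// node-local for it rather than being dropped into the no-prefs list.
object Example extends App {
  val lists = new PendingTaskLists(rackForHost = _ => Some("rack1"))
  lists.addPendingTask(0, Seq(TaskLocation("host2")))
  println(lists.pendingTasksForHost("host2")) // ArrayBuffer(0)
  println(lists.pendingTasksForRack("rack1")) // ArrayBuffer(0)
}

With tasks filed this way, resourceOffer can start serving them at NODE_LOCAL or RACK_LOCAL as soon as matching executors register, and special handling for tasks whose preferred executors are missing can be layered on later without imposing a per-TaskSet timeout.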