Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/16354#discussion_r93292503
--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -549,11 +546,15 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
Seq(TaskLocation("host1", "execB")),
Seq(TaskLocation("host2", "execC")),
Seq())
-    val manager = new TaskSetManager(sched, taskSet, 1, clock = new ManualClock)
+    val clock = new ManualClock()
+    val manager = new TaskSetManager(sched, taskSet, 1, clock = clock)
sched.addExecutor("execA", "host1")
manager.executorAdded()
sched.addExecutor("execC", "host2")
manager.executorAdded()
+    // need one resource offer which doesn't schedule anything to start the delay scheduling timer
+    assert(manager.resourceOffer("exec1", "host1", ANY).isEmpty)
+    clock.advance(sc.getConf.getTimeAsMs("spark.locality.wait", "3s") * 4)
assert(manager.resourceOffer("exec1", "host1", ANY).isDefined)
--- End diff ---
This used to work without the wait via a rather convoluted path. When the task set is created, there aren't any executors available that meet locality constraints, so the allowed locality levels are `NO_PREF, ANY`. After adding `("execA", "host1")`, the valid locality levels become `PROCESS_LOCAL, NODE_LOCAL, NO_PREF, ANY`. However, `recomputeLocality()` leaves our current locality level at `NO_PREF`, even though a tighter level is now available.
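The level-preservation behavior described above can be sketched with a small, self-contained model (the names and structure here are illustrative, not Spark's actual `TaskSetManager` internals):

```scala
// Illustrative model only: valid locality levels expand when an executor
// is added, but the current level is preserved rather than tightened.
case class LocalityState(validLevels: Seq[String], currentLevel: String) {
  // Mirrors the recomputeLocality() behavior described above: recompute
  // the valid levels, keeping the current level if it is still valid.
  def recompute(newValidLevels: Seq[String]): LocalityState =
    LocalityState(
      newValidLevels,
      if (newValidLevels.contains(currentLevel)) currentLevel
      else newValidLevels.head)
}

// Before adding ("execA", "host1"): only NO_PREF and ANY are valid.
val before = LocalityState(Seq("NO_PREF", "ANY"), "NO_PREF")
// After adding the executor, tighter levels become valid...
val after = before.recompute(Seq("PROCESS_LOCAL", "NODE_LOCAL", "NO_PREF", "ANY"))
// ...but the current level stays at NO_PREF rather than tightening.
```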
This offer would fail to schedule anything if either:
1) we have an intervening offer on `("execA", "host1")` (which would match at `PROCESS_LOCAL`, thus dropping the locality level down)
or
2) we move `sched.addExecutor("execA", "host1")` to before we create the task set manager, in which case the initial locality levels would include `PROCESS_LOCAL`.
I really don't think the old behavior was desirable (even if we don't make the other changes proposed here). I'm also inclined to do more cleanup on this test -- at best, the naming of the execs makes this unnecessarily confusing ("execA" is local while "exec1" is not).
I've added some logging to the original behavior to make this a bit clearer, if it's helpful to look at:
https://github.com/squito/spark/tree/tsm_unrelated_debug
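As a rough, self-contained model of why the test now needs one empty offer before advancing the clock (assumed names here; this is not Spark's implementation): the delay-scheduling timer only starts on the first resource offer, so advancing the clock before any offer would have no effect:

```scala
// Illustrative model: the locality-wait timer starts at the first offer.
class ManualClock(var now: Long = 0L) {
  def advance(ms: Long): Unit = now += ms
}

class DelayTimer(localityWaitMs: Long, clock: ManualClock) {
  private var firstOfferAt: Option[Long] = None

  // Returns true once the locality wait has elapsed since the first offer;
  // the first call starts the timer and (with a nonzero wait) returns false.
  def offerAtAnyAccepted(): Boolean = {
    if (firstOfferAt.isEmpty) firstOfferAt = Some(clock.now)
    clock.now - firstOfferAt.get >= localityWaitMs
  }
}

val clock = new ManualClock()
val timer = new DelayTimer(3000L, clock) // spark.locality.wait defaults to 3s
val first = timer.offerAtAnyAccepted()   // starts the timer; wait not elapsed
clock.advance(3000L * 4)                 // mirrors the clock.advance in the test
val second = timer.offerAtAnyAccepted()  // wait has elapsed, offer accepted
```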