Github user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/16270#discussion_r92276753
--- Diff: core/src/test/scala/org/apache/spark/scheduler/SchedulerIntegrationSuite.scala ---
@@ -157,8 +160,16 @@ abstract class SchedulerIntegrationSuite[T <: MockBackend: ClassTag] extends Spa
}
// When a job fails, we terminate before waiting for all the task end events to come in,
// so there might still be a running task set. So we only check these conditions
- // when the job succeeds
- assert(taskScheduler.runningTaskSets.isEmpty)
+ // when the job succeeds.
+ // When the final task of a taskset completes, we post
+ // the event to the DAGScheduler event loop before we finish processing in the taskscheduler
+ // thread. It's possible the DAGScheduler thread processes the event, finishes the job,
+ // and notifies the job waiter before our original thread in the task scheduler finishes
+ // handling the event and marks the taskset as complete. So its ok if we need to wait a
+ // *little* bit longer for the original taskscheduler thread to finish up to deal w/ the race.
+ eventually(timeout(1 second), interval(100 millis)) {
--- End diff ---
I might use 10 millis here, since there's not much cost to checking more
frequently, and 100 millis is a somewhat long interval to wait.
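For reference, a minimal sketch of what the check would look like with the
suggested 10 millis polling interval (assuming the suite's taskScheduler field
and ScalaTest's Eventually and SpanSugar helpers are in scope, as the diff
implies):

    import org.scalatest.concurrent.Eventually.{eventually, interval, timeout}
    import org.scalatest.time.SpanSugar._

    // Poll every 10 millis (instead of 100) until the TaskScheduler thread has
    // finished marking the last task set as complete; still give up after 1 second.
    eventually(timeout(1.second), interval(10.millis)) {
      assert(taskScheduler.runningTaskSets.isEmpty)
    }

Polling more often only costs a few extra cheap assertions in the common case,
while the 1 second timeout still bounds how long a genuinely stuck task set can
hold up the test.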