Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/15505#discussion_r97395812
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
---
@@ -602,6 +619,21 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
Future.successful(false)
}
-private[spark] object CoarseGrainedSchedulerBackend {
+private[spark] object CoarseGrainedSchedulerBackend extends Logging {
   val ENDPOINT_NAME = "CoarseGrainedScheduler"
+
+  // abort TaskSetManager without exception
+  private[scheduler] def abortTaskSetManager(
+      scheduler: TaskSchedulerImpl,
+      taskId: Long,
+      msg: => String,
+      exception: Option[Throwable] = None): Unit = {
+    scheduler.taskIdToTaskSetManager.get(taskId).foreach { taskSetMgr =>
+      try {
+        taskSetMgr.abort(msg, exception)
--- End diff --
Bringing back an old thread:
>> we need to be careful about thread safety here -- this isn't safe to
>> call w/out a lock on TaskSchedulerImpl. Need to look a little closer as there
>> are warnings elsewhere about deadlock between backend and taskscheduler.
> taskSetMgr.abort is thread safety, It looks fine from the calling code.
I disagree, even calling `scheduler.taskIdToTaskSetManager.get(taskId)` is
unsafe without a lock on the scheduler -- what if the hashmap is getting
rehashed by another thread?
(I also have a vague fear that inside `taskSetMgr.abort` you might end
up skipping the main part of `maybeFinishTaskSet()` if a task finishes
successfully in another thread at the same time, because of races on `isZombie`
and `numRunningTasks` ... but I can't come up with a sequence that would be bad.
In any case, one problem is enough.)
Just getting a lock on the `taskSchedulerImpl` here shouldn't throw away
all the performance gain, since then we only get the lock in this failure case,
and in the happy path we decrease contention for that lock.
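A minimal sketch of the locking suggested above, not the actual patch. `FakeScheduler` and `FakeTaskSetManager` are hypothetical stand-ins for `TaskSchedulerImpl` and `TaskSetManager`, and the `Boolean` return value is an assumption added only so the result can be checked. Both the map lookup and the abort happen while holding the scheduler's monitor, and only on the failure path:

```scala
import scala.collection.mutable
import scala.util.control.NonFatal

object LockedAbortSketch {
  // Hypothetical stand-in for TaskSetManager: only records the abort message.
  final class FakeTaskSetManager {
    @volatile var abortedWith: Option[String] = None
    def abort(msg: String, exception: Option[Throwable] = None): Unit = {
      abortedWith = Some(msg)
    }
  }

  // Hypothetical stand-in for TaskSchedulerImpl. The map is a plain,
  // non-thread-safe HashMap, so every access must hold this object's monitor.
  final class FakeScheduler {
    val taskIdToTaskSetManager = new mutable.HashMap[Long, FakeTaskSetManager]
    def register(taskId: Long, mgr: FakeTaskSetManager): Unit = synchronized {
      taskIdToTaskSetManager(taskId) = mgr
    }
  }

  // Look up and abort under the scheduler lock, so a concurrent rehash of the
  // map cannot be observed. Returns true only if a manager was found and
  // abort() completed without throwing (assumed return type, for illustration).
  def abortTaskSetManager(
      scheduler: FakeScheduler,
      taskId: Long,
      msg: => String,
      exception: Option[Throwable] = None): Boolean = scheduler.synchronized {
    scheduler.taskIdToTaskSetManager.get(taskId) match {
      case Some(taskSetMgr) =>
        try {
          taskSetMgr.abort(msg, exception)
          true
        } catch {
          case NonFatal(_) => false // swallow, matching "abort without exception"
        }
      case None => false
    }
  }
}
```

Whether holding the scheduler lock across `abort` is safe with respect to the backend/scheduler deadlock warnings mentioned earlier would still need checking against the real code.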