squito commented on issue #22806: [SPARK-25250][CORE] : Late zombie task 
completions handled correctly even before new taskset launched
URL: https://github.com/apache/spark/pull/22806#issuecomment-459767222
 
 
   uggh ... I just realized a problem.  Everything in TaskSetManager is 
supposed to be protected by a lock on `TaskSchedulerImpl`, but now 
`markPartitionCompleted()` is getting called without that lock.  The old 
version guaranteed that, because it was called from within a method in the TSM 
where we already had that lock.  But now, we've moved that out to the 
DAGSchedulerEventLoop, where you do not necessarily have that lock.
   
   I haven't thought through the consequences of this -- without proper updates 
to `tasksSuccessful` and `successful`, what could go wrong?  maybe we'd never 
mark the taskset as finished?  will this lead to failures, or just 
inefficiencies?
   
   I'm not sure how to fix this.  You could make 
`TaskSchedulerImpl.markPartitionCompletedInAllTaskSets()` synchronized, but 
then you're getting a lock on the taskSchedulerImpl in the DAGScheduler event 
loop. That's not good for scheduling throughput, and also want to make sure 
there is no change of deadlock.
   
   I feel like there should be a better solution ...

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to