JoshRosen commented on pull request #34265:
URL: https://github.com/apache/spark/pull/34265#issuecomment-942675025


   @jiangxb1987, I don't think this will adversely impact job cancellation via 
`cancelJobGroup`:
   
   The three `DAGScheduler` methods that I modified are invoked from the main 
Spark application thread, not the scheduler event thread. At this point no jobs 
have been submitted.
   
   Just for sake of argument / completeness, let's say that we had a slow 
`getPartitions` which was executing inside of the scheduler: before, the slow 
`getPartitions` wouldn't have been cancellable or interruptible at all, since 
we'd have to wait for it to complete before the job group cancellation event 
could be processed by the DAG scheduler's event loop.
   
   After this patch's change, that slow work is performed outside of the 
DAGScheduler. Even though there's no job to cancel, what happens if we're 
running in a notebook or REPL environment and want to cancel the running cell / 
command? This depends on the behavior of the notebook/REPL: if the 
notebook/REPL sends a `Thread.interrupt()` to the running driver thread then 
then `getPartitions` call might be able exit early depending on whether it's 
running code which checks for thread interrupts (such as IO, e.g. due to Hadoop 
filesystem listing operations).
   
   Given this, I think we're okay: we haven't impacted job group cancellation 
and the driver thread interruption situation is the same as it would be for 
other slow driver-side operations (such as query planning in Spark SQL).
   
   ---
   
   As an aside, I do think it's a bit confusing how the DAGScheduler class has 
both public methods (called outside the event loop) and private methods used 
inside the loop mixed into the same class but with different rules around 
accessing private state. If we wanted to re-architect things then I think it 
would be clearer to separate those responsibilities into separate classes 
(maybe something like `DAGSchedulerClient` and `DAGSchedulerBackend`) to more 
strongly enforce this separation and to prevent accidental access of event-loop 
state from outside of the loop.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to