JoshRosen commented on pull request #34265: URL: https://github.com/apache/spark/pull/34265#issuecomment-942675025
@jiangxb1987, I don't think this will adversely impact job cancellation via `cancelJobGroup`: The three `DAGScheduler` methods that I modified are invoked from the main Spark application thread, not the scheduler event thread. At this point no jobs have been submitted. Just for sake of argument / completeness, let's say that we had a slow `getPartitions` which was executing inside of the scheduler: before, the slow `getPartitions` wouldn't have been cancellable or interruptible at all, since we'd have to wait for it to complete before the job group cancellation event could be processed by the DAG scheduler's event loop. After this patch's change, that slow work is performed outside of the DAGScheduler. Even though there's no job to cancel, what happens if we're running in a notebook or REPL environment and want to cancel the running cell / command? This depends on the behavior of the notebook/REPL: if the notebook/REPL sends a `Thread.interrupt()` to the running driver thread then then `getPartitions` call might be able exit early depending on whether it's running code which checks for thread interrupts (such as IO, e.g. due to Hadoop filesystem listing operations). Given this, I think we're okay: we haven't impacted job group cancellation and the driver thread interruption situation is the same as it would be for other slow driver-side operations (such as query planning in Spark SQL). --- As an aside, I do think it's a bit confusing how the DAGScheduler class has both public methods (called outside the event loop) and private methods used inside the loop mixed into the same class but with different rules around accessing private state. If we wanted to re-architect things then I think it would be clearer to separate those responsibilities into separate classes (maybe something like `DAGSchedulerClient` and `DAGSchedulerBackend`) to more strongly enforce this separation and to prevent accidental access of event-loop state from outside of the loop. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
