Ngone51 commented on a change in pull request #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError
URL: https://github.com/apache/spark/pull/26924#discussion_r361433372
##########
File path: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala
##########
@@ -529,6 +529,46 @@ class SparkListenerSuite extends SparkFunSuite with LocalSparkContext with Match
}
}
+  Seq(true, false).foreach { throwInterruptedException =>
+    val suffix = if (throwInterruptedException) "throw interrupt" else "set Thread interrupted"
+    test(s"SPARK-30285: Fix deadlock in AsyncEventQueue.removeListenerOnError: $suffix") {
+      val conf = new SparkConf(false)
+        .set(LISTENER_BUS_EVENT_QUEUE_CAPACITY, 5)
+      val bus = new LiveListenerBus(conf)
+      val counter1 = new BasicJobCounter()
+      val counter2 = new BasicJobCounter()
+      val interruptingListener = new DelayInterruptingJobCounter(throwInterruptedException, 3)
+      bus.addToSharedQueue(counter1)
+      bus.addToSharedQueue(interruptingListener)
+      bus.addToEventLogQueue(counter2)
+      assert(bus.activeQueues() === Set(SHARED_QUEUE, EVENT_LOG_QUEUE))
+      assert(bus.findListenersByClass[BasicJobCounter]().size === 2)
+      assert(bus.findListenersByClass[DelayInterruptingJobCounter]().size === 1)
+
+      bus.start(mockSparkContext, mockMetricsSystem)
+
+      (0 until 5).foreach { jobId =>
+        bus.post(SparkListenerJobEnd(jobId, jobCompletionTime, JobSucceeded))
+      }
+
+      // Call bus.stop in a separate thread, otherwise we will block here until bus is stopped
+      val stoppingThread = new Thread(() => {
+        bus.stop()
+      })
+      stoppingThread.start()
+      // Notify interrupting listener starts to work
+      interruptingListener.sleep = false
Review comment:
Focusing only on `LiveListenerBus` may not be enough to work around the difficulties you mentioned above. Maybe we should shift the focus to `AsyncEventQueue` instead. How about this:
1. Add a test-only method `status()` to `AsyncEventQueue`;
2. In `interruptingListener`, keep checking `AsyncEventQueue.status()` until the queue is stopped. That way, once `AsyncEventQueue` is stopped, we can be sure that `LiveListenerBus` has also stopped and acquired the lock (without the fix).
WDYT?
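The polling idea in the two steps above can be sketched with a toy model. This is a minimal sketch under loud assumptions: `ToyEventQueue`, `StatusPollingDemo`, and the `"running"`/`"stopped"` strings returned by `status()` are hypothetical names for illustration only, not Spark's `AsyncEventQueue` or `LiveListenerBus`. It shows a listener that waits by polling a test-only status hook, rather than waiting on a flag such as `interruptingListener.sleep` that the test thread flips, so the listener only continues after `stop()` has actually been called.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, LinkedBlockingQueue, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical, heavily simplified stand-in for AsyncEventQueue (NOT Spark's real class).
// The handler receives the queue itself so a listener can poll `status()`.
class ToyEventQueue(handle: (ToyEventQueue, String) => Unit) {
  private val events = new LinkedBlockingQueue[String]()
  private val stopped = new AtomicBoolean(false)

  // Dispatch thread: delivers events until stop() was called and the queue is drained.
  private val dispatcher = new Thread(() => {
    var done = false
    while (!done) {
      val event = events.poll(10, TimeUnit.MILLISECONDS)
      if (event != null) handle(this, event)
      else if (stopped.get()) done = true
    }
  })
  dispatcher.setDaemon(true)

  def start(): Unit = dispatcher.start()

  def post(event: String): Unit = events.put(event)

  // Hypothetical test-only hook modeled on the proposed `status()` method.
  def status(): String = if (stopped.get()) "stopped" else "running"

  // Flags the queue as stopped, then blocks until the dispatch thread exits.
  // Per the review comment, in Spark the bus would already hold its lock at this point.
  def stop(): Unit = {
    stopped.set(true)
    dispatcher.join()
  }
}

object StatusPollingDemo {
  def run(): List[String] = {
    val observed = new ConcurrentLinkedQueue[String]()
    val queue = new ToyEventQueue((q, event) => {
      // Analogue of the interrupting listener: wait until the queue reports "stopped".
      while (q.status() != "stopped") Thread.sleep(1)
      observed.add(s"$event handled after ${q.status()}")
    })
    queue.start()
    queue.post("job-0")

    // As in the test under review, stop from a separate thread so the caller is not blocked.
    val stopper = new Thread(() => queue.stop())
    stopper.start()
    stopper.join(5000)
    require(!stopper.isAlive, "stop() did not return")

    observed.toArray(new Array[String](0)).toList
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Because the listener proceeds only after it observes the stopped state, the outcome does not depend on whether the listener starts before or after the stopping thread reaches `stop()`.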
----------------------------------------------------------------
This is an automated message from the Apache Git Service.