Ngone51 commented on a change in pull request #26924: [SPARK-30285][CORE] Fix 
deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError
URL: https://github.com/apache/spark/pull/26924#discussion_r361419291
 
 

 ##########
 File path: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala
 ##########
 @@ -529,6 +529,46 @@ class SparkListenerSuite extends SparkFunSuite with LocalSparkContext with Match
     }
   }
 
+  Seq(true, false).foreach { throwInterruptedException =>
+    val suffix = if (throwInterruptedException) "throw interrupt" else "set Thread interrupted"
+    test(s"SPARK-30285: Fix deadlock in AsyncEventQueue.removeListenerOnError: $suffix") {
+      val conf = new SparkConf(false)
+        .set(LISTENER_BUS_EVENT_QUEUE_CAPACITY, 5)
+      val bus = new LiveListenerBus(conf)
+      val counter1 = new BasicJobCounter()
+      val counter2 = new BasicJobCounter()
+      val interruptingListener = new DelayInterruptingJobCounter(throwInterruptedException, 3)
+      bus.addToSharedQueue(counter1)
+      bus.addToSharedQueue(interruptingListener)
+      bus.addToEventLogQueue(counter2)
+      assert(bus.activeQueues() === Set(SHARED_QUEUE, EVENT_LOG_QUEUE))
+      assert(bus.findListenersByClass[BasicJobCounter]().size === 2)
+      assert(bus.findListenersByClass[DelayInterruptingJobCounter]().size === 1)
+
+      bus.start(mockSparkContext, mockMetricsSystem)
+
+      (0 until 5).foreach { jobId =>
+        bus.post(SparkListenerJobEnd(jobId, jobCompletionTime, JobSucceeded))
+      }
+
+      // Call bus.stop in a separate thread, otherwise we will block here until bus is stopped
+      val stoppingThread = new Thread(() => {
+        bus.stop()
+      })
+      stoppingThread.start()
+      // Notify interrupting listener starts to work
+      interruptingListener.sleep = false
 
 Review comment:
   > Unfortunately, checking the stopped status can't guarantee this. It's 
likely that the bus has already set the stopped status to true, but has not 
acquired the synchronized lock yet.
   
   IIUC, you want `interruptingListener` to start working only once `bus` has 
moved to the stopped state and acquired the synchronized lock, right?
   
   But how could `bus` acquire the synchronized lock now? This fix has already 
removed the synchronized lock. The only thing you can do now is check the 
`bus` status, and I think that's enough.
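   
   A rough sketch of the "check the status, then release the listener" idea, 
in case it helps. This is not Spark code: `ToyBus`, `listenerMayRun` and 
`StopFlagSketch` are hypothetical stand-ins, and the sketch assumes the test 
has some way to observe the stopped flag.

```scala
import java.util.concurrent.CountDownLatch
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for the bus; NOT Spark's LiveListenerBus.
class ToyBus {
  val stopped = new AtomicBoolean(false)
  // Released by the test once the listener is allowed to start working.
  val listenerMayRun = new CountDownLatch(1)
  @volatile var listenerRanAfterStop = false

  // Stand-in for the listener: waits for the signal, then records whether
  // the stopped flag was already set when it began working.
  def runListener(): Unit = {
    listenerMayRun.await()
    listenerRanAfterStop = stopped.get()
  }

  // Stand-in for stop(): sets the flag first, then waits for the listener
  // thread to finish. No lock is held while waiting.
  def stop(listenerThread: Thread): Unit = {
    stopped.set(true)
    listenerThread.join()
  }
}

object StopFlagSketch {
  def run(): Boolean = {
    val bus = new ToyBus
    val listenerThread = new Thread(() => bus.runListener())
    listenerThread.start()

    // Stop in a separate thread, since stop() blocks until the listener is done.
    val stoppingThread = new Thread(() => bus.stop(listenerThread))
    stoppingThread.start()

    // Poll the stopped flag (with a deadline) before releasing the listener.
    val deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(10)
    while (!bus.stopped.get() && System.nanoTime() < deadline) {
      Thread.sleep(1)
    }
    bus.listenerMayRun.countDown()

    stoppingThread.join(10000)
    // True if the listener started after the flag was set and stop() returned.
    bus.listenerRanAfterStop && !stoppingThread.isAlive
  }
}
```

   Since `stop()` in this sketch takes no lock, seeing the flag set is all 
the test thread can wait for before it releases the listener.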
