HeartSaVioR opened a new pull request #25706: [SPARK-26989][CORE][TEST] DAGSchedulerSuite: ensure listeners are fully processed before checking failedStages
URL: https://github.com/apache/spark/pull/25706

### What changes were proposed in this pull request?

This patch ensures that `failedStages` is accessed only after the listeners have fully processed all events. Without this guard, two threads run concurrently, (1) the listener processing thread and (2) the test main thread, and a race condition can occur. That is also why we see a very odd failure, where the error message says the condition is met but the test still failed:

```
- Barrier task failures from the same stage attempt don't trigger multiple stage retries *** FAILED ***
  ArrayBuffer(0) did not equal List(0) (DAGSchedulerSuite.scala:2656)
```

This means the verification failed, and the condition became true just before the error message was constructed. The guard is already in place in many spots, but it was missed in some places.

### Why are the changes needed?

The UT fails intermittently, and this patch addresses the flakiness.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Modified UT. The failure is not easy to reproduce (it does not happen often), so I'd feel safer having more eyes review this rather than blindly running the test more times.
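
For reviewers, the race described in this pull request can be illustrated in miniature. The following is a minimal sketch, not the Spark code: `ToyListenerBus`, its `post` and `waitUntilEmpty` methods, and `RaceDemo` are hypothetical stand-ins written for this illustration, assuming a single background thread that records failed stage IDs while the test thread reads them.

```scala
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for an asynchronous listener bus (not Spark's API).
final class ToyListenerBus {
  private val queue = new LinkedBlockingQueue[Int]()
  private val pending = new AtomicInteger(0)
  private val failed = ArrayBuffer.empty[Int]

  // Thread 1: the listener processing thread, which records failed stage IDs.
  private val worker = new Thread(new Runnable {
    override def run(): Unit = {
      try {
        while (true) {
          val stageId = queue.take()
          failed.synchronized { failed += stageId }
          pending.decrementAndGet() // mark the event as fully processed
        }
      } catch {
        case _: InterruptedException => () // exit quietly on interrupt
      }
    }
  })
  worker.setDaemon(true)
  worker.start()

  // Called from the test thread: enqueue an event for asynchronous processing.
  def post(stageId: Int): Unit = {
    pending.incrementAndGet()
    queue.put(stageId)
  }

  // The guard: block until every posted event has been processed, or time out.
  def waitUntilEmpty(timeoutMillis: Long): Boolean = {
    val deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis)
    while (pending.get() != 0 && System.nanoTime() < deadline) Thread.sleep(1)
    pending.get() == 0
  }

  // Snapshot of the recorded failures.
  def failedStages: List[Int] = failed.synchronized { failed.toList }
}

object RaceDemo {
  def main(args: Array[String]): Unit = {
    val bus = new ToyListenerBus
    bus.post(0)
    // Thread 2: the test main thread. Reading bus.failedStages here, without
    // the guard, could observe either an empty list or List(0) depending on timing.
    assert(bus.waitUntilEmpty(10000L), "listener did not drain in time")
    assert(bus.failedStages == List(0)) // safe: all events have been processed
  }
}
```

In Scala, `ArrayBuffer(0) == List(0)` evaluates to `true`, so a failure message reading "ArrayBuffer(0) did not equal List(0)" is consistent with the buffer having been empty when it was compared and populated by the time the message was built.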
