HeartSaVioR opened a new pull request #25706: [SPARK-26989][CORE][TEST] 
DAGSchedulerSuite: ensure listeners are fully processed before checking 
failedStages
URL: https://github.com/apache/spark/pull/25706
 
 
   ### What changes were proposed in this pull request?
   
   This patch ensures that `failedStages` is accessed only after the listeners 
have fully processed all events. Without this guard, two threads run 
concurrently, 1) the listener processing thread and 2) the test main thread, 
and a race condition can occur.
   
   That is also why we see a very odd failure, where the error message suggests 
the condition is met but the test still failed:
   ```
   - Barrier task failures from the same stage attempt don't trigger multiple 
stage retries *** FAILED ***
     ArrayBuffer(0) did not equal List(0) (DAGSchedulerSuite.scala:2656)
   ```
   which means the verification failed, and the condition became true just 
before the error message was constructed.
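   
   The message above looks contradictory because `ArrayBuffer(0)` and `List(0)` 
compare equal in Scala. A minimal sketch of how such a message can arise is 
below. The `check` helper and its `betweenSteps` hook are hypothetical and only 
mimic "compare first, render the message later"; this is not how the test 
framework is actually implemented, and the concurrent listener thread is 
replaced by a deterministic callback.
   
   ```scala
   import scala.collection.mutable.ArrayBuffer

   object StaleMessageDemo {
     // Hypothetical assertion helper: the comparison and the message
     // rendering happen at two different points in time.
     def check(
         actual: ArrayBuffer[Int],
         expected: List[Int],
         betweenSteps: () => Unit): Option[String] = {
       val equal = actual == expected // step 1: sees the buffer as it is now
       betweenSteps()                 // stands in for the listener thread
       if (equal) None
       else Some(s"$actual did not equal $expected") // step 2: sees later contents
     }

     def main(args: Array[String]): Unit = {
       val failedStages = ArrayBuffer.empty[Int]
       // The buffer is empty when compared, but holds 0 when the message is built.
       val result = check(failedStages, List(0), () => { failedStages += 0; () })
       println(result) // prints Some(ArrayBuffer(0) did not equal List(0))
     }
   }
   ```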
   
   The guard is already in place in many spots, but is missing in some.
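   
   As a rough illustration of the guard, the sketch below drains a toy event 
queue before reading the collected state. `ToyListenerBus` and all of its 
members are hypothetical stand-ins, not Spark classes; the real suite 
presumably relies on the listener bus's own wait call (assumed here to be 
something like `waitUntilEmpty`), which this sketch only imitates.
   
   ```scala
   import java.util.concurrent.{Executors, TimeUnit}
   import scala.collection.mutable.ArrayBuffer

   // Hypothetical stand-in for a listener bus: events are handled on a
   // separate thread, so the posting thread does not see results immediately.
   class ToyListenerBus {
     private val executor = Executors.newSingleThreadExecutor()
     private val lock = new Object
     private val failed = ArrayBuffer.empty[Int]

     def postStageFailed(stageId: Int): Unit = {
       executor.execute(new Runnable {
         override def run(): Unit = {
           Thread.sleep(50) // simulate slow listener processing
           lock.synchronized { failed += stageId }
         }
       })
     }

     def failedStages: List[Int] = lock.synchronized { failed.toList }

     // The guard: the executor is single-threaded and FIFO, so once this
     // no-op task has run, every event posted before it has been processed.
     def waitUntilEmpty(timeoutMillis: Long): Unit = {
       val marker = executor.submit(new Runnable {
         override def run(): Unit = ()
       })
       marker.get(timeoutMillis, TimeUnit.MILLISECONDS)
     }

     def stop(): Unit = executor.shutdown()
   }

   object GuardDemo {
     def main(args: Array[String]): Unit = {
       val bus = new ToyListenerBus
       bus.postStageFailed(0)
       // Without the next line, the read below races with the listener
       // thread and may observe an empty list.
       bus.waitUntilEmpty(10000L)
       assert(bus.failedStages == List(0))
       bus.stop()
     }
   }
   ```
   
   The point of the design is ordering rather than timing: the check waits for 
the queue to drain instead of sleeping for a fixed interval.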
   
   ### Why are the changes needed?
   
   The UT fails intermittently, and this patch addresses the flakiness.
   
   ### Does this PR introduce any user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Modified UT. The failure is hard to reproduce (it occurs rarely), so I'd feel 
safer having more eyes review this than blindly running the test more times.
