Ngone51 commented on a change in pull request #27004: [SPARK-30348][CORE] Fix
flaky test failure on "MasterSuite.SPARK-27510: Master should avoid ..."
URL: https://github.com/apache/spark/pull/27004#discussion_r361239993
##########
File path: core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala
##########
@@ -682,7 +682,11 @@ class MasterSuite extends SparkFunSuite
// an app would be registered with Master once Driver set up
assert(worker.apps.nonEmpty)
appId = worker.apps.head._1
- assert(master.idToApp.contains(appId))
+
+ // we found the case where the test was too fast which all steps were
done within
+ // an interval - in this case, we have to check either app is
available in master
+ // or marked as completed. See SPARK-30348 for details.
+ assert(master.idToApp.contains(appId) ||
master.completedApps.exists(_.id == appId))
Review comment:
My concern is, in this test:
1st assert is to make sure that the application exists in `Master.idToApp`
2nd assert is to check the application has been remove from `Master.idToApp`
after MAX_EXECUTOR_RETRIES.
But if we find the application has completed in 1st assert, which means it
doesn't exist in `Master.idToApp`, I think it has broke original assume.
How about setting up a `CountDownLatch` in `MockExecutorLaunchFailWorker`,
and count down before we reach MAX_EXECUTOR_RETRIES ? So we can be sure the
application hasn't been removed from `Master.idToApp`.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]