zsxwing opened a new pull request #32316:
URL: https://github.com/apache/spark/pull/32316


   ### What changes were proposed in this pull request?
   
   This is another attempt to fix the flaky test "query without test harness" 
on ContinuousSuite.
   
   `query without test harness` is flaky because it starts a continuous query 
with two partitions but assumes they will run at the same speed.
   
   In this test, 0 and 2 will be written to partition 0, 1 and 3 will be 
written to partition 1. It assumes when we see 3, 2 should be written to the 
memory sink. But this is not guaranteed. We can add `if (currentValue == 2) 
Thread.sleep(5000)` at this line 
https://github.com/apache/spark/blob/b2a2b5d8206b7c09b180b8b6363f73c6c3fdb1d8/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousRateStreamSource.scala#L135
 to reproduce the failure: `Result set Set([0], [1], [3]) are not a superset of 
Set(0, 1, 2, 3)!`
   
   The fix is changing `waitForRateSourceCommittedValue` to wait until all 
partitions reach the desired values before stopping the query.
   
   ### Why are the changes needed?
   
   Fix a flaky test.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Existing tests. Manually verify the reproduction I mentioned above doesn't 
fail after this change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to