HeartSaVioR opened a new pull request #25048: [SPARK-28247][SS] Fix flaky test 
"query without test harness" on ContinuousSuite 
URL: https://github.com/apache/spark/pull/25048
 
 
   ## What changes were proposed in this pull request?
   
   This patch fixes the flaky test "query without test harness" in 
   ContinuousSuite by giving the query more time to commit the epoch 
   that writes the output rows.
   
   The following timeline was observed in a failing run, using injected 
   debug logs:
   
   ```
   reader creation time                                   1562225320210
   epoch 1 launched                                       1562225320593 (+380ms)
   epoch 13 launched                                      1562225321702 (+1.1s)
   partition reader creation time                         1562225321715 (+1.5s from reader creation time)
   
   first next called in partition reader                  1562225321746 (immediately)
   next read time for first next call                     1562225321210 (+1s from reader creation time)
   wait finished in next called in partition reader       1562225321746 (no wait)
   
   second next called in partition reader                 1562225321747 (immediately)
   next read time for second next call                    1562225322210 (+1s from previous "next read time")
   wait finished in next called in partition reader       1562225322211 (+450ms wait)
   
   writing rows (0, 1) (belong to epoch 13)               1562225321866
   writing rows (2, 3) (belong to epoch 13)               1562225322211
   epoch 14 launched                                      1562225322246
   
   epoch 12 committed                                     1562225323034
   
   wait start in waitForRateSourceTriggers(2)             1562225322059
   desired wait time in waitForRateSourceTriggers(2)      1562225322510 (+2.3s from reader creation time)
   ```
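
The two `next` calls in the timeline above can be reproduced with a small model. This is only a sketch, not the rate source's actual code: it assumes read slots are spaced exactly one second apart starting from the reader creation time, as the "+1s" annotations suggest. The names `next_read_time` and `wait_before_read` are hypothetical.

```python
# Timestamps (epoch milliseconds) copied from the timeline above.
READER_CREATED_MS = 1562225320210
# Assumption: one read slot per second, as the "+1s" notes in the log suggest.
READ_INTERVAL_MS = 1000


def next_read_time(call_index: int) -> int:
    """Scheduled read time for the call_index-th next() call (1-based)."""
    return READER_CREATED_MS + call_index * READ_INTERVAL_MS


def wait_before_read(now_ms: int, call_index: int) -> int:
    """Milliseconds next() has to block before it may return a row."""
    return max(0, next_read_time(call_index) - now_ms)


# First next(): its slot is already in the past, so it returns without waiting.
first_wait = wait_before_read(1562225321746, 1)
# Second next(): its slot is still ahead, so the call blocks until then.
second_wait = wait_before_read(1562225321747, 2)
print(first_wait, second_wait)
```

Under this model the first call does not wait at all and the second waits 463 ms, which matches the "no wait" and roughly 450 ms entries in the log.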
   
   These rows were written within the desired wait time, but epoch 13 could 
   not be committed within it. Interestingly, epoch 12 happened to be 
   committed in the gap between the end of the wait in 
   waitForRateSourceTriggers and query.stop(). Even if the rows had been 
   written in epoch 12, the test would have passed only by luck: the epoch 
   has to be committed within the desired wait time.
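
To make the timing argument concrete, the sketch below compares the write and commit timestamps from the timeline against the deadline of `waitForRateSourceTriggers(2)`. It uses only the logged values; the helper name `slack_ms` is hypothetical and is not part of the test suite.

```python
# Timestamps (epoch milliseconds) copied from the timeline above.
WAIT_DEADLINE_MS = 1562225322510  # "desired wait time in waitForRateSourceTriggers(2)"
ROWS_WRITTEN_MS = {(0, 1): 1562225321866, (2, 3): 1562225322211}
EPOCH_12_COMMITTED_MS = 1562225323034


def slack_ms(event_ms: int, deadline_ms: int = WAIT_DEADLINE_MS) -> int:
    """Positive: the event beat the deadline; negative: it came too late."""
    return deadline_ms - event_ms


write_slack = {rows: slack_ms(ts) for rows, ts in ROWS_WRITTEN_MS.items()}
commit_slack = slack_ms(EPOCH_12_COMMITTED_MS)
print(write_slack)
print(commit_slack)
```

Both writes land before the deadline (644 ms and 299 ms of slack), while the commit of epoch 12 lands 524 ms after it. Epoch 13, which holds the rows, has no commit entry in the timeline at all.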
   
   Given that the partition reader took 1.5 seconds to initialize, we could 
   wait twice that gap (3 seconds) to stabilize the test.
   
   ## How was this patch tested?
   
   10 sequential test runs succeeded locally.

