[GitHub] [spark] huaxingao opened a new pull request #29380: [SPARK-32506] Flaky test: StreamingLinearRegressionWithTests

GitBox Thu, 06 Aug 2020 10:59:05 -0700


huaxingao opened a new pull request #29380:
URL: https://github.com/apache/spark/pull/29380



   ### What changes were proposed in this pull request?
   The test creates 10 batches of data to train the model and expects the 
error on the test data to improve as the model is trained. If the difference 
between the 2nd error and the 10th error is not greater than 2, the assertion fails:
   ```
   FAIL: test_train_prediction (pyspark.mllib.tests.test_streaming_algorithms.StreamingLinearRegressionWithTests)
   Test that error on test data improves as model is trained.
   ----------------------------------------------------------------------
   Traceback (most recent call last):
     File "/home/runner/work/spark/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 466, in test_train_prediction
       eventually(condition, timeout=180.0)
     File "/home/runner/work/spark/spark/python/pyspark/testing/utils.py", line 81, in eventually
       lastValue = condition()
     File "/home/runner/work/spark/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 461, in condition
       self.assertGreater(errors[1] - errors[-1], 2)
   AssertionError: 1.672640157855923 not greater than 2
   ```
   I saw this failure quite a few times on Jenkins but was not able to reproduce 
it locally. These are the ten errors I got:
   ```
   4.517395047937127
   4.894265404350079
   3.0392090466559876
   1.8786361640757654
   0.8973106042078115
   0.3715780507684368
   0.20815690742907672
   0.17333033743125845
   0.15686783249863873
   0.12584413600569616
   ```
   I am thinking of using 15 batches of data instead of 10, so the model is 
trained for longer. Hopefully the 2nd error minus the 15th error will then 
always be greater than 2 on Jenkins. These are the 15 errors I got locally:
   ```
   4.517395047937127
   4.894265404350079
   3.0392090466559876
   1.8786361640757658
   0.8973106042078115
   0.3715780507684368
   0.20815690742907672
   0.17333033743125845
   0.15686783249863873
   0.12584413600569616
   0.11883853835108477
   0.09400261862100823
   0.08887491447353497
   0.05984929624986607
   0.07583948141520978
   ```
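
As a rough illustration of the margin the assertion has on the local runs, the sketch below applies the same expression as the test (`errors[1] - errors[-1]`) to the two lists above. The names `improvement`, `THRESHOLD`, `errors_10` and `errors_15` are made up for this example and do not appear in the Spark test.

```python
# Error values copied from the two local runs listed above.
errors_10 = [
    4.517395047937127, 4.894265404350079, 3.0392090466559876,
    1.8786361640757654, 0.8973106042078115, 0.3715780507684368,
    0.20815690742907672, 0.17333033743125845, 0.15686783249863873,
    0.12584413600569616,
]
errors_15 = [
    4.517395047937127, 4.894265404350079, 3.0392090466559876,
    1.8786361640757658, 0.8973106042078115, 0.3715780507684368,
    0.20815690742907672, 0.17333033743125845, 0.15686783249863873,
    0.12584413600569616, 0.11883853835108477, 0.09400261862100823,
    0.08887491447353497, 0.05984929624986607, 0.07583948141520978,
]

# Same constant as in assertGreater(errors[1] - errors[-1], 2).
THRESHOLD = 2


def improvement(errors):
    """Return the 2nd error minus the last error, as the assertion computes it."""
    return errors[1] - errors[-1]


for name, errs in (("10 batches", errors_10), ("15 batches", errors_15)):
    margin = improvement(errs)
    print(f"{name}: improvement={margin:.4f}, passes={margin > THRESHOLD}")
```

On these local numbers the improvement is about 4.77 with 10 batches and about 4.82 with 15, so both runs pass comfortably. The failing Jenkins run reported 1.672640157855923, and its individual errors are not available here.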
   
   
   ### Why are the changes needed?
   To fix a flaky test.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Manually tested
   

