GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/23236

    [SPARK-26275][PYTHON][ML] Increases timeout for StreamingLogisticRegressionWithSGDTests.test_training_and_prediction test

    ## What changes were proposed in this pull request?
    
    This test looks flaky. It has failed in the following builds:
    
    
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99704/console

    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99569/console

    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99644/console

    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99548/console

    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99454/console

    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99609/console
    
    ```
    ======================================================================
    FAIL: test_training_and_prediction (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
    Test that the model improves on toy data with no. of batches
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 367, in test_training_and_prediction
        self._eventually(condition)
      File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 78, in _eventually
        % (timeout, lastValue))
    AssertionError: Test failed due to timeout after 30 sec, with last condition returning: Latest errors: 0.67, 0.71, 0.78, 0.7, 0.75, 0.74, 0.73, 0.69, 0.62, 0.71, 0.69, 0.75, 0.72, 0.77, 0.71, 0.74
    
    ----------------------------------------------------------------------
    Ran 13 tests in 185.051s
    
    FAILED (failures=1, skipped=1)
    ```
    
    This appears to have started after the parallelism in Jenkins was increased to speed up the tests in https://github.com/apache/spark/pull/23111. I am able to reproduce the failure manually when resource usage is heavy (after manually decreasing the timeout).
    
    ## How was this patch tested?
    
    Manually tested by running:
    
    ```
    cd python
    ./run-tests --testnames 'pyspark.mllib.tests.test_streaming_algorithms StreamingLogisticRegressionWithSGDTests.test_training_and_prediction' --python-executables=python
    ```
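
For readers unfamiliar with this kind of failure, the sketch below illustrates why a fixed timeout makes a polling assertion flaky on a loaded machine. It is not the code from `test_streaming_algorithms.py`: `eventually`, `make_condition`, and `passes` are hypothetical names, the default values are assumptions, and the error message is only modeled on the one in the traceback above.

```python
import time


def eventually(condition, timeout=30.0, poll_interval=0.01):
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Hypothetical stand-in for the test's `_eventually` helper; the real
    helper's signature and behavior may differ.
    """
    start = time.monotonic()
    last_value = None
    while time.monotonic() - start < timeout:
        last_value = condition()
        if last_value is True:
            return
        time.sleep(poll_interval)
    # Message format modeled on the Jenkins log, not copied from Spark.
    raise AssertionError(
        "Test failed due to timeout after %g sec, with last condition returning: %s"
        % (timeout, last_value)
    )


def make_condition(ready_after):
    """Build a condition that only succeeds `ready_after` seconds from now.

    This simulates a streaming job whose model needs some wall-clock time
    to improve; on a busy machine that time grows.
    """
    start = time.monotonic()

    def condition():
        elapsed = time.monotonic() - start
        if elapsed >= ready_after:
            return True
        return "not ready after %.2f sec" % elapsed

    return condition


def passes(ready_after, timeout):
    """Return True if the polled condition succeeds within `timeout`."""
    try:
        eventually(make_condition(ready_after), timeout=timeout)
        return True
    except AssertionError:
        return False
```

Under these assumptions, `passes(0.05, 1.0)` succeeds while `passes(0.5, 0.1)` fails: the same work fails once it takes longer than the timeout allows. This mirrors the reproduction described above, where decreasing the timeout under heavy load triggers the failure.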


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-26275

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23236.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23236
    
----
commit 3c4ee75c4d0585702cd87cc4df9af74e235bb431
Author: Hyukjin Kwon <gurwls223@...>
Date:   2018-12-05T12:17:21Z

    Increases timeout for StreamingLogisticRegressionWithSGDTests.test_training_and_prediction test

----

