[ https://issues.apache.org/jira/browse/SPARK-29205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-29205.
----------------------------------
    Resolution: Duplicate

Unfortunately, it seems this is still flaky (although less flaky after the fix).

> Pyspark tests failed for suspected performance problem on ARM
> -------------------------------------------------------------
>
>                 Key: SPARK-29205
>                 URL: https://issues.apache.org/jira/browse/SPARK-29205
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.0.0
>         Environment: OS: Ubuntu 16.04
>                      Arch: aarch64
>                      Host: Virtual Machine
>            Reporter: zhao bo
>            Priority: Major
>
> We test PySpark on an ARM VM, but found some test failures. Once we change
> the source code to extend the wait time so that those test tasks can finish,
> the tests pass.
>
> The affected test cases include:
> pyspark.mllib.tests.test_streaming_algorithms:StreamingLinearRegressionWithTests.test_parameter_convergence
> pyspark.mllib.tests.test_streaming_algorithms:StreamingLogisticRegressionWithSGDTests.test_convergence
> pyspark.mllib.tests.test_streaming_algorithms:StreamingLogisticRegressionWithSGDTests.test_parameter_accuracy
> pyspark.mllib.tests.test_streaming_algorithms:StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
>
> The error messages for the above test failures:
>
> ======================================================================
> FAIL: test_parameter_convergence
> (pyspark.mllib.tests.test_streaming_algorithms.StreamingLinearRegressionWithTests)
> Test that the model parameters improve with streaming data.
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 429, in test_parameter_convergence
>     self._eventually(condition, catch_assertions=True)
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 74, in _eventually
>     raise lastValue
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 65, in _eventually
>     lastValue = condition()
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 425, in condition
>     self.assertEqual(len(model_weights), len(batches))
> AssertionError: 6 != 10
>
> ======================================================================
> FAIL: test_convergence
> (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 292, in test_convergence
>     self._eventually(condition, 60.0, catch_assertions=True)
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 74, in _eventually
>     raise lastValue
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 65, in _eventually
>     lastValue = condition()
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 288, in condition
>     self.assertEqual(len(models), len(input_batches))
> AssertionError: 19 != 20
>
> ======================================================================
> FAIL: test_parameter_accuracy
> (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 266, in test_parameter_accuracy
>     self._eventually(condition, catch_assertions=True)
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 74, in _eventually
>     raise lastValue
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 65, in _eventually
>     lastValue = condition()
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 263, in condition
>     self.assertAlmostEqual(rel, 0.1, 1)
> AssertionError: 0.21309223935797794 != 0.1 within 1 places
>
> ======================================================================
> FAIL: test_training_and_prediction
> (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
> Test that the model improves on toy data with no. of batches
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 367, in test_training_and_prediction
>     self._eventually(condition, timeout=60.0)
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 78, in _eventually
>     % (timeout, lastValue))
> AssertionError: Test failed due to timeout after 60 sec, with last condition
> returning: Latest errors: 0.67, 0.71, 0.78, 0.7, 0.75, 0.74, 0.73, 0.69,
> 0.62, 0.71, 0.69, 0.75, 0.72, 0.77, 0.71, 0.74, 0.76, 0.78, 0.7, 0.78, 0.8
>
> Is it possible to extend the job timeout so that the job can run to
> completion when the test result has not changed? Any help or advice is
> welcome. Thanks.
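For context, the _eventually helper that appears in the tracebacks above retries a condition until it returns True or a timeout expires, and the fix the reporter describes amounts to giving that helper a larger timeout. Below is a minimal sketch of the retry pattern, reconstructed from the tracebacks; it is illustrative only, not Spark's actual implementation, and the 180-second value in the usage note is an assumption rather than a value taken from this issue.

    import time

    def eventually(condition, timeout=30.0, catch_assertions=False):
        """Retry `condition` until it returns True or `timeout` seconds pass.

        Sketch of the _eventually helper visible in the tracebacks above;
        illustrative only, not Spark's actual code.
        """
        start = time.time()
        lastValue = None
        while time.time() - start < timeout:
            if catch_assertions:
                try:
                    lastValue = condition()
                except AssertionError as e:
                    # Remember the failure; it is re-raised if time runs out.
                    lastValue = e
            else:
                lastValue = condition()
            if lastValue is True:
                return
            # Brief pause before re-checking, so streaming batches can land.
            time.sleep(0.01)
        if isinstance(lastValue, AssertionError):
            raise lastValue
        raise AssertionError(
            "Test failed due to timeout after %g sec, with last condition "
            "returning: %s" % (timeout, lastValue))

On slow hardware such as the ARM VM above, the streaming batches may simply not have arrived when the timeout fires, so a call along the lines of self._eventually(condition, timeout=180.0, catch_assertions=True) (timeout value hypothetical) gives the job more time to finish before the last assertion result is treated as a failure.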