GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/21715

    [SPARK-24740][PYTHON][ML] Make PySpark's tests compatible with NumPy 1.14.x+

    ## What changes were proposed in this pull request?
    
    This PR proposes to make PySpark's tests compatible with NumPy 1.14.x+.
    NumPy 1.14.x introduced rather radical changes to its string representation.
    
    For example, the tests below failed:
    
    ```
    **********************************************************************
    File "/.../spark/python/pyspark/ml/linalg/__init__.py", line 895, in __main__.DenseMatrix.__str__
    Failed example:
        print(dm)
    Expected:
        DenseMatrix([[ 0.,  2.],
                     [ 1.,  3.]])
    Got:
        DenseMatrix([[0., 2.],
                     [1., 3.]])
    **********************************************************************
    File "/.../spark/python/pyspark/ml/linalg/__init__.py", line 899, in __main__.DenseMatrix.__str__
    Failed example:
        print(dm)
    Expected:
        DenseMatrix([[ 0.,  1.],
                     [ 2.,  3.]])
    Got:
        DenseMatrix([[0., 1.],
                     [2., 3.]])
    **********************************************************************
    File "/.../spark/python/pyspark/ml/linalg/__init__.py", line 939, in __main__.DenseMatrix.toArray
    Failed example:
        m.toArray()
    Expected:
        array([[ 0.,  2.],
               [ 1.,  3.]])
    Got:
        array([[0., 2.],
               [1., 3.]])
    **********************************************************************
    File "/.../spark/python/pyspark/ml/linalg/__init__.py", line 324, in __main__.DenseVector.dot
    Failed example:
        dense.dot(np.reshape([1., 2., 3., 4.], (2, 2), order='F'))
    Expected:
        array([  5.,  11.])
    Got:
        array([ 5., 11.])
    **********************************************************************
    File "/.../spark/python/pyspark/ml/linalg/__init__.py", line 567, in __main__.SparseVector.dot
    Failed example:
        a.dot(np.array([[1, 1], [2, 2], [3, 3], [4, 4]]))
    Expected:
        array([ 22.,  22.])
    Got:
        array([22., 22.])
    ```
    
    See the [release note](https://docs.scipy.org/doc/numpy-1.14.0/release.html#compatibility-notes).
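    
    As a rough illustration of the formatting change behind these failures (a sketch only, not necessarily the approach this PR takes), NumPy 1.14+ accepts `legacy='1.13'` in `np.set_printoptions` to restore the older padded float output. The helper name `use_legacy_float_repr` is hypothetical and exists only for this example:
    
    ```python
    import numpy as np
    
    
    def use_legacy_float_repr():
        """Ask NumPy for its pre-1.14 array printing; return True if the option was applied.
    
        Hypothetical helper for illustration only.
        """
        try:
            # Available since NumPy 1.14: emulate the 1.13 string representation.
            np.set_printoptions(legacy='1.13')
            return True
        except TypeError:
            # Assumption: NumPy older than 1.14 rejects the unknown `legacy`
            # keyword and already prints arrays in the old, padded style.
            return False
    
    
    use_legacy_float_repr()
    m = np.array([[0., 2.], [1., 3.]])
    # With the legacy option active, floats keep the leading sign padding
    # that the "Expected" doctest output above relies on.
    print(repr(m))
    ```
    
    Setting the option once before running doctests would keep the existing expected output valid on both old and new NumPy, at the cost of relying on a compatibility mode rather than updating the expected strings.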
    
    ## How was this patch tested?
    
    Manually tested:
    
    ```
    $ ./run-tests --python-executables=python3.6,python2.7 --modules=pyspark-ml,pyspark-mllib
    Running PySpark tests. Output is in /Users/hkwon/workspace/repos/forked/spark/python/unit-tests.log
    Will test against the following Python executables: ['python3.6', 'python2.7']
    Will test the following Python modules: ['pyspark-ml', 'pyspark-mllib']
    Starting test(python2.7): pyspark.mllib.tests
    Starting test(python2.7): pyspark.ml.classification
    Starting test(python3.6): pyspark.mllib.tests
    Starting test(python2.7): pyspark.ml.clustering
    Finished test(python2.7): pyspark.ml.clustering (54s)
    Starting test(python2.7): pyspark.ml.evaluation
    Finished test(python2.7): pyspark.ml.classification (74s)
    Starting test(python2.7): pyspark.ml.feature
    Finished test(python2.7): pyspark.ml.evaluation (27s)
    Starting test(python2.7): pyspark.ml.fpm
    Finished test(python2.7): pyspark.ml.fpm (0s)
    Starting test(python2.7): pyspark.ml.image
    Finished test(python2.7): pyspark.ml.image (17s)
    Starting test(python2.7): pyspark.ml.linalg.__init__
    Finished test(python2.7): pyspark.ml.linalg.__init__ (1s)
    Starting test(python2.7): pyspark.ml.recommendation
    Finished test(python2.7): pyspark.ml.feature (76s)
    Starting test(python2.7): pyspark.ml.regression
    Finished test(python2.7): pyspark.ml.recommendation (69s)
    Starting test(python2.7): pyspark.ml.stat
    Finished test(python2.7): pyspark.ml.regression (45s)
    Starting test(python2.7): pyspark.ml.tests
    Finished test(python2.7): pyspark.ml.stat (28s)
    Starting test(python2.7): pyspark.ml.tuning
    Finished test(python2.7): pyspark.ml.tuning (20s)
    Starting test(python2.7): pyspark.mllib.classification
    Finished test(python2.7): pyspark.mllib.classification (31s)
    Starting test(python2.7): pyspark.mllib.clustering
    Finished test(python2.7): pyspark.mllib.tests (260s)
    Starting test(python2.7): pyspark.mllib.evaluation
    Finished test(python3.6): pyspark.mllib.tests (266s)
    Starting test(python2.7): pyspark.mllib.feature
    Finished test(python2.7): pyspark.mllib.evaluation (21s)
    Starting test(python2.7): pyspark.mllib.fpm
    Finished test(python2.7): pyspark.mllib.feature (38s)
    Starting test(python2.7): pyspark.mllib.linalg.__init__
    Finished test(python2.7): pyspark.mllib.linalg.__init__ (1s)
    Starting test(python2.7): pyspark.mllib.linalg.distributed
    Finished test(python2.7): pyspark.mllib.fpm (34s)
    Starting test(python2.7): pyspark.mllib.random
    Finished test(python2.7): pyspark.mllib.clustering (64s)
    Starting test(python2.7): pyspark.mllib.recommendation
    Finished test(python2.7): pyspark.mllib.random (15s)
    Starting test(python2.7): pyspark.mllib.regression
    Finished test(python2.7): pyspark.mllib.linalg.distributed (47s)
    Starting test(python2.7): pyspark.mllib.stat.KernelDensity
    Finished test(python2.7): pyspark.mllib.stat.KernelDensity (0s)
    Starting test(python2.7): pyspark.mllib.stat._statistics
    Finished test(python2.7): pyspark.mllib.recommendation (40s)
    Starting test(python2.7): pyspark.mllib.tree
    Finished test(python2.7): pyspark.mllib.regression (38s)
    Starting test(python2.7): pyspark.mllib.util
    Finished test(python2.7): pyspark.mllib.stat._statistics (19s)
    Starting test(python3.6): pyspark.ml.classification
    Finished test(python2.7): pyspark.mllib.tree (26s)
    Starting test(python3.6): pyspark.ml.clustering
    Finished test(python2.7): pyspark.mllib.util (27s)
    Starting test(python3.6): pyspark.ml.evaluation
    Finished test(python3.6): pyspark.ml.evaluation (30s)
    Starting test(python3.6): pyspark.ml.feature
    Finished test(python2.7): pyspark.ml.tests (234s)
    Starting test(python3.6): pyspark.ml.fpm
    Finished test(python3.6): pyspark.ml.fpm (1s)
    Starting test(python3.6): pyspark.ml.image
    Finished test(python3.6): pyspark.ml.clustering (55s)
    Starting test(python3.6): pyspark.ml.linalg.__init__
    Finished test(python3.6): pyspark.ml.linalg.__init__ (0s)
    Starting test(python3.6): pyspark.ml.recommendation
    Finished test(python3.6): pyspark.ml.classification (71s)
    Starting test(python3.6): pyspark.ml.regression
    Finished test(python3.6): pyspark.ml.image (18s)
    Starting test(python3.6): pyspark.ml.stat
    Finished test(python3.6): pyspark.ml.stat (37s)
    Starting test(python3.6): pyspark.ml.tests
    Finished test(python3.6): pyspark.ml.regression (59s)
    Starting test(python3.6): pyspark.ml.tuning
    Finished test(python3.6): pyspark.ml.feature (93s)
    Starting test(python3.6): pyspark.mllib.classification
    Finished test(python3.6): pyspark.ml.recommendation (83s)
    Starting test(python3.6): pyspark.mllib.clustering
    Finished test(python3.6): pyspark.ml.tuning (29s)
    Starting test(python3.6): pyspark.mllib.evaluation
    Finished test(python3.6): pyspark.mllib.evaluation (26s)
    Starting test(python3.6): pyspark.mllib.feature
    Finished test(python3.6): pyspark.mllib.classification (43s)
    Starting test(python3.6): pyspark.mllib.fpm
    Finished test(python3.6): pyspark.mllib.clustering (81s)
    Starting test(python3.6): pyspark.mllib.linalg.__init__
    Finished test(python3.6): pyspark.mllib.linalg.__init__ (2s)
    Starting test(python3.6): pyspark.mllib.linalg.distributed
    Finished test(python3.6): pyspark.mllib.fpm (48s)
    Starting test(python3.6): pyspark.mllib.random
    Finished test(python3.6): pyspark.mllib.feature (54s)
    Starting test(python3.6): pyspark.mllib.recommendation
    Finished test(python3.6): pyspark.mllib.random (18s)
    Starting test(python3.6): pyspark.mllib.regression
    Finished test(python3.6): pyspark.mllib.linalg.distributed (55s)
    Starting test(python3.6): pyspark.mllib.stat.KernelDensity
    Finished test(python3.6): pyspark.mllib.stat.KernelDensity (1s)
    Starting test(python3.6): pyspark.mllib.stat._statistics
    Finished test(python3.6): pyspark.mllib.recommendation (51s)
    Starting test(python3.6): pyspark.mllib.tree
    Finished test(python3.6): pyspark.mllib.regression (45s)
    Starting test(python3.6): pyspark.mllib.util
    Finished test(python3.6): pyspark.mllib.stat._statistics (21s)
    Finished test(python3.6): pyspark.mllib.tree (27s)
    Finished test(python3.6): pyspark.mllib.util (27s)
    Finished test(python3.6): pyspark.ml.tests (264s)
    Tests passed in 752 seconds
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-24740

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21715.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21715
    
----
commit 99b4e7ea5edc98dd571cdb5711bea3ea2d3f77c5
Author: hyukjinkwon <gurwls223@...>
Date:   2018-07-04T16:19:49Z

    Make PySpark's tests compatible with NumPy 1.14.x+

----

