GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/20533

    [SPARK-23300][TESTS][BRANCH-2.3] Prints out if Pandas and PyArrow are installed or not in PySpark SQL tests

    This PR backports https://github.com/apache/spark/pull/20473 to branch-2.3.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark backport-20473

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20533.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20533
    
----
commit 6a703e9e0e34ae47ff2118e51f526895f0ffff6e
Author: hyukjinkwon <gurwls223@...>
Date:   2018-02-06T07:08:15Z

    [SPARK-23300][TESTS] Prints out if Pandas and PyArrow are installed or not in PySpark SQL tests
    
    This PR proposes to log whether PyArrow and Pandas are installed, so we can tell whether the related tests are going to be skipped.
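    
    As a rough, hypothetical sketch of the kind of probe this adds (the helper names and the Pandas minimum below are made up, not the actual python/run-tests.py code): each Python executable is asked to import the package in a subprocess, the reported version is compared against a minimum, and a "Will test" or "Will skip" line is printed.
    
    ```python
    # Hypothetical sketch, not the actual Spark test-runner code: probe each
    # Python executable for a package, compare its version against a minimum,
    # and print the same kind of "Will test" / "Will skip" line shown below.
    import subprocess
    
    def installed_version(python_exec, package):
        """Return the package's version string, or None if it cannot be imported."""
        code = "import {0}; print({0}.__version__)".format(package)
        try:
            out = subprocess.check_output([python_exec, "-c", code],
                                          stderr=subprocess.DEVNULL)
            return out.decode("utf-8").strip()
        except (subprocess.CalledProcessError, OSError):
            return None
    
    def version_tuple(version):
        # Naive comparison; good enough for plain 'x.y.z' version strings.
        return tuple(int(part) for part in version.split(".")[:3])
    
    def report(python_exec, package, minimum):
        version = installed_version(python_exec, package)
        if version is not None and version_tuple(version) >= version_tuple(minimum):
            print("Will test %s related features against Python executable '%s' "
                  "in 'pyspark-sql' module." % (package, python_exec))
        elif version is not None:
            print("Will skip %s related features against Python executable '%s' in "
                  "'pyspark-sql' module. %s >= %s is required; however, %s %s was "
                  "found." % (package, python_exec, package, minimum, package, version))
        else:
            print("Will skip %s related features against Python executable '%s' in "
                  "'pyspark-sql' module. %s >= %s is required; however, %s was not "
                  "found." % (package, python_exec, package, minimum, package))
    
    if __name__ == "__main__":
        for exe in ["python2.7", "pypy"]:
            report(exe, "pyarrow", "0.8.0")    # 0.8.0 matches the log output below
            report(exe, "pandas", "0.19.2")    # assumed minimum; Spark's may differ
    ```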
    
    Manually tested:
    
    I don't have PyArrow installed in PyPy.
    ```bash
    $ ./run-tests --python-executables=python3
    ```
    
    ```
    ...
    Will test against the following Python executables: ['python3']
    Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
    Will test PyArrow related features against Python executable 'python3' in 'pyspark-sql' module.
    Will test Pandas related features against Python executable 'python3' in 'pyspark-sql' module.
    Starting test(python3): pyspark.mllib.tests
    Starting test(python3): pyspark.sql.tests
    Starting test(python3): pyspark.streaming.tests
    Starting test(python3): pyspark.tests
    ```
    
    ```bash
    $ ./run-tests --modules=pyspark-streaming
    ```
    
    ```
    ...
    Will test against the following Python executables: ['python2.7', 'pypy']
    Will test the following Python modules: ['pyspark-streaming']
    Starting test(pypy): pyspark.streaming.tests
    Starting test(pypy): pyspark.streaming.util
    Starting test(python2.7): pyspark.streaming.tests
    Starting test(python2.7): pyspark.streaming.util
    ```
    
    ```bash
    $ ./run-tests
    ```
    
    ```
    ...
    Will test against the following Python executables: ['python2.7', 'pypy']
    Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
    Will test PyArrow related features against Python executable 'python2.7' in 'pyspark-sql' module.
    Will test Pandas related features against Python executable 'python2.7' in 'pyspark-sql' module.
    Will skip PyArrow related features against Python executable 'pypy' in 'pyspark-sql' module. PyArrow >= 0.8.0 is required; however, PyArrow was not found.
    Will test Pandas related features against Python executable 'pypy' in 'pyspark-sql' module.
    Starting test(pypy): pyspark.streaming.tests
    Starting test(pypy): pyspark.sql.tests
    Starting test(pypy): pyspark.tests
    Starting test(python2.7): pyspark.mllib.tests
    ```
    
    ```bash
    $ ./run-tests --modules=pyspark-sql --python-executables=pypy
    ```
    
    ```
    ...
    Will test against the following Python executables: ['pypy']
    Will test the following Python modules: ['pyspark-sql']
    Will skip PyArrow related features against Python executable 'pypy' in 'pyspark-sql' module. PyArrow >= 0.8.0 is required; however, PyArrow was not found.
    Will test Pandas related features against Python executable 'pypy' in 'pyspark-sql' module.
    Starting test(pypy): pyspark.sql.tests
    Starting test(pypy): pyspark.sql.catalog
    Starting test(pypy): pyspark.sql.column
    Starting test(pypy): pyspark.sql.conf
    ```
    
    After some modifications to produce the other cases (for example, raising the required minimum versions, as sketched after the output below):
    
    ```bash
    $ ./run-tests
    ```
    
    ```
    ...
    Will test against the following Python executables: ['python2.7', 'pypy']
    Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
    Will skip PyArrow related features against Python executable 'python2.7' in 'pyspark-sql' module. PyArrow >= 20.0.0 is required; however, PyArrow 0.8.0 was found.
    Will skip Pandas related features against Python executable 'python2.7' in 'pyspark-sql' module. Pandas >= 20.0.0 is required; however, Pandas 0.20.2 was found.
    Will skip PyArrow related features against Python executable 'pypy' in 'pyspark-sql' module. PyArrow >= 20.0.0 is required; however, PyArrow was not found.
    Will skip Pandas related features against Python executable 'pypy' in 'pyspark-sql' module. Pandas >= 20.0.0 is required; however, Pandas 0.22.0 was found.
    Starting test(pypy): pyspark.sql.tests
    Starting test(pypy): pyspark.streaming.tests
    Starting test(pypy): pyspark.tests
    Starting test(python2.7): pyspark.mllib.tests
    ```
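    
    The skip lines above were produced by inflating the required minimum versions so that every check fails; reusing the hypothetical report() helper from the sketch earlier in this message, the same output can be reproduced roughly like this:
    
    ```python
    # Hypothetical tweak used only to exercise the skip branches; the real
    # minimum-version constants live in Spark's test/utility code under
    # different names.
    for exe in ["python2.7", "pypy"]:
        report(exe, "pyarrow", "20.0.0")   # raised from "0.8.0" to force a skip
        report(exe, "pandas", "20.0.0")    # raised from the assumed "0.19.2"
    ```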
    
    ```bash
    $ ./run-tests-with-coverage
    ```
    
    ```
    ...
    Will test against the following Python executables: ['python2.7', 'pypy']
    Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
    Will test PyArrow related features against Python executable 'python2.7' in 'pyspark-sql' module.
    Will test Pandas related features against Python executable 'python2.7' in 'pyspark-sql' module.
    Coverage is not installed in Python executable 'pypy' but 'COVERAGE_PROCESS_START' environment variable is set, exiting.
    ```
    
    Author: hyukjinkwon <[email protected]>
    
    Closes #20473 from HyukjinKwon/SPARK-23300.

----

