GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/20487
[SPARK-23319][TESTS] Explicitly skips PySpark tests for old Pandas and PyArrow
## What changes were proposed in this pull request?
This PR proposes to explicitly skip the tests for old Pandas and PyArrow.
We declared the extra dependencies:
https://github.com/apache/spark/blob/b8bfce51abf28c66ba1fc67b0f25fe1617c81025/python/setup.py#L204
but currently we only check whether PyArrow is installed, without checking its
version, so the tests already fail when an older PyArrow is installed.
Also, there is a conditional skip for old Pandas; the condition appears to be
Pandas >= 0.19.2.
## How was this patch tested?
Manually tested by modifying the conditions (for example, raising the required
versions to Pandas >= 1.19.2 and PyArrow >= 1.8.0):
```
test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 1.19.2 must be installed; however, your version was 0.19.2.'
test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 1.19.2 must be installed; however, your version was 0.19.2.'
test_createDataFrame_respect_session_timezone (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 1.19.2 must be installed; however, your version was 0.19.2.'
```
```
test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
test_createDataFrame_respect_session_timezone (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
```
```
test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 1.8.0 must be installed; however, your version was 0.8.0.'
test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 1.8.0 must be installed; however, your version was 0.8.0.'
test_createDataFrame_respect_session_timezone (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 1.8.0 must be installed; however, your version was 0.8.0.'
```
```
test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'
test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'
test_createDataFrame_respect_session_timezone (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'
```
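The skip messages above were produced by modifying the conditions. As an alternative for the "not found" path (not what this PR does), one could temporarily map the module name to `None` in `sys.modules`, which makes importing it raise `ImportError` even when the package is actually installed. A minimal sketch, assuming a made-up helper `hypothetical_pyarrow_skip_reason` that models only the "not found" branch:

```python
import importlib
import sys
from unittest import mock


def hypothetical_pyarrow_skip_reason():
    """Return a skip message if PyArrow cannot be imported, else None.

    Only the "not found" branch is modeled; version comparison is omitted.
    """
    try:
        importlib.import_module("pyarrow")
    except ImportError:
        return "PyArrow >= 0.8.0 must be installed; however, it was not found."
    return None


# A None entry in sys.modules makes the import raise ModuleNotFoundError
# (a subclass of ImportError); patch.dict restores sys.modules on exit.
with mock.patch.dict(sys.modules, {"pyarrow": None}):
    forced_reason = hypothetical_pyarrow_skip_reason()

print(forced_reason)
# prints: PyArrow >= 0.8.0 must be installed; however, it was not found.
```

This exercises the skip path without uninstalling anything, and the original `sys.modules` contents are restored when the `with` block exits.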
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark pyarrow-pandas-skip
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20487.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20487
----
commit 08b42f80322636169fc440e0e2f36819b8d6e837
Author: hyukjinkwon <gurwls223@...>
Date: 2018-02-02T13:21:34Z
Explicitly skips PySpark tests for old Pandas and PyArrow
----