GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/20625
[SPARK-23446][PYTHON] Explicitly check supported types in toPandas
## What changes were proposed in this pull request?
This PR explicitly specifies the types supported in `toPandas`. Previously there
was a hole: for example, binary type support is not finished on the Python side
yet, but it is currently allowed, as below:
```python
spark.conf.set("spark.sql.execution.arrow.enabled", "false")
df = spark.createDataFrame([[bytearray("a")]])
df.toPandas()
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
df.toPandas()
```
```
_1
0 [97]
_1
0 a
```
With Arrow disabled it returns `[97]`, and with Arrow enabled it returns `a`.
This should be disallowed. I think the same applies to nested timestamps too.
I also added a nicer message mentioning `spark.sql.execution.arrow.enabled`
to the error message.
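The check added here can be sketched in plain Python. This is only an
illustrative sketch, not Spark's actual internals: `SUPPORTED_TYPES`,
`check_supported_types`, and the `(name, type_name)` schema representation
are all hypothetical names, and the real implementation works against
Spark's `DataType` objects.

```python
# Hypothetical sketch of an explicit supported-type check before a
# pandas conversion. Names and the schema representation are illustrative.

SUPPORTED_TYPES = {"boolean", "int", "long", "float", "double",
                   "string", "date", "timestamp"}

def check_supported_types(schema):
    """schema: list of (field_name, type_name) pairs.

    Raise TypeError for any field whose type is not in the supported set,
    mentioning the Arrow conf in the message so users know it matters.
    """
    for name, type_name in schema:
        if type_name not in SUPPORTED_TYPES:
            raise TypeError(
                "Unsupported type in conversion to pandas: %s for column %s. "
                "Note that 'spark.sql.execution.arrow.enabled' affects which "
                "types are supported." % (type_name, name))

check_supported_types([("id", "long"), ("name", "string")])  # passes
try:
    check_supported_types([("payload", "binary")])
except TypeError as e:
    print(e)
```

Failing fast with an explicit whitelist, rather than letting unsupported
types silently produce inconsistent results between the Arrow and
non-Arrow paths, is the behavior change this PR proposes.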
## How was this patch tested?
Manually tested and tests added in `python/pyspark/sql/tests.py`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark pandas_convertion_supported_type
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20625.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20625
----
commit c79c6df7284b9717fe4e4c26090dcb51bf7712da
Author: hyukjinkwon <gurwls223@...>
Date: 2018-02-16T07:45:52Z
Explicitly specify supported types in toPandas
----
---