GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/20567
[SPARK-23380][PYTHON] Make toPandas fall back to Arrow optimization
disabled when schema is not supported in Arrow optimization
## What changes were proposed in this pull request?
This PR proposes that `toPandas` fall back to the non-Arrow conversion path
when the schema is not supported by the Arrow optimization.
```python
df = spark.createDataFrame([[{'a': 1}]])  # column _1 is map<string,bigint>
spark.conf.set("spark.sql.execution.arrow.enabled", "false")
df.toPandas()  # non-Arrow path
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
df.toPandas()  # Arrow requested
```
**Before**
```
...
py4j.protocol.Py4JJavaError: An error occurred while calling
o42.collectAsArrowToPython.
...
java.lang.UnsupportedOperationException: Unsupported data type:
map<string,bigint>
```
**After**
```
...
          _1
0  {u'a': 1}
... UserWarning: Arrow will not be used in toPandas: Unsupported type in
conversion to Arrow: MapType(StringType,LongType,true)
...
          _1
0  {u'a': 1}
```
Note that `createDataFrame` already falls back in this way, so it at least
works, just without the Arrow optimization:
```python
df = spark.createDataFrame([[{'a': 1}]])
spark.conf.set("spark.sql.execution.arrow.enabled", "false")
pdf = df.toPandas()
spark.createDataFrame(pdf).show()
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
spark.createDataFrame(pdf).show()
```
```
...
... UserWarning: Arrow will not be used in createDataFrame: Error inferring
Arrow type ...
+--------+
|      _1|
+--------+
|[a -> 1]|
+--------+
```
## How was this patch tested?
This was tested manually, and unit tests were added in `python/pyspark/sql/tests.py`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark pandas_conversion_cleanup
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20567.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20567
----
commit d87547c05c0ab874dfce8e6ddca4ee454926b664
Author: hyukjinkwon <gurwls223@...>
Date: 2018-02-09T03:40:41Z
toPandas conversion cleanup
----