Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20567
The https://github.com/apache/spark/pull/20567#issuecomment-364639922 case is
actually closer to a bug: the output without Arrow and the output with Arrow
differ, which is inconsistent. The problem is that we already allow this
inconsistent conversion for `BinaryType`, while we disallow it in other paths
like `createDataFrame` and `pandas_udf`.
In addition, I believe it is good to match the behaviour between `toPandas`
and `createDataFrame` with a Pandas DataFrame as input in 2.3.0.
The change is fairly safe. The actual change is basically:
from
```python
if ...:  # is 'spark.sql.execution.arrow.enabled' true?
    require_minimum_pyarrow_version()
    # return the one with Arrow
else:
    # return the one without Arrow
```
to
```python
if ...:  # is 'spark.sql.execution.arrow.enabled' true?
    should_fall_back = False
    try:
        require_minimum_pyarrow_version()
        to_arrow_schema(self.schema)
    except Exception:
        should_fall_back = True
    if not should_fall_back:
        # return the one with Arrow
    else:
        # return the one without Arrow
```
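To make the fallback pattern above concrete, here is a small standalone sketch. All names in it (`convert_with_fallback`, `require_ints`, the `fast`/`plain` converters) are illustrative placeholders, not actual Spark internals; the point is only the control flow: run the pre-flight checks, and if any of them raises, silently take the non-Arrow path instead of failing.

```python
def convert_with_fallback(data, checks, arrow_convert, plain_convert):
    """Illustrative fallback: try pre-flight checks for the fast (Arrow-like)
    path; if any check raises, fall back to the plain path.
    All names here are placeholders, not Spark internals."""
    should_fall_back = False
    try:
        for check in checks:
            # e.g. require_minimum_pyarrow_version() / to_arrow_schema(...)
            check(data)
    except Exception:
        should_fall_back = True
    if not should_fall_back:
        return arrow_convert(data)
    else:
        return plain_convert(data)


def require_ints(data):
    """A toy pre-flight check standing in for schema support checks."""
    if not all(isinstance(x, int) for x in data):
        raise TypeError("unsupported element type")


fast = lambda d: ("arrow", sum(d))   # stand-in for the Arrow path
plain = lambda d: ("plain", len(d))  # stand-in for the non-Arrow path

print(convert_with_fallback([1, 2, 3], [require_ints], fast, plain))
print(convert_with_fallback([1, "a"], [require_ints], fast, plain))
```

The first call passes the check and takes the fast path; the second fails the check and falls back, mirroring the proposed `toPandas` behaviour.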
The error message already looks okay for now. If you feel strongly about
this, I am fine with going ahead with this only in master.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]