Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20567
@gatorsmile and @rxin,
The problem here is that, with Arrow enabled, `toPandas` only fails later on
unsupported types, and it accepts `BinaryType` but converts it inconsistently
(https://github.com/apache/spark/pull/20567#issuecomment-364639922),
whereas `createDataFrame` falls back to the non-Arrow path in both cases.
This is the last PySpark/Pandas interoperability issue left (for now) that I
found while testing, and I was thinking of targeting 2.3.0.
So, to clarify: would you be uncomfortable with any of the following for
2.3.0 (or 2.3.1 if the vote fails)?
1. make both `toPandas` and `createDataFrame` fall back with a warning
2. make both `toPandas` and `createDataFrame` throw an exception
3. add a configuration that controls the fallback for both
FYI, this PR currently implements option 1.
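To make the three options concrete, here is a minimal Python sketch of a single conversion policy shared by both directions. Everything in it is hypothetical: `arrow_convert`, `plain_convert`, `UnsupportedTypeError`, the `fallback_enabled` flag and the supported-type set are illustrative stand-ins, not PySpark's actual internals or configuration.

```python
import warnings

# Hypothetical, illustrative set of types the simulated Arrow path accepts.
SUPPORTED_ARROW_TYPES = {"int", "double", "string", "timestamp"}


class UnsupportedTypeError(TypeError):
    """Hypothetical error raised when the simulated Arrow path rejects a type."""


def arrow_convert(schema, rows):
    # Simulated Arrow path: rejects any column whose type is not supported.
    for name, dtype in schema:
        if dtype not in SUPPORTED_ARROW_TYPES:
            raise UnsupportedTypeError(
                f"column {name!r} has unsupported type {dtype!r}")
    return ("arrow", rows)


def plain_convert(schema, rows):
    # Simulated non-Arrow path: slower, but assumed to handle every type.
    return ("plain", rows)


def convert(schema, rows, fallback_enabled=True):
    """One policy used by both directions (hypothetical).

    fallback_enabled=True  -> option 1: warn and fall back.
    fallback_enabled=False -> option 2: raise the error.
    """
    try:
        return arrow_convert(schema, rows)
    except UnsupportedTypeError as exc:
        if not fallback_enabled:
            raise
        warnings.warn(f"Arrow conversion failed, falling back: {exc}",
                      RuntimeWarning)
        return plain_convert(schema, rows)
```

With `fallback_enabled=True` this behaves like option 1 and with `False` like option 2; reading the flag from a session-level setting instead of a parameter would correspond to option 3.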
If any of these makes you uncomfortable, let me split this into two PRs: one
that only improves the error message, targeting 2.3.0 (or 2.3.1 if the vote
fails), and one that adds a configuration to control the fallback, targeting
master (and maybe 2.3.1).
Does that make sense to both of you?
cc @cloud-fan too.