Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20567
Yup, I also agree with adding a configuration to control this. I will work
on it later, targeting master only.
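For example, the check could look something like this (the fallback conf name below is just a placeholder to show the idea, not final):
```python
use_arrow = self.sql_ctx.getConf(
    "spark.sql.execution.arrow.enabled", "false").lower() == "true"
# Placeholder name - a separate conf to choose between falling back
# and raising when the Arrow path cannot be used.
fallback_enabled = self.sql_ctx.getConf(
    "spark.sql.execution.arrow.fallback.enabled", "true").lower() == "true"
```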
For https://github.com/apache/spark/pull/20567#issuecomment-364994740, yup,
I agree with that; but to do this, we would need to do something like:
```python
if arrow_enabled:  # 'spark.sql.execution.arrow.enabled' true?
    require_minimum_pyarrow_version()
    try:
        to_arrow_schema(self.schema)
        return ...  # the one with Arrow
    except Exception as e:
        raise Exception("'spark.sql.execution.arrow.enabled' blah blah ...")
else:
    return ...  # the one without Arrow
```
The diff and complexity are pretty similar to the fallback approach:
```python
if arrow_enabled:  # 'spark.sql.execution.arrow.enabled' true?
    should_fall_back = False
    try:
        require_minimum_pyarrow_version()
        to_arrow_schema(self.schema)
    except Exception as e:
        should_fall_back = True
    if not should_fall_back:
        return ...  # the one with Arrow
return ...  # the one without Arrow
```
Note that, in the case of `spark.sql.codegen.fallback`, it's `true` by default,
if I didn't misunderstand. Also, the latter way lets us match the behaviour of
`createDataFrame` with Pandas input for now.
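For reference, the current `createDataFrame` path with a Pandas DataFrame already does roughly this (simplified sketch, not the exact code in `session.py`):
```python
import warnings

try:
    # Try the Arrow-optimized path first.
    return self._create_from_pandas_with_arrow(data, schema, timezone)
except Exception as e:
    # Fall back to creating the DataFrame without Arrow, with a warning.
    warnings.warn("Arrow will not be used in createDataFrame: %s" % str(e))
```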
I have been thinking this feature is in transition, and I am trying to fix and
match the behaviour first before the release.