Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20625#discussion_r168718112
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -2000,10 +2001,12 @@ def toPandas(self):
return _check_dataframe_localize_timestamps(pdf, timezone)
else:
return pd.DataFrame.from_records([], columns=self.columns)
- except ImportError as e:
- msg = "note: pyarrow must be installed and available on calling Python process " \
- "if using spark.sql.execution.arrow.enabled=true"
- raise ImportError("%s\n%s" % (_exception_message(e), msg))
+ except Exception as e:
+ msg = (
+ "Note: toPandas attempted Arrow optimization because "
+ "'spark.sql.execution.arrow.enabled' is set to true. Please set it to false "
+ "to disable this.")
--- End diff --
Oh, that should be part of the original message. For example, I don't have
PyArrow installed for `pypy` in my local environment, so it shows an error like:
```
RuntimeError: PyArrow >= 0.8.0 must be installed; however, it was not found.
Note: toPandas attempted Arrow optimization because
'spark.sql.execution.arrow.enabled' is set to true. Please set it to false to
disable this.
```
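
A minimal sketch of how the note ends up after the original message, assuming the `except` block re-raises with the two strings joined by a newline (the diff above is cut off before the `raise`, so that part is an assumption). `require_pyarrow` and `to_pandas_with_arrow` are hypothetical stand-ins, not Spark functions:

```python
# Note text taken from the diff above.
ARROW_NOTE = (
    "Note: toPandas attempted Arrow optimization because "
    "'spark.sql.execution.arrow.enabled' is set to true. Please set it to false "
    "to disable this.")


def require_pyarrow():
    # Hypothetical stand-in for the PyArrow version check; it always fails
    # here to simulate an interpreter without PyArrow installed.
    raise ImportError(
        "PyArrow >= 0.8.0 must be installed; however, it was not found.")


def to_pandas_with_arrow():
    # Hypothetical stand-in for the Arrow branch of toPandas.
    try:
        require_pyarrow()
    except Exception as e:
        # Assumption: the original message comes first, then the note.
        raise RuntimeError("%s\n%s" % (e, ARROW_NOTE))


try:
    to_pandas_with_arrow()
except RuntimeError as err:
    combined_message = str(err)
    print(combined_message)
```

Because the original exception text is kept as the first line, the PyArrow requirement does not need to be repeated in the note itself.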