Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/20839
### Before, with Python 2: the traceback is missing and the error doesn't show as an `ImportError`
```
In [4]: spark.createDataFrame(pd.DataFrame([[{u'a': 1}]]), "a: map<string, int>")
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-4-ecc28a9b5e18> in <module>()
----> 1 spark.createDataFrame(pd.DataFrame([[{u'a': 1}]]), "a: map<string, int>")

/home/bryan/git/spark/python/pyspark/sql/session.py in createDataFrame(self, data, schema, samplingRatio, verifySchema)
    686                         "For fallback to non-optimization automatically, please set true to "
    687                         "'spark.sql.execution.arrow.fallback.enabled'." % _exception_message(e))
--> 688                 raise RuntimeError(msg)
    689             data = self._convert_from_pandas(data, schema, timezone)
    690

RuntimeError: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true; however, failed by the reason below:
  PyArrow >= 0.8.0 must be installed; however, it was not found.
For fallback to non-optimization automatically, please set true to 'spark.sql.execution.arrow.fallback.enabled'.
```
### Before, with Python 3: each error raised in the catch block gets chained onto the previous one
```
In [2]: spark.createDataFrame(pd.DataFrame([[{u'a': 1}]]), "a: map<string, int>")
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
~/git/spark/python/pyspark/sql/utils.py in require_minimum_pyarrow_version()
    139     try:
--> 140         import pyarrow
    141     except ImportError:

ModuleNotFoundError: No module named 'pyarrow'

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
~/git/spark/python/pyspark/sql/session.py in createDataFrame(self, data, schema, samplingRatio, verifySchema)
    666             try:
--> 667                 return self._create_from_pandas_with_arrow(data, schema, timezone)
    668             except Exception as e:

~/git/spark/python/pyspark/sql/session.py in _create_from_pandas_with_arrow(self, pdf, schema, timezone)
    509         require_minimum_pandas_version()
--> 510         require_minimum_pyarrow_version()
    511

~/git/spark/python/pyspark/sql/utils.py in require_minimum_pyarrow_version()
    142         raise ImportError("PyArrow >= %s must be installed; however, "
--> 143                           "it was not found." % minimum_pyarrow_version)
    144     if LooseVersion(pyarrow.__version__) < LooseVersion(minimum_pyarrow_version):

ImportError: PyArrow >= 0.8.0 must be installed; however, it was not found.

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<ipython-input-2-ecc28a9b5e18> in <module>()
----> 1 spark.createDataFrame(pd.DataFrame([[{u'a': 1}]]), "a: map<string, int>")

~/git/spark/python/pyspark/sql/session.py in createDataFrame(self, data, schema, samplingRatio, verifySchema)
    686                         "For fallback to non-optimization automatically, please set true to "
    687                         "'spark.sql.execution.arrow.fallback.enabled'." % _exception_message(e))
--> 688                 raise RuntimeError(msg)
    689             data = self._convert_from_pandas(data, schema, timezone)
    690

RuntimeError: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true; however, failed by the reason below:
  PyArrow >= 0.8.0 must be installed; however, it was not found.
For fallback to non-optimization automatically, please set true to 'spark.sql.execution.arrow.fallback.enabled'.
```
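The chained tracebacks above come from Python 3's implicit exception chaining: raising a new exception inside an `except` block records the original on `__context__`, which IPython renders as the stacked "During handling of the above exception, another exception occurred:" blocks. A minimal sketch of the old behavior (a hypothetical standalone function, not the actual `session.py` code):

```python
def old_behavior():
    # Raising a NEW exception inside an except block: Python 3 keeps the
    # original error on RuntimeError.__context__ and prints both tracebacks,
    # while Python 2 simply discards the original traceback and type.
    try:
        raise ImportError("PyArrow >= 0.8.0 must be installed; however, it was not found.")
    except Exception as e:
        msg = ("createDataFrame attempted Arrow optimization; however, "
               "failed by the reason below:\n  %s" % e)
        raise RuntimeError(msg)
```

On Python 2 the same pattern loses the `ImportError` entirely, which is the first "Before" case above.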
### After this change, with Python 2 & 3: a warning is printed, then the error is re-raised
```
In [2]: spark.createDataFrame(pd.DataFrame([[{u'a': 1}]]), "a: map<string, int>")
/home/bryan/git/spark/python/pyspark/sql/session.py:686: UserWarning: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true; however, failed by the reason below:
  PyArrow >= 0.8.0 must be installed; however, it was not found.
For fallback to non-optimization automatically, please set true to 'spark.sql.execution.arrow.fallback.enabled'.
  warnings.warn(msg)
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-2-ecc28a9b5e18> in <module>()
----> 1 spark.createDataFrame(pd.DataFrame([[{u'a': 1}]]), "a: map<string, int>")

~/git/spark/python/pyspark/sql/session.py in createDataFrame(self, data, schema, samplingRatio, verifySchema)
    665                 and len(data) > 0:
    666             try:
--> 667                 return self._create_from_pandas_with_arrow(data, schema, timezone)
    668             except Exception as e:
    669                 from pyspark.util import _exception_message

~/git/spark/python/pyspark/sql/session.py in _create_from_pandas_with_arrow(self, pdf, schema, timezone)
    508
    509         require_minimum_pandas_version()
--> 510         require_minimum_pyarrow_version()
    511
    512         from pandas.api.types import is_datetime64_dtype, is_datetime64tz_dtype

~/git/spark/python/pyspark/sql/utils.py in require_minimum_pyarrow_version()
    147     if not have_arrow:
    148         raise ImportError("PyArrow >= %s must be installed; however, "
--> 149                           "it was not found." % minimum_pyarrow_version)
    150     if LooseVersion(pyarrow.__version__) < LooseVersion(minimum_pyarrow_version):
    151         raise ImportError("PyArrow >= %s must be installed; however, "

ImportError: PyArrow >= 0.8.0 must be installed; however, it was not found.
```
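The fixed behavior can be sketched as a warn-then-reraise wrapper (hypothetical names, a simplified stand-in for the actual `createDataFrame` fallback code):

```python
import warnings

def with_arrow_fallback_warning(arrow_fn, data):
    """Try the Arrow path; on failure emit a UserWarning with the reason,
    then re-raise the ORIGINAL exception so its type (e.g. ImportError)
    and traceback survive on both Python 2 and 3."""
    try:
        return arrow_fn(data)
    except Exception as e:
        msg = ("createDataFrame attempted Arrow optimization because "
               "'spark.sql.execution.arrow.enabled' is set to true; however, "
               "failed by the reason below:\n  %s" % e)
        warnings.warn(msg)
        raise  # bare raise keeps the original exception and traceback
```

The bare `raise` is what preserves both the exception type and the original traceback, while the `UserWarning` still surfaces the Arrow-fallback hint to the user.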