Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/20839

### Before, with Python 2: the original traceback is lost and the error doesn't show as an `ImportError`

```
In [4]: spark.createDataFrame(pd.DataFrame([[{u'a': 1}]]), "a: map<string, int>")
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-4-ecc28a9b5e18> in <module>()
----> 1 spark.createDataFrame(pd.DataFrame([[{u'a': 1}]]), "a: map<string, int>")

/home/bryan/git/spark/python/pyspark/sql/session.py in createDataFrame(self, data, schema, samplingRatio, verifySchema)
    686                     "For fallback to non-optimization automatically, please set true to "
    687                     "'spark.sql.execution.arrow.fallback.enabled'." % _exception_message(e))
--> 688                 raise RuntimeError(msg)
    689             data = self._convert_from_pandas(data, schema, timezone)
    690 

RuntimeError: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true; however, failed by the reason below:
  PyArrow >= 0.8.0 must be installed; however, it was not found.
For fallback to non-optimization automatically, please set true to 'spark.sql.execution.arrow.fallback.enabled'.
```

### Before, with Python 3: each time another error is raised in the catch block, it gets chained

```
In [2]: spark.createDataFrame(pd.DataFrame([[{u'a': 1}]]), "a: map<string, int>")
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
~/git/spark/python/pyspark/sql/utils.py in require_minimum_pyarrow_version()
    139     try:
--> 140         import pyarrow
    141     except ImportError:

ModuleNotFoundError: No module named 'pyarrow'

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
~/git/spark/python/pyspark/sql/session.py in createDataFrame(self, data, schema, samplingRatio, verifySchema)
    666             try:
--> 667                 return self._create_from_pandas_with_arrow(data, schema, timezone)
    668             except Exception as e:

~/git/spark/python/pyspark/sql/session.py in _create_from_pandas_with_arrow(self, pdf, schema, timezone)
    509         require_minimum_pandas_version()
--> 510         require_minimum_pyarrow_version()
    511 

~/git/spark/python/pyspark/sql/utils.py in require_minimum_pyarrow_version()
    142         raise ImportError("PyArrow >= %s must be installed; however, "
--> 143                           "it was not found." % minimum_pyarrow_version)
    144     if LooseVersion(pyarrow.__version__) < LooseVersion(minimum_pyarrow_version):

ImportError: PyArrow >= 0.8.0 must be installed; however, it was not found.

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<ipython-input-2-ecc28a9b5e18> in <module>()
----> 1 spark.createDataFrame(pd.DataFrame([[{u'a': 1}]]), "a: map<string, int>")

~/git/spark/python/pyspark/sql/session.py in createDataFrame(self, data, schema, samplingRatio, verifySchema)
    686                     "For fallback to non-optimization automatically, please set true to "
    687                     "'spark.sql.execution.arrow.fallback.enabled'." % _exception_message(e))
--> 688                 raise RuntimeError(msg)
    689             data = self._convert_from_pandas(data, schema, timezone)
    690 

RuntimeError: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true; however, failed by the reason below:
  PyArrow >= 0.8.0 must be installed; however, it was not found.
For fallback to non-optimization automatically, please set true to 'spark.sql.execution.arrow.fallback.enabled'.
```

### After the change, with Python 2 and 3: the warning is printed, then the original error is re-raised

```
In [2]: spark.createDataFrame(pd.DataFrame([[{u'a': 1}]]), "a: map<string, int>")
/home/bryan/git/spark/python/pyspark/sql/session.py:686: UserWarning: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true; however, failed by the reason below:
  PyArrow >= 0.8.0 must be installed; however, it was not found.
For fallback to non-optimization automatically, please set true to 'spark.sql.execution.arrow.fallback.enabled'.
  warnings.warn(msg)
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-2-ecc28a9b5e18> in <module>()
----> 1 spark.createDataFrame(pd.DataFrame([[{u'a': 1}]]), "a: map<string, int>")

~/git/spark/python/pyspark/sql/session.py in createDataFrame(self, data, schema, samplingRatio, verifySchema)
    665                 and len(data) > 0:
    666             try:
--> 667                 return self._create_from_pandas_with_arrow(data, schema, timezone)
    668             except Exception as e:
    669                 from pyspark.util import _exception_message

~/git/spark/python/pyspark/sql/session.py in _create_from_pandas_with_arrow(self, pdf, schema, timezone)
    508 
    509         require_minimum_pandas_version()
--> 510         require_minimum_pyarrow_version()
    511 
    512         from pandas.api.types import is_datetime64_dtype, is_datetime64tz_dtype

~/git/spark/python/pyspark/sql/utils.py in require_minimum_pyarrow_version()
    147     if not have_arrow:
    148         raise ImportError("PyArrow >= %s must be installed; however, "
--> 149                           "it was not found." % minimum_pyarrow_version)
    150     if LooseVersion(pyarrow.__version__) < LooseVersion(minimum_pyarrow_version):
    151         raise ImportError("PyArrow >= %s must be installed; however, "

ImportError: PyArrow >= 0.8.0 must be installed; however, it was not found.
```
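The difference between the before and after outputs comes down to the shape of the `except` block: wrapping the caught error in a new `RuntimeError` replaces (Python 2) or chains (Python 3) the original exception, while a bare `raise` after `warnings.warn(msg)` preserves it. A minimal standalone sketch of that pattern — `require_minimum_pyarrow_version` here is a simplified stand-in that always raises, not the actual Spark code:

```python
import warnings


def require_minimum_pyarrow_version():
    # Stand-in for pyspark.sql.utils.require_minimum_pyarrow_version;
    # always raises here to simulate pyarrow being missing.
    raise ImportError("PyArrow >= 0.8.0 must be installed; however, "
                      "it was not found.")


def create_dataframe_with_arrow_fallback():
    try:
        require_minimum_pyarrow_version()
    except Exception as e:
        msg = ("createDataFrame attempted Arrow optimization because "
               "'spark.sql.execution.arrow.enabled' is set to true; "
               "however, failed by the reason below:\n  %s" % e)
        # Warn with the added context, then re-raise the ORIGINAL
        # exception: the caller sees a plain ImportError with one
        # traceback instead of a chained RuntimeError.
        warnings.warn(msg)
        raise
```

Calling `create_dataframe_with_arrow_fallback()` emits a single `UserWarning` carrying the fallback hint and then propagates the original `ImportError` unchanged, which is what makes the error both greppable and catchable by type.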