Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/20839
  
    ### Before, with Python 2: the original traceback is missing and the error does not surface as an `ImportError`
    
    ```
    In [4]: spark.createDataFrame(pd.DataFrame([[{u'a': 1}]]), "a: map<string, int>")
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    <ipython-input-4-ecc28a9b5e18> in <module>()
    ----> 1 spark.createDataFrame(pd.DataFrame([[{u'a': 1}]]), "a: map<string, int>")

    /home/bryan/git/spark/python/pyspark/sql/session.py in createDataFrame(self, data, schema, samplingRatio, verifySchema)
        686                             "For fallback to non-optimization automatically, please set true to "
        687                             "'spark.sql.execution.arrow.fallback.enabled'." % _exception_message(e))
    --> 688                         raise RuntimeError(msg)
        689             data = self._convert_from_pandas(data, schema, timezone)
        690 

    RuntimeError: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true; however, failed by the reason below:
      PyArrow >= 0.8.0 must be installed; however, it was not found.
    For fallback to non-optimization automatically, please set true to 'spark.sql.execution.arrow.fallback.enabled'.
    ```
    
    ### Before, with Python 3: each error raised in the except block is chained onto the previous one
    
    ```
    In [2]: spark.createDataFrame(pd.DataFrame([[{u'a': 1}]]), "a: map<string, int>")
    ---------------------------------------------------------------------------
    ModuleNotFoundError                       Traceback (most recent call last)
    ~/git/spark/python/pyspark/sql/utils.py in require_minimum_pyarrow_version()
        139     try:
    --> 140         import pyarrow
        141     except ImportError:

    ModuleNotFoundError: No module named 'pyarrow'

    During handling of the above exception, another exception occurred:

    ImportError                               Traceback (most recent call last)
    ~/git/spark/python/pyspark/sql/session.py in createDataFrame(self, data, schema, samplingRatio, verifySchema)
        666                 try:
    --> 667                     return self._create_from_pandas_with_arrow(data, schema, timezone)
        668                 except Exception as e:

    ~/git/spark/python/pyspark/sql/session.py in _create_from_pandas_with_arrow(self, pdf, schema, timezone)
        509         require_minimum_pandas_version()
    --> 510         require_minimum_pyarrow_version()
        511 

    ~/git/spark/python/pyspark/sql/utils.py in require_minimum_pyarrow_version()
        142         raise ImportError("PyArrow >= %s must be installed; however, "
    --> 143                           "it was not found." % minimum_pyarrow_version)
        144     if LooseVersion(pyarrow.__version__) < LooseVersion(minimum_pyarrow_version):

    ImportError: PyArrow >= 0.8.0 must be installed; however, it was not found.

    During handling of the above exception, another exception occurred:

    RuntimeError                              Traceback (most recent call last)
    <ipython-input-2-ecc28a9b5e18> in <module>()
    ----> 1 spark.createDataFrame(pd.DataFrame([[{u'a': 1}]]), "a: map<string, int>")

    ~/git/spark/python/pyspark/sql/session.py in createDataFrame(self, data, schema, samplingRatio, verifySchema)
        686                             "For fallback to non-optimization automatically, please set true to "
        687                             "'spark.sql.execution.arrow.fallback.enabled'." % _exception_message(e))
    --> 688                         raise RuntimeError(msg)
        689             data = self._convert_from_pandas(data, schema, timezone)
        690 

    RuntimeError: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true; however, failed by the reason below:
      PyArrow >= 0.8.0 must be installed; however, it was not found.
    For fallback to non-optimization automatically, please set true to 'spark.sql.execution.arrow.fallback.enabled'.
    ```
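
For context, the "During handling of the above exception, another exception occurred" lines appear because Python 3 implicitly chains exceptions when a new exception is raised inside an `except` block. A minimal standalone sketch of this mechanism (hypothetical helper, not Spark's actual code; the module and function names are illustrative):

```python
def require_module(name, minimum_version="0.8.0"):
    """Hypothetical stand-in for require_minimum_pyarrow_version."""
    try:
        __import__(name)
    except ImportError:
        # Raising here makes Python 3 attach the original error
        # (a ModuleNotFoundError) as __context__ -- implicit chaining.
        raise ImportError("%s >= %s must be installed; however, "
                          "it was not found." % (name, minimum_version))

try:
    require_module("a_module_that_does_not_exist")
except ImportError as e:
    # The chained original import failure is still reachable:
    print(type(e.__context__).__name__)
```

Python 2 has no such chaining, which is why the original traceback was lost there.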
    
    ### After the change, with Python 2 and 3: a warning is printed and the original error is re-raised
    
    ```
    In [2]: spark.createDataFrame(pd.DataFrame([[{u'a': 1}]]), "a: map<string, int>")
    /home/bryan/git/spark/python/pyspark/sql/session.py:686: UserWarning: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true; however, failed by the reason below:
      PyArrow >= 0.8.0 must be installed; however, it was not found.
    For fallback to non-optimization automatically, please set true to 'spark.sql.execution.arrow.fallback.enabled'.
      warnings.warn(msg)
    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    <ipython-input-2-ecc28a9b5e18> in <module>()
    ----> 1 spark.createDataFrame(pd.DataFrame([[{u'a': 1}]]), "a: map<string, int>")

    ~/git/spark/python/pyspark/sql/session.py in createDataFrame(self, data, schema, samplingRatio, verifySchema)
        665                     and len(data) > 0:
        666                 try:
    --> 667                     return self._create_from_pandas_with_arrow(data, schema, timezone)
        668                 except Exception as e:
        669                     from pyspark.util import _exception_message

    ~/git/spark/python/pyspark/sql/session.py in _create_from_pandas_with_arrow(self, pdf, schema, timezone)
        508 
        509         require_minimum_pandas_version()
    --> 510         require_minimum_pyarrow_version()
        511 
        512         from pandas.api.types import is_datetime64_dtype, is_datetime64tz_dtype

    ~/git/spark/python/pyspark/sql/utils.py in require_minimum_pyarrow_version()
        147     if not have_arrow:
        148         raise ImportError("PyArrow >= %s must be installed; however, "
    --> 149                           "it was not found." % minimum_pyarrow_version)
        150     if LooseVersion(pyarrow.__version__) < LooseVersion(minimum_pyarrow_version):
        151         raise ImportError("PyArrow >= %s must be installed; however, "

    ImportError: PyArrow >= 0.8.0 must be installed; however, it was not found.
    ```
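
The warn-then-re-raise behavior shown above can be sketched roughly as follows. This is a simplified approximation of the fallback logic, not Spark's actual code; the function and parameter names (`create_with_fallback`, `fallback_enabled`, the two callables) are all illustrative:

```python
import warnings

def create_with_fallback(create_with_arrow, create_without_arrow, fallback_enabled):
    # Try the Arrow-optimized conversion path first.
    try:
        return create_with_arrow()
    except Exception as e:
        msg = ("createDataFrame attempted Arrow optimization because "
               "'spark.sql.execution.arrow.enabled' is set to true; however, "
               "failed by the reason below:\n  %s" % e)
        warnings.warn(msg)
        if fallback_enabled:
            # Fall back to the plain, non-Arrow conversion path.
            return create_without_arrow()
        # Bare `raise` re-raises the original exception, so the user
        # sees the real error type (e.g. ImportError) with its
        # traceback intact instead of a wrapping RuntimeError.
        raise
```

Using a bare `raise` is what preserves the original exception type and traceback on both Python 2 and Python 3, which the earlier `raise RuntimeError(msg)` did not.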
    


