[GitHub] spark pull request #20625: [SPARK-23446][PYTHON] Explicitly check supported ...

HyukjinKwon Fri, 16 Feb 2018 02:17:26 -0800

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20625#discussion_r168718112
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -2000,10 +2001,12 @@ def toPandas(self):
                         return _check_dataframe_localize_timestamps(pdf, timezone)
                     else:
                         return pd.DataFrame.from_records([], columns=self.columns)
    -            except ImportError as e:
    -                msg = "note: pyarrow must be installed and available on calling Python process " \
    -                      "if using spark.sql.execution.arrow.enabled=true"
    -                raise ImportError("%s\n%s" % (_exception_message(e), msg))
    +            except Exception as e:
    +                msg = (
    +                    "Note: toPandas attempted Arrow optimization because "
    +                    "'spark.sql.execution.arrow.enabled' is set to true. Please set it to false "
    +                    "to disable this.")
    --- End diff --
    
    Oh, that should be part of the original message. For example, I don't have PyArrow installed for `pypy` in my local environment, so it shows an error like this:
    
    ```
    RuntimeError: PyArrow >= 0.8.0 must be installed; however, it was not found.
    Note: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true. Please set it to false to disable this.
    ```
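
A minimal sketch of how a two-line error like the one above could be produced: the original exception text comes first, and the Arrow note is appended on a new line. The diff is cut off before the `raise`, so the re-raise shown here is an assumption rather than the PR's actual code. `require_pyarrow` and `to_pandas_with_arrow` are hypothetical stand-ins, not PySpark functions.

```python
# Note text copied from the diff above.
ARROW_NOTE = (
    "Note: toPandas attempted Arrow optimization because "
    "'spark.sql.execution.arrow.enabled' is set to true. Please set it to false "
    "to disable this.")


def require_pyarrow():
    """Hypothetical stand-in for a version check that fails when PyArrow is missing."""
    raise ImportError(
        "PyArrow >= 0.8.0 must be installed; however, it was not found.")


def to_pandas_with_arrow():
    """Hypothetical stand-in for the Arrow code path of toPandas."""
    try:
        require_pyarrow()
    except Exception as e:
        # Assumption: keep the original message and append the note below it.
        raise RuntimeError("%s\n%s" % (e, ARROW_NOTE))


if __name__ == "__main__":
    try:
        to_pandas_with_arrow()
    except RuntimeError as err:
        print("RuntimeError: %s" % err)
```

Catching `Exception` rather than only `ImportError` means the note is attached to any failure on the Arrow path, which matches the direction of the change in the diff.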



