Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20567
  
    The https://github.com/apache/spark/pull/20567#issuecomment-364639922 case is 
actually closer to a bug, as the outputs with and without Arrow are 
different and inconsistent. The problem is that we already allow an 
inconsistent conversion for `BinaryType`, which we don't allow in other paths 
like `createDataFrame` and `pandas_udf`.
    
    In addition, I believe it is good to match the behaviour between `toPandas` 
and `createDataFrame` taking a Pandas DataFrame as input in 2.3.0.
    
    The change is fairly safe. The actual change is basically:
    
    from
    
    ```python
    if use_arrow:  # is 'spark.sql.execution.arrow.enabled' true?
        require_minimum_pyarrow_version()
        # return the one with Arrow
    else:
        # return the one without Arrow
    ```
    
    to
    
    ```python
    if use_arrow:  # is 'spark.sql.execution.arrow.enabled' true?
        should_fall_back = False
        try:
            require_minimum_pyarrow_version()
            to_arrow_schema(self.schema)
        except Exception:
            should_fall_back = True
    
        if not should_fall_back:
            # return the one with Arrow
    # return the one without Arrow
    ```
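    To make the fallback logic above concrete, here is a minimal, self-contained 
sketch of the pattern. All names here (`to_pandas`, the checker functions, and 
the boolean flags standing in for the real environment and schema checks) are 
hypothetical stand-ins, not Spark's actual internals:

    ```python
    # Hypothetical stand-in for Spark's PyArrow version check.
    def require_minimum_pyarrow_version(available):
        if not available:
            raise ImportError("PyArrow is not installed or too old")

    # Hypothetical stand-in for converting the schema to an Arrow schema;
    # raises when the schema contains an unsupported type.
    def to_arrow_schema(schema_supported):
        if not schema_supported:
            raise TypeError("Unsupported type in schema for Arrow conversion")

    def to_pandas(arrow_enabled, arrow_available=True, schema_supported=True):
        if arrow_enabled:
            should_fall_back = False
            try:
                require_minimum_pyarrow_version(arrow_available)
                to_arrow_schema(schema_supported)
            except Exception:
                # Any failed precondition silently falls back to the
                # non-Arrow path instead of raising.
                should_fall_back = True

            if not should_fall_back:
                return "with Arrow"
        return "without Arrow"

    print(to_pandas(arrow_enabled=True))                          # with Arrow
    print(to_pandas(arrow_enabled=True, schema_supported=False))  # without Arrow
    print(to_pandas(arrow_enabled=False))                         # without Arrow
    ```

    The point of the restructuring is that enabling Arrow becomes a best-effort 
optimization: any unmet precondition drops to the existing non-Arrow code path 
rather than failing the whole call.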
    
    The error message already looks okay for now. If you feel strongly about 
this, I am fine with going ahead with it only in master.