GitHub user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20567
  
    Yup, I also agree with adding a configuration to control this. I will work
on that later, targeting master only.
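
    For reference, a minimal sketch of how such a flag might be read on the
Python side. The conf name and default here are hypothetical and would be
settled in that follow-up:

    ```python
    # Hypothetical conf name and default -- to be decided in the follow-up PR.
    # SQLContext.getConf(key, default) returns the value as a string.
    fallback_enabled = self.sql_ctx.getConf(
        "spark.sql.execution.arrow.fallback.enabled", "true").lower() == "true"
    ```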
    
    For https://github.com/apache/spark/pull/20567#issuecomment-364994740, yup,
I agree with that, but to do it we would need something like:
    
    ```python
    if arrow_enabled:  # 'spark.sql.execution.arrow.enabled' is true?
        require_minimum_pyarrow_version()
        try:
            to_arrow_schema(self.schema)
            # ... return the one with Arrow ...
        except Exception as e:
            raise Exception("'spark.sql.execution.arrow.enabled' blah blah ...")
    else:
        # ... return the one without Arrow ...
        pass
    ```
    
    The diff and complexity are pretty similar to the fallback version:
    
    ```python
    if arrow_enabled:  # 'spark.sql.execution.arrow.enabled' is true?
        should_fall_back = False
        try:
            require_minimum_pyarrow_version()
            to_arrow_schema(self.schema)
        except Exception as e:
            should_fall_back = True

        if not should_fall_back:
            # ... return the one with Arrow ...
            pass
    # ... return the one without Arrow ...
    ```
    
    Note that, in the case of `spark.sql.codegen.fallback`, it's `true` by
default, if I didn't misunderstand. Also, the latter way lets us match the
behaviour of `createDataFrame` with Pandas input for now.
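
    If we do add such a conf, the latter sketch could be gated on it,
defaulting to fallback-on like `spark.sql.codegen.fallback`. A rough sketch
only; `fallback_enabled` and the conf behind it are hypothetical:

    ```python
    # Sketch: 'fallback_enabled' would come from a new, hypothetical conf that
    # defaults to true, mirroring 'spark.sql.codegen.fallback'.
    if arrow_enabled:
        should_fall_back = False
        try:
            require_minimum_pyarrow_version()
            to_arrow_schema(self.schema)
        except Exception as e:
            if not fallback_enabled:
                raise  # fallback disabled: surface the original error
            should_fall_back = True

        if not should_fall_back:
            # ... return the one with Arrow ...
            pass
    # ... return the one without Arrow ...
    ```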
    
    I have been thinking of this feature as being in transition, and I am
trying to fix and match the behaviour first, before the release.

