HyukjinKwon opened a new pull request, #45132:
URL: https://github.com/apache/spark/pull/45132

   ### What changes were proposed in this pull request?
   
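   This PR fixes a regression introduced by https://github.com/apache/spark/pull/36683: with Arrow enabled, a zero or negative `spark.sql.execution.arrow.maxRecordsPerBatch` breaks `createDataFrame` from a pandas DataFrame. Zero raises an error, and a negative value silently drops all rows, as reproduced below.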
   
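   ```python
   import pandas as pd
   from pyspark.sql import SparkSession

   # In the PySpark shell `spark` already exists; getOrCreate() returns it.
   spark = SparkSession.builder.getOrCreate()

   # Use Arrow for pandas conversion and surface errors instead of falling back.
   spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
   spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", 0)
   spark.conf.set("spark.sql.execution.arrow.pyspark.fallback.enabled", False)
   spark.createDataFrame(pd.DataFrame({'a': [123]})).toPandas()

   # A negative limit should behave the same as zero.
   spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", -1)
   spark.createDataFrame(pd.DataFrame({'a': [123]})).toPandas()
   ```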
   
   **Before**
   
   ```
   /.../spark/python/pyspark/sql/pandas/conversion.py:371: UserWarning: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.pyspark.enabled' is set to true, but has reached the error below and will not continue because automatic fallback with 'spark.sql.execution.arrow.pyspark.fallback.enabled' has been set to false.
     range() arg 3 must not be zero
     warn(msg)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/.../spark/python/pyspark/sql/session.py", line 1483, in createDataFrame
       return super(SparkSession, self).createDataFrame(  # type: ignore[call-overload]
     File "/.../spark/python/pyspark/sql/pandas/conversion.py", line 351, in createDataFrame
       return self._create_from_pandas_with_arrow(data, schema, timezone)
     File "/.../spark/python/pyspark/sql/pandas/conversion.py", line 633, in _create_from_pandas_with_arrow
       pdf_slices = (pdf.iloc[start : start + step] for start in range(0, len(pdf), step))
   ValueError: range() arg 3 must not be zero
   ```
   ```
   Empty DataFrame
   Columns: [a]
   Index: []
   ```
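Both failures can be traced to the slicing expression in the traceback above. Assuming `step` there is the raw `maxRecordsPerBatch` value, this pandas-only sketch (the helper `slice_like_before` is hypothetical, written only for illustration) shows why `0` raises while `-1` silently yields no rows:

```python
import pandas as pd

pdf = pd.DataFrame({"a": [123]})


def slice_like_before(pdf, step):
    # Hypothetical helper mirroring the slicing expression from the traceback,
    # assuming `step` is the configured maxRecordsPerBatch used unchanged.
    return [pdf.iloc[start : start + step] for start in range(0, len(pdf), step)]


# step = 0: range() rejects a zero step before any slicing happens.
try:
    slice_like_before(pdf, 0)
except ValueError as e:
    print(e)  # range() arg 3 must not be zero

# step = -1: range(0, 1, -1) is empty, so no slices (and no rows) are produced.
print(len(slice_like_before(pdf, -1)))  # 0
```

A zero step is rejected by `range()`, while a negative step produces an empty range and therefore zero slices, which is consistent with the empty DataFrame returned for `-1`.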
   
   **After**
   
   ```
        a
   0  123
   ```
   
   ```
        a
   0  123
   ```
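One way to get the behaviour shown above is to map a non-positive limit to the full length of the DataFrame before slicing. The helper below (`slice_pandas`) is a hypothetical sketch of that idea, not necessarily what this PR implements:

```python
import pandas as pd


def slice_pandas(pdf, max_records_per_batch):
    """Split `pdf` into row slices of at most `max_records_per_batch` rows.

    Hypothetical helper: a zero or negative limit is treated as "no limit",
    so the whole DataFrame becomes a single slice.
    """
    if max_records_per_batch > 0:
        step = max_records_per_batch
    else:
        # max(..., 1) keeps range() valid when the DataFrame is empty.
        step = max(len(pdf), 1)
    return [pdf.iloc[start : start + step] for start in range(0, len(pdf), step)]


pdf = pd.DataFrame({"a": [123]})
for limit in (0, -1, 1):
    slices = slice_pandas(pdf, limit)
    # Each limit yields a single slice containing the row [123].
    print(limit, len(slices), pd.concat(slices)["a"].tolist())
```

The `max(len(pdf), 1)` guard matters for an empty DataFrame: `len(pdf)` would be `0` there and reintroduce the zero-step error.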
   
   ### Why are the changes needed?
   
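   It fixes a regression. The documentation for `spark.sql.execution.arrow.maxRecordsPerBatch` states that a zero or negative value means no limit, so both cases above should succeed. This fix should be backported to branch-3.4 and branch-3.5.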
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, it fixes a regression as described above.
   
   ### How was this patch tested?
   
   A unit test was added.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
