[GitHub] [spark] zhengruifeng opened a new pull request, #42754: [SPARK-45026][CONNECT] `spark.sql` should support datatypes not compatible with arrow

via GitHub Thu, 31 Aug 2023 06:30:37 -0700


zhengruifeng opened a new pull request, #42754:
URL: https://github.com/apache/spark/pull/42754


   ### What changes were proposed in this pull request?
   
   Move the Arrow batch creation in `handleSqlCommand` into the `isCommand` branch, so that a batch is only built when the SQL statement is a command.
   
   
   ### Why are the changes needed?
   
   https://github.com/apache/spark/pull/42736 and https://github.com/apache/spark/pull/42743 introduced `CalendarIntervalType` in the Spark Connect Python Client. However, the following query fails:
   
   ```
   spark.sql("SELECT make_interval(100, 11, 1, 1, 12, 30, 01.001001)")
   
   ...
   
   pyspark.errors.exceptions.connect.UnsupportedOperationException: [UNSUPPORTED_DATATYPE] Unsupported data type "INTERVAL".
   ```
   
   The root cause is that `handleSqlCommand` always creates an Arrow batch, while `ArrowUtils` does not currently accept `CalendarIntervalType`.
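   The failure mode and the fix can be modeled in a few lines. This is a minimal sketch, not the actual server code: the function names, the tiny set of "Arrow-compatible" type names, and the dict results are all hypothetical stand-ins for the Scala `handleSqlCommand` and `ArrowUtils`, chosen only to mirror the control flow described above.

```python
# Hypothetical, simplified model of the server-side flow described above.
# None of these names are Spark APIs; they only mirror the control flow.

ARROW_COMPATIBLE = {"int", "bigint", "double", "string", "boolean"}  # illustrative subset


class UnsupportedDataType(Exception):
    """Stands in for the UNSUPPORTED_DATATYPE error raised by ArrowUtils."""


def to_arrow_schema(field_types):
    # Mimics ArrowUtils rejecting a type it cannot map (here: "interval").
    for t in field_types:
        if t not in ARROW_COMPATIBLE:
            raise UnsupportedDataType(f'Unsupported data type "{t.upper()}".')
    return list(field_types)


def handle_sql_command_before(field_types, is_command):
    # Old flow: an Arrow batch is built for every statement, even plain queries.
    arrow_batch = to_arrow_schema(field_types)
    if is_command:
        return {"arrow_batch": arrow_batch}
    return {"relation": field_types}


def handle_sql_command_after(field_types, is_command):
    # New flow: the Arrow batch is only built inside the is_command branch,
    # so a plain query is handed back as a relation without touching Arrow.
    if is_command:
        return {"arrow_batch": to_arrow_schema(field_types)}
    return {"relation": field_types}


try:
    handle_sql_command_before(["interval"], is_command=False)
except UnsupportedDataType as e:
    print(e)  # Unsupported data type "INTERVAL".

print(handle_sql_command_after(["interval"], is_command=False))
# {'relation': ['interval']}
```

   In this model, a non-command query with an interval column no longer fails up front; an Arrow conversion only happens if results are actually materialized.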
   
   This PR mainly focuses on making `schema` work with data types that are not compatible with Arrow.
   In the future, `ArrowUtils` should accept `CalendarIntervalType` so that `collect`/`toPandas` also work.
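   For context on that future work: Spark stores a `CalendarIntervalType` value as three fields (months, days, microseconds), and Arrow defines a month/day/nanosecond interval type with the same shape. The sketch below shows how the example query's `make_interval` arguments fold into those fields and how they could be scaled to a month/day/nano triple. It is a hedged illustration only: `make_interval_components` and `to_month_day_nano` are hypothetical helpers, and this is one plausible mapping, not necessarily the one `ArrowUtils` will adopt.

```python
from decimal import Decimal

MICROS_PER_SECOND = 1_000_000
MICROS_PER_MINUTE = 60 * MICROS_PER_SECOND
MICROS_PER_HOUR = 60 * MICROS_PER_MINUTE


def make_interval_components(years=0, months=0, weeks=0, days=0,
                             hours=0, mins=0, secs=Decimal(0)):
    """Hypothetical helper: fold make_interval args into (months, days, micros)."""
    total_months = years * 12 + months  # years become months
    total_days = weeks * 7 + days       # weeks become days
    micros = (hours * MICROS_PER_HOUR
              + mins * MICROS_PER_MINUTE
              + int(Decimal(secs) * MICROS_PER_SECOND))  # Decimal keeps the fraction exact
    return total_months, total_days, micros


def to_month_day_nano(components):
    """Hypothetical helper: scale microseconds to nanoseconds for a month/day/nano interval."""
    months, days, micros = components
    return months, days, micros * 1_000


components = make_interval_components(100, 11, 1, 1, 12, 30, Decimal("01.001001"))
print(components)                     # (1211, 8, 45001001001)
print(to_month_day_nano(components))  # (1211, 8, 45001001001000)
```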
   
   ### Does this PR introduce _any_ user-facing change?
   Yes.
   
   After this PR:
   ```
   In [1]: spark.sql("SELECT make_interval(100, 11, 1, 1, 12, 30, 01.001001)")
   Out[1]: DataFrame[make_interval(100, 11, 1, 1, 12, 30, 1.001001): interval]
   
   In [2]: spark.sql("SELECT make_interval(100, 11, 1, 1, 12, 30, 01.001001)").schema
   Out[2]: StructType([StructField('make_interval(100, 11, 1, 1, 12, 30, 1.001001)', CalendarIntervalType(), True)])
   ```
   
   
   ### How was this patch tested?
   Enabled unit tests.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.

