zhengruifeng opened a new pull request, #42754: URL: https://github.com/apache/spark/pull/42754
### What changes were proposed in this pull request?
Move the Arrow batch creation into the `isCommand` branch.

### Why are the changes needed?
https://github.com/apache/spark/pull/42736 and https://github.com/apache/spark/pull/42743 introduced `CalendarIntervalType` in the Spark Connect Python client. However, the following query fails:

```
spark.sql("SELECT make_interval(100, 11, 1, 1, 12, 30, 01.001001)")
...
pyspark.errors.exceptions.connect.UnsupportedOperationException: [UNSUPPORTED_DATATYPE] Unsupported data type "INTERVAL".
```

The root cause is that `handleSqlCommand` always creates an Arrow batch, while `ArrowUtils` does not accept `CalendarIntervalType` yet.

This PR mainly focuses on enabling `schema` for data types that are not compatible with Arrow. In the future, we should make `ArrowUtils` accept `CalendarIntervalType` so that `collect`/`toPandas` work.

### Does this PR introduce _any_ user-facing change?
Yes. After this PR:

```
In [1]: spark.sql("SELECT make_interval(100, 11, 1, 1, 12, 30, 01.001001)")
Out[1]: DataFrame[make_interval(100, 11, 1, 1, 12, 30, 1.001001): interval]

In [2]: spark.sql("SELECT make_interval(100, 11, 1, 1, 12, 30, 01.001001)").schema
Out[2]: StructType([StructField('make_interval(100, 11, 1, 1, 12, 30, 1.001001)', CalendarIntervalType(), True)])
```

### How was this patch tested?
Enabled UT.

### Was this patch authored or co-authored using generative AI tooling?
No.
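
The root cause described above can be illustrated with a toy model. This is only a sketch: `to_arrow_batch`, `handle_sql_eager`, `handle_sql_lazy`, and the `ARROW_UNSUPPORTED` set are hypothetical names invented for illustration and are not Spark or Arrow APIs. It assumes that only the command branch needs the Arrow batch, as the PR description suggests.

```python
# Toy model only: every name here is hypothetical and is not a Spark API.
ARROW_UNSUPPORTED = {"interval"}


class UnsupportedDataType(Exception):
    """Stand-in for the UNSUPPORTED_DATATYPE error."""


def to_arrow_batch(column_types):
    # Stand-in for Arrow conversion: reject types this model treats as unsupported.
    for t in column_types:
        if t in ARROW_UNSUPPORTED:
            raise UnsupportedDataType(f'Unsupported data type "{t.upper()}".')
    return {"arrow_types": list(column_types)}


def handle_sql_eager(column_types, is_command):
    # Before: the batch is built unconditionally, so a plain query fails as well.
    batch = to_arrow_batch(column_types)
    if is_command:
        return {"kind": "command_result", "batch": batch}
    return {"kind": "relation", "schema": list(column_types)}


def handle_sql_lazy(column_types, is_command):
    # After: only the command branch builds the batch.
    if is_command:
        return {"kind": "command_result", "batch": to_arrow_batch(column_types)}
    return {"kind": "relation", "schema": list(column_types)}
```

In this model, `handle_sql_eager(["interval"], False)` raises even though the batch is never used, while `handle_sql_lazy(["interval"], False)` returns the relation with its schema intact.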
