MaxGekk opened a new pull request #26102: [SPARK-29448][SQL] Support the `INTERVAL` type by Parquet datasource
URL: https://github.com/apache/spark/pull/26102

### What changes were proposed in this pull request?
Catalyst's `CalendarIntervalType` is now supported in the Parquet datasource. Interval values are saved as the Parquet `INTERVAL` logical type according to the format specification (https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#interval). The Parquet format stores intervals with millisecond precision. Because of this restriction, values of Spark's `INTERVAL` type have to be truncated to milliseconds before they are written to Parquet files.

### Why are the changes needed?
- Spark users will be able to load interval columns that other systems have written to Parquet files
- Datasets with interval columns can be stored in Parquet files for future processing

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
- Added tests to `ParquetSchemaSuite` and `ParquetIOSuite`
- Added an end-to-end test in `ParquetQuerySuite` which writes intervals and reads them back
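
To illustrate the millisecond truncation mentioned in the description, here is a minimal sketch of the Parquet `INTERVAL` layout. The linked specification defines it as a 12-byte fixed-length value holding three little-endian unsigned 32-bit integers: months, days, and milliseconds. This is not the code from the pull request. The function names, the choice to pass months, days, and microseconds separately, and the decision to reject negative or oversized values are all assumptions made for the example.

```python
import struct
from typing import Tuple

MICROS_PER_MILLI = 1_000
UINT32_MAX = 2**32 - 1


def encode_parquet_interval(months: int, days: int, micros: int) -> bytes:
    """Pack an interval into the 12-byte Parquet INTERVAL layout.

    Hypothetical helper: the sub-millisecond part of `micros` is dropped,
    mirroring the truncation the PR description mentions.
    """
    millis = micros // MICROS_PER_MILLI  # truncate microseconds to milliseconds
    for name, value in (("months", months), ("days", days), ("millis", millis)):
        # Assumption: values that do not fit an unsigned 32-bit field are
        # rejected; the PR description does not say how it handles them.
        if not 0 <= value <= UINT32_MAX:
            raise ValueError(f"{name}={value} does not fit an unsigned 32-bit field")
    # "<III" = three little-endian unsigned 32-bit integers
    return struct.pack("<III", months, days, millis)


def decode_parquet_interval(raw: bytes) -> Tuple[int, int, int]:
    """Unpack 12 bytes into (months, days, microseconds)."""
    months, days, millis = struct.unpack("<III", raw)
    return months, days, millis * MICROS_PER_MILLI


encoded = encode_parquet_interval(1, 2, 3_004_567)
decoded = decode_parquet_interval(encoded)
```

In this sketch the round trip is lossy by design: the 567 microseconds in the input do not survive, so `decoded` carries 3,004,000 microseconds rather than 3,004,567.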
