MaxGekk opened a new pull request #26102: [SPARK-29448][SQL] Support the `INTERVAL` type by Parquet datasource
URL: https://github.com/apache/spark/pull/26102
 
 
   ### What changes were proposed in this pull request?
   
   This PR adds support for Catalyst's `CalendarIntervalType` to the Parquet datasource. Interval values are saved as the Parquet `INTERVAL` logical type, as defined in the format specification: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#interval
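
A minimal sketch of the on-disk layout, assuming only what the linked specification says: an `INTERVAL` annotates a `FIXED_LEN_BYTE_ARRAY` of length 12 holding three little-endian unsigned 32-bit integers (months, days, milliseconds). The helper names `encode_parquet_interval` and `decode_parquet_interval` are hypothetical and are not Spark's or parquet-mr's API; the code only illustrates the byte format.

```python
import struct
from typing import Tuple

# Parquet INTERVAL layout (per LogicalTypes.md): FIXED_LEN_BYTE_ARRAY(12) holding
# three little-endian unsigned 32-bit integers: months, days, milliseconds.
INTERVAL_FORMAT = "<III"
INTERVAL_SIZE = struct.calcsize(INTERVAL_FORMAT)  # 12 bytes, no padding with "<"
UINT32_MAX = 0xFFFFFFFF


def encode_parquet_interval(months: int, days: int, millis: int) -> bytes:
    """Hypothetical helper: pack an interval into the 12-byte Parquet INTERVAL form."""
    for name, value in (("months", months), ("days", days), ("millis", millis)):
        # The spec defines the fields as unsigned, so negative values cannot be stored as-is.
        if not 0 <= value <= UINT32_MAX:
            raise ValueError(f"{name}={value} does not fit in an unsigned 32-bit integer")
    return struct.pack(INTERVAL_FORMAT, months, days, millis)


def decode_parquet_interval(raw: bytes) -> Tuple[int, int, int]:
    """Hypothetical helper: unpack 12 bytes back into (months, days, millis)."""
    if len(raw) != INTERVAL_SIZE:
        raise ValueError(f"expected {INTERVAL_SIZE} bytes, got {len(raw)}")
    return struct.unpack(INTERVAL_FORMAT, raw)


raw = encode_parquet_interval(14, 3, 5_000)
print(raw.hex())                     # 0e0000000300000088130000
print(decode_parquet_interval(raw))  # (14, 3, 5000)
```

The explicit range check makes the unsigned-field restriction visible; how the PR itself treats negative intervals is not stated here.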
   
   The Parquet format stores intervals with millisecond precision. Because of this restriction, values of Spark's `INTERVAL` type have to be truncated to milliseconds before they are written to Parquet files, so any sub-millisecond part is lost.
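
The truncation step can be sketched as follows, assuming (as a hypothetical representation, not necessarily Spark's internal one) that an interval is held as months, days and microseconds. The description says only "truncated"; rounding toward zero for negative values is an assumption made here for illustration. `truncate_micros_to_millis` and `to_parquet_precision` are hypothetical names.

```python
MICROS_PER_MILLI = 1_000


def truncate_micros_to_millis(micros: int) -> int:
    """Drop the sub-millisecond part, rounding toward zero (assumed semantics)."""
    millis = abs(micros) // MICROS_PER_MILLI
    return millis if micros >= 0 else -millis


def to_parquet_precision(months: int, days: int, micros: int) -> tuple:
    """Hypothetical: map a (months, days, microseconds) interval to the
    (months, days, milliseconds) triple that a Parquet INTERVAL can hold."""
    return months, days, truncate_micros_to_millis(micros)


# 1.234567 seconds becomes 1234 ms; the trailing 567 microseconds are lost.
print(to_parquet_precision(0, 1, 1_234_567))                # (0, 1, 1234)
print(truncate_micros_to_millis(1_234_567) * MICROS_PER_MILLI)  # 1234000
```

Reading the value back therefore yields 1234000 microseconds rather than the original 1234567, which is the precision loss the restriction above implies.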
   
   ### Why are the changes needed?
   - Spark users will be able to load interval columns written to Parquet files by other systems.
   - Datasets with interval columns can be stored in Parquet files for future processing.
   
   ### Does this PR introduce any user-facing change?
   No
   
   ### How was this patch tested?
   - Added tests to `ParquetSchemaSuite` and `ParquetIOSuite`
   - Added an end-to-end test in `ParquetQuerySuite` that writes intervals and reads them back
   