awdavidson commented on PR #38312: URL: https://github.com/apache/spark/pull/38312#issuecomment-1315558808
> @awdavidson I would like to understand the use case a bit better. Was the parquet file written by an earlier Spark (version < 3.2), and does the error come when that parquet file is read back with a later Spark? If yes, this is clearly a regression. Still, in this case can you please show us how we can reproduce it manually (a small example code for write/read)?
>
> If it was written by another tool, can we get an example parquet file with sample data where the old version works and the new version fails?

@attilapiros The parquet file is written by another process. Spark uses this data to run aggregations and analysis over different time horizons where nanosecond precision is required. With earlier Spark versions (< 3.2), a `TIMESTAMP(NANOS, true)` column in the parquet schema is automatically converted to a `LongType`; however, since the move from parquet `1.10.1` to `1.12.3` and the accompanying changes to `ParquetSchemaConverter`, an `illegalType()` is thrown instead.

As soon as I have access this evening I will provide an example parquet file; in the meantime, the sketch below shows how such a file can be produced.

Whilst I understand timestamps with nanosecond precision are not fully supported, this change in behaviour will prevent users from migrating to the latest Spark version.
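A minimal reproduction sketch, assuming pyarrow is available (the path, column name, and sample value are illustrative, not from the actual upstream process):

```python
import pyarrow as pa
import pyarrow.parquet as pq

# One column of nanosecond-precision, UTC-adjusted timestamps; the integer
# is interpreted as nanoseconds since the epoch.
table = pa.table({
    "event_time": pa.array([1_650_000_000_000_000_001],
                           type=pa.timestamp("ns", tz="UTC")),
})

# Parquet format version 2.6 preserves the logical type
# TIMESTAMP(NANOS, true) instead of coercing to micro-/milliseconds.
pq.write_table(table, "/tmp/nanos.parquet", version="2.6")

# Reading the file back:
#   Spark < 3.2  -> the column surfaces as LongType (raw nanos)
#   Spark >= 3.2 -> ParquetSchemaConverter rejects it via illegalType()
# spark.read.parquet("/tmp/nanos.parquet").printSchema()
```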
