awdavidson commented on PR #38312: URL: https://github.com/apache/spark/pull/38312#issuecomment-1315558808
> @awdavidson I would like to understand the use case a bit better. Was the parquet file written by an earlier Spark (version < 3.2), and does the error come when that parquet file is read back with a later Spark? If yes, this is clearly a regression. Still, in this case can you please show us how we can reproduce it manually (a small example code for write/read)?
>
> If it was written by another tool, can we get an example parquet file with sample data where the old version works and the new version fails?

@attilapiros The parquet file is written by another process. Spark uses this data to run aggregations and analysis over different time horizons where nanosecond precision is required. With earlier Spark versions (< 3.2), a `TIMESTAMP(NANOS, true)` column in the parquet schema is automatically converted to a `LongType`; however, since the move from parquet `1.10.1` to `1.12.3` and the accompanying changes to `ParquetSchemaConverter`, an `illegalType()` is thrown instead.

As soon as I have access this evening I will provide an example parquet file; in the meantime, the sketch below shows how such a file can be produced.

Whilst I understand timestamps with nanosecond precision are not fully supported, this change in behaviour will prevent users from migrating to the latest Spark version.
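A minimal reproduction sketch, assuming pyarrow is available (the path, column name, and sample value are illustrative, not from the actual upstream process):

```python
import pyarrow as pa
import pyarrow.parquet as pq

# One column of nanosecond-precision, UTC-adjusted timestamps; the integer
# is interpreted as nanoseconds since the epoch.
table = pa.table({
    "event_time": pa.array([1_650_000_000_000_000_001],
                           type=pa.timestamp("ns", tz="UTC")),
})

# Parquet format version 2.6 preserves the logical type
# TIMESTAMP(NANOS, true) instead of coercing to micro-/milliseconds.
pq.write_table(table, "/tmp/nanos.parquet", version="2.6")

# Reading the file back:
#   Spark < 3.2  -> the column surfaces as LongType (raw nanos)
#   Spark >= 3.2 -> ParquetSchemaConverter rejects it via illegalType()
# spark.read.parquet("/tmp/nanos.parquet").printSchema()
```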
