awdavidson commented on code in PR #38312:
URL: https://github.com/apache/spark/pull/38312#discussion_r1001477809
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala:
##########
@@ -271,6 +271,8 @@ class ParquetToSparkSchemaConverter(
} else {
TimestampNTZType
}
+ case timestamp: TimestampLogicalTypeAnnotation if timestamp.getUnit == TimeUnit.NANOS =>
Review Comment:
@EnricoMi so it is possible to use `spark.read.schema(..)` as a workaround;
however, you lose functionality like `mergeSchema`, which automatically
handles schema evolution, and you potentially need to know the entire schema
up front if all columns are required. Also, for other consumers/users,
especially in the exploratory-analysis space, it requires a deeper
understanding of the underlying data structure before they can use the data,
and this gets harder when the file is extremely wide.
I can imagine developers creating other ways to avoid the nuisance, which
seems a bit crazy considering that the functionality already exists.
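For illustration, a minimal sketch of the workaround being discussed, assuming Spark 3.3+ (where `TimestampNTZType` exists); the path and column names are hypothetical, not from the PR:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object NtzWorkaroundSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ntz-workaround")
      .master("local[*]")
      .getOrCreate()

    // The entire schema must be declared up front, column by column.
    // "id" and "event_time" are illustrative names.
    val schema = StructType(Seq(
      StructField("id", LongType),
      StructField("event_time", TimestampNTZType)
    ))

    // Supplying an explicit schema bypasses Parquet schema inference,
    // but options such as mergeSchema no longer help with schema
    // evolution, which is the drawback noted above.
    val df = spark.read.schema(schema).parquet("/path/to/data")
    df.printSchema()

    spark.stop()
  }
}
```

This shows why the workaround scales poorly: every required column has to be enumerated by hand, which is impractical for very wide files.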
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]