awdavidson commented on code in PR #38312:
URL: https://github.com/apache/spark/pull/38312#discussion_r1001477809
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala:
##########
@@ -271,6 +271,8 @@ class ParquetToSparkSchemaConverter(
} else {
TimestampNTZType
}
+ case timestamp: TimestampLogicalTypeAnnotation if timestamp.getUnit == TimeUnit.NANOS =>
Review Comment:
@EnricoMi so it is possible to use `spark.read.schema(..)` as a workaround;
however, you lose functionality like `mergeSchema`, which automatically
handles schema evolution, and you potentially need to know the entire schema
up front if all columns are required. Also, for other consumers/users,
especially in the exploratory-analysis space, it requires a deeper
understanding of the underlying data structure before they can use the data,
and this gets harder when the file is extremely wide.
I can imagine developers creating other ways to avoid the nuisance, which
seems a bit crazy considering that the functionality already exists.
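For illustration, a minimal sketch of the workaround being discussed, assuming Spark 3.3+ (where `TimestampNTZType` exists); the path and column names are hypothetical, not from the PR:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object NtzWorkaroundSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ntz-workaround")
      .master("local[*]")
      .getOrCreate()

    // The entire schema must be declared up front, column by column.
    // "id" and "event_time" are illustrative names.
    val schema = StructType(Seq(
      StructField("id", LongType),
      StructField("event_time", TimestampNTZType)
    ))

    // Supplying an explicit schema bypasses Parquet schema inference,
    // but options such as mergeSchema no longer help with schema
    // evolution, which is the drawback noted above.
    val df = spark.read.schema(schema).parquet("/path/to/data")
    df.printSchema()

    spark.stop()
  }
}
```

This shows why the workaround scales poorly: every required column has to be enumerated by hand, which is impractical for very wide files.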
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]