gengliangwang commented on code in PR #45571:
URL: https://github.com/apache/spark/pull/45571#discussion_r1530949174
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala:
##########
@@ -436,6 +436,17 @@ private[parquet] class ParquetRowConverter(
}
}
+      // INT96 timestamp doesn't have a logical type, here we check the physical type instead.
+      case TimestampNTZType if parquetType.asPrimitiveType().getPrimitiveTypeName == INT96 =>
+        new ParquetPrimitiveConverter(updater) {
+          // Converts nanosecond timestamps stored as INT96.
+          // TimestampNTZ type does not require rebasing due to its lack of time zone context.
Review Comment:
   * LTZ doesn't actually store time zone info in Parquet files. Also, Spark
   uses the long value directly when reading NTZ as LTZ. I am trying to keep it
   simple and symmetric.
   * If we shifted by the session time zone here, we would probably need to do
   the same when reading NTZ as LTZ, which would be a breaking change. Also, the
   result of NTZ columns would then depend on the session time zone conf
   `spark.sql.session.timeZone`.
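
   For context, a minimal sketch of how an INT96 Parquet timestamp can be decoded
   into microseconds since the epoch without any time-zone rebasing, which is the
   behavior the comment argues for. The object and helper names here are
   hypothetical illustrations, not Spark's actual `ParquetRowConverter` code; only
   the INT96 layout (8-byte little-endian nanos-of-day followed by a 4-byte
   little-endian Julian day) and the epoch Julian day 2440588 are standard:

   ```scala
   import java.nio.{ByteBuffer, ByteOrder}

   // Hypothetical helper for illustration; not Spark's implementation.
   object Int96Sketch {
     val JulianDayOfEpoch = 2440588L   // Julian day number of 1970-01-01
     val MicrosPerDay     = 86400000000L
     val NanosPerMicro    = 1000L

     // INT96 stores nanos-of-day (8 bytes, little-endian) followed by
     // the Julian day (4 bytes, little-endian). No time-zone shift is
     // applied: the raw value maps directly to a wall-clock instant.
     def int96ToMicros(bytes: Array[Byte]): Long = {
       val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
       val nanosOfDay = buf.getLong
       val julianDay  = buf.getInt
       (julianDay - JulianDayOfEpoch) * MicrosPerDay + nanosOfDay / NanosPerMicro
     }

     def main(args: Array[String]): Unit = {
       // 1970-01-01T00:00:01 as INT96: Julian day 2440588, 1e9 nanos of day.
       val buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
       buf.putLong(1000000000L).putInt(2440588)
       println(Int96Sketch.int96ToMicros(buf.array()))  // 1000000
     }
   }
   ```

   Because no session time zone enters the formula, the same bytes always decode
   to the same NTZ value regardless of `spark.sql.session.timeZone`.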
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]