Github user ash211 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19702#discussion_r151569727
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala ---
@@ -372,23 +381,18 @@ private[parquet] class ParquetSchemaConverter(
// `TIMESTAMP_MICROS` which are both logical types annotating `INT64`.
//
// Originally, Spark SQL uses the same nanosecond timestamp type as Impala and Hive. Starting
- // from Spark 1.5.0, we resort to a timestamp type with 100 ns precision so that we can store
- // a timestamp into a `Long`. This design decision is subject to change though, for example,
- // we may resort to microsecond precision in the future.
- //
- // For Parquet, we plan to write all `TimestampType` value as `TIMESTAMP_MICROS`, but it's
- // currently not implemented yet because parquet-mr 1.8.1 (the version we're currently using)
- // hasn't implemented `TIMESTAMP_MICROS` yet, however it supports TIMESTAMP_MILLIS. We will
- // encode timestamp values as TIMESTAMP_MILLIS annotating INT64 if
- // 'spark.sql.parquet.int64AsTimestampMillis' is set.
- //
- // TODO Converts `TIMESTAMP_MICROS` once parquet-mr implements that.
-
- case TimestampType if writeTimestampInMillis =>
-   Types.primitive(INT64, repetition).as(TIMESTAMP_MILLIS).named(field.name)
-
+ // from Spark 1.5.0, we resort to a timestamp type with microsecond precision so that we can
--- End diff --
I think this comment change loses some historical context about the behavior around 1.5.0. From 1.5.0 through 2.2.0, timestamps had 100 ns precision, and with this change, starting in 2.3.0, they have microsecond precision, right? The change as written seems to retroactively alter the described behavior, unless the original comment was wrong.
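
For anyone skimming the thread, here's a minimal, hypothetical sketch (not the PR's actual code) of the two schema shapes under discussion; it assumes a parquet-mr version new enough to define `OriginalType.TIMESTAMP_MICROS`. Both logical types annotate `INT64`, so only the unit of the stored `Long` differs (milliseconds vs. microseconds):

```scala
import org.apache.parquet.schema.OriginalType.{TIMESTAMP_MICROS, TIMESTAMP_MILLIS}
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.INT64
import org.apache.parquet.schema.Type.Repetition
import org.apache.parquet.schema.Types

object TimestampAnnotationSketch {
  // Pre-2.3 shape shown in the removed code: INT64 annotated as TIMESTAMP_MILLIS
  // (previously gated by 'spark.sql.parquet.int64AsTimestampMillis').
  def millisField(name: String, repetition: Repetition = Repetition.OPTIONAL) =
    Types.primitive(INT64, repetition).as(TIMESTAMP_MILLIS).named(name)

  // Direction this PR moves toward: INT64 annotated as TIMESTAMP_MICROS.
  def microsField(name: String, repetition: Repetition = Repetition.OPTIONAL) =
    Types.primitive(INT64, repetition).as(TIMESTAMP_MICROS).named(name)
}
```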
---