Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19702#discussion_r151570606
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala ---
@@ -372,23 +381,18 @@ private[parquet] class ParquetSchemaConverter(
// `TIMESTAMP_MICROS` which are both logical types annotating `INT64`.
//
// Originally, Spark SQL uses the same nanosecond timestamp type as Impala and Hive. Starting
- // from Spark 1.5.0, we resort to a timestamp type with 100 ns precision so that we can store
- // a timestamp into a `Long`. This design decision is subject to change though, for example,
- // we may resort to microsecond precision in the future.
- //
- // For Parquet, we plan to write all `TimestampType` value as `TIMESTAMP_MICROS`, but it's
- // currently not implemented yet because parquet-mr 1.8.1 (the version we're currently using)
- // hasn't implemented `TIMESTAMP_MICROS` yet, however it supports TIMESTAMP_MILLIS. We will
- // encode timestamp values as TIMESTAMP_MILLIS annotating INT64 if
- // 'spark.sql.parquet.int64AsTimestampMillis' is set.
- //
- // TODO Converts `TIMESTAMP_MICROS` once parquet-mr implements that.
-
- case TimestampType if writeTimestampInMillis =>
- Types.primitive(INT64, repetition).as(TIMESTAMP_MILLIS).named(field.name)
-
+ // from Spark 1.5.0, we resort to a timestamp type with microsecond precision so that we can
--- End diff --
I don't think Spark ever had 100 ns precision, cc @liancheng to confirm.
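
As a side note, here is a minimal sketch (not the `ParquetSchemaConverter` code itself, and the object name and sample value are made up) of the precision difference being discussed: Spark's internal `TimestampType` value is a `Long` counting microseconds since the epoch, so encoding it as `TIMESTAMP_MILLIS` drops the sub-millisecond digits, while `TIMESTAMP_MICROS` keeps the value as-is.

```scala
// Sketch only: Spark stores a TimestampType value as microseconds since the
// epoch in a Long; the two Parquet INT64 timestamp annotations differ in how
// much of that precision survives.
object TimestampPrecisionSketch {
  // Roughly 2017-11-16 00:00:00.123456 UTC in microseconds (illustrative value).
  val micros: Long = 1510790400123456L

  // TIMESTAMP_MILLIS (INT64): integer division by 1000 drops the trailing 456 microseconds.
  val asMillis: Long = micros / 1000L   // 1510790400123

  // TIMESTAMP_MICROS (INT64): stored as-is, no precision loss.
  val asMicros: Long = micros           // 1510790400123456

  def main(args: Array[String]): Unit =
    println(s"TIMESTAMP_MILLIS: $asMillis, TIMESTAMP_MICROS: $asMicros")
}
```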
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]