Github user ash211 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19702#discussion_r151569727
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala ---
@@ -372,23 +381,18 @@ private[parquet] class ParquetSchemaConverter(
// `TIMESTAMP_MICROS` which are both logical types annotating `INT64`.
//
// Originally, Spark SQL uses the same nanosecond timestamp type as Impala and Hive. Starting
- // from Spark 1.5.0, we resort to a timestamp type with 100 ns precision so that we can store
- // a timestamp into a `Long`. This design decision is subject to change though, for example,
- // we may resort to microsecond precision in the future.
- //
- // For Parquet, we plan to write all `TimestampType` value as `TIMESTAMP_MICROS`, but it's
- // currently not implemented yet because parquet-mr 1.8.1 (the version we're currently using)
- // hasn't implemented `TIMESTAMP_MICROS` yet, however it supports TIMESTAMP_MILLIS. We will
- // encode timestamp values as TIMESTAMP_MILLIS annotating INT64 if
- // 'spark.sql.parquet.int64AsTimestampMillis' is set.
- //
- // TODO Converts `TIMESTAMP_MICROS` once parquet-mr implements that.
-
- case TimestampType if writeTimestampInMillis =>
-   Types.primitive(INT64, repetition).as(TIMESTAMP_MILLIS).named(field.name)
-
+ // from Spark 1.5.0, we resort to a timestamp type with microsecond precision so that we can
--- End diff --
I think this comment change loses some historical context about the behavior around 1.5.0. From 1.5.0 through 2.2.0, timestamps had 100 ns precision, and with this change, starting in 2.3.0, they have microsecond precision, right? The change as written seems to retroactively alter the described behavior, unless the original comment was wrong.
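
For anyone skimming the thread, here's a minimal, hypothetical sketch (not the PR's actual code) of the two schema shapes under discussion; it assumes a parquet-mr version new enough to define `OriginalType.TIMESTAMP_MICROS`. Both logical types annotate `INT64`, so only the unit of the stored `Long` differs (milliseconds vs. microseconds):

```scala
import org.apache.parquet.schema.OriginalType.{TIMESTAMP_MICROS, TIMESTAMP_MILLIS}
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.INT64
import org.apache.parquet.schema.Type.Repetition
import org.apache.parquet.schema.Types

object TimestampAnnotationSketch {
  // Pre-2.3 shape shown in the removed code: INT64 annotated as TIMESTAMP_MILLIS
  // (previously gated by 'spark.sql.parquet.int64AsTimestampMillis').
  def millisField(name: String, repetition: Repetition = Repetition.OPTIONAL) =
    Types.primitive(INT64, repetition).as(TIMESTAMP_MILLIS).named(name)

  // Direction this PR moves toward: INT64 annotated as TIMESTAMP_MICROS.
  def microsField(name: String, repetition: Repetition = Repetition.OPTIONAL) =
    Types.primitive(INT64, repetition).as(TIMESTAMP_MICROS).named(name)
}
```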
---