[jira] [Commented] (SPARK-36182) Support TimestampNTZ type in Parquet file source
[ https://issues.apache.org/jira/browse/SPARK-36182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17439268#comment-17439268 ]

Apache Spark commented on SPARK-36182:
--------------------------------------

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/34495

> Support TimestampNTZ type in Parquet file source
> ------------------------------------------------
>
>                 Key: SPARK-36182
>                 URL: https://issues.apache.org/jira/browse/SPARK-36182
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>             Fix For: 3.3.0
>
> As per https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#timestamp, Parquet supports both TIMESTAMP_NTZ and TIMESTAMP_LTZ (Spark's current default timestamp type):
> * A TIMESTAMP with isAdjustedToUTC=true => TIMESTAMP_LTZ
> * A TIMESTAMP with isAdjustedToUTC=false => TIMESTAMP_NTZ
>
> In Spark 3.1 or prior, the Parquet writer follows the definition and sets the field `isAdjustedToUTC` to `true`, while the Parquet reader does not respect the `isAdjustedToUTC` flag and converts any Parquet timestamp type to TIMESTAMP_LTZ.
>
> Since 3.2, with the support of the timestamp without time zone type:
> * The Parquet writer follows the definition and sets the field `isAdjustedToUTC` to `false` when writing TIMESTAMP_NTZ.
> * Parquet reader:
> ** For schema inference, Spark converts the Parquet timestamp type to the corresponding Catalyst timestamp type according to the timestamp annotation flag `isAdjustedToUTC`.
> ** If schema merging is enabled during schema inference and some of the files are inferred as TIMESTAMP_NTZ while the others are TIMESTAMP_LTZ, the result type is TIMESTAMP_LTZ, which is considered the "wider" type.
> ** If a column of a user-provided schema is TIMESTAMP_LTZ and the column was written as TIMESTAMP_NTZ, Spark allows the read operation.
> ** If a column of a user-provided schema is TIMESTAMP_NTZ and the column was written as TIMESTAMP_LTZ, the read operation is not allowed, since TIMESTAMP_NTZ is considered narrower than TIMESTAMP_LTZ.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
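The schema-inference and merge rules described in the comment above can be sketched in plain Python. This is a hypothetical model for illustration only (the names `infer_timestamp_type` and `merge_timestamp_types` are not Spark APIs): the Parquet `isAdjustedToUTC` annotation selects the Catalyst timestamp type, and merging mixed timestamp columns widens to TIMESTAMP_LTZ.

```python
# Hypothetical sketch, not Spark's actual implementation: models how the
# Parquet TIMESTAMP annotation flag maps to a Catalyst timestamp type, and
# how schema merging widens mixed timestamp types to TIMESTAMP_LTZ.

TIMESTAMP_LTZ = "TIMESTAMP_LTZ"  # timestamp with local time zone (Spark's default)
TIMESTAMP_NTZ = "TIMESTAMP_NTZ"  # timestamp without time zone

def infer_timestamp_type(is_adjusted_to_utc: bool) -> str:
    """Map a Parquet TIMESTAMP annotation to the corresponding Catalyst type."""
    return TIMESTAMP_LTZ if is_adjusted_to_utc else TIMESTAMP_NTZ

def merge_timestamp_types(inferred_types) -> str:
    """During schema merging, any TIMESTAMP_LTZ input widens the result to
    TIMESTAMP_LTZ (the "wider" type); only all-NTZ inputs stay NTZ."""
    return TIMESTAMP_LTZ if TIMESTAMP_LTZ in inferred_types else TIMESTAMP_NTZ

# Two files: one written with isAdjustedToUTC=true, one with =false.
inferred = [infer_timestamp_type(True), infer_timestamp_type(False)]
print(merge_timestamp_types(inferred))  # TIMESTAMP_LTZ
```

Under this model, a mixed set of files always resolves to TIMESTAMP_LTZ, matching the "wider type wins" behavior the ticket describes for merged schemas.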
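The read-side compatibility rule for user-provided schemas can be sketched the same way. Again a hypothetical model (the function `can_read` is illustrative, not a Spark API): a user-provided TIMESTAMP_LTZ column may read data written as TIMESTAMP_NTZ, but a TIMESTAMP_NTZ column may not read TIMESTAMP_LTZ-written data, because NTZ is considered narrower than LTZ.

```python
# Hypothetical sketch of the read-compatibility rule described in the ticket.
# The asymmetry: the wider type (LTZ) can absorb data written as the narrower
# type (NTZ), but reading LTZ-written data as NTZ is rejected.

def can_read(user_type: str, written_type: str) -> bool:
    """Return True if a column declared as `user_type` in a user-provided
    schema may read a Parquet column written as `written_type`."""
    if user_type == written_type:
        return True
    return user_type == "TIMESTAMP_LTZ" and written_type == "TIMESTAMP_NTZ"

print(can_read("TIMESTAMP_LTZ", "TIMESTAMP_NTZ"))  # True: widening is allowed
print(can_read("TIMESTAMP_NTZ", "TIMESTAMP_LTZ"))  # False: narrowing is rejected
```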
[jira] [Commented] (SPARK-36182) Support TimestampNTZ type in Parquet file source

[ https://issues.apache.org/jira/browse/SPARK-36182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17439266#comment-17439266 ]

Apache Spark commented on SPARK-36182:
--------------------------------------

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/34495
[jira] [Commented] (SPARK-36182) Support TimestampNTZ type in Parquet file source

[ https://issues.apache.org/jira/browse/SPARK-36182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17382126#comment-17382126 ]

Apache Spark commented on SPARK-36182:
--------------------------------------

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/33395