[jira] [Commented] (PARQUET-323) INT96 should be marked as deprecated
[ https://issues.apache.org/jira/browse/PARQUET-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872653#comment-16872653 ]

Wes McKinney commented on PARQUET-323:
--------------------------------------

[~liusztc09] support for nanoseconds in the INT64 type has been added
https://github.com/apache/parquet-format/commit/b879065ac1bee3fe1d770eb3c4b60ab4267044d7

This is the recommended path forward

> INT96 should be marked as deprecated
> ------------------------------------
>
>                 Key: PARQUET-323
>                 URL: https://issues.apache.org/jira/browse/PARQUET-323
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-format
>            Reporter: Cheng Lian
>            Assignee: Lars Volker
>            Priority: Major
>             Fix For: format-2.5.0
>
> As discussed in the mailing list, {{INT96}} is only used to represent nanosec
> timestamp in Impala for some historical reasons, and should be deprecated.
> Since nanosec precision is rarely a real requirement, one possible and simple
> solution would be replacing {{INT96}} with {{INT64 (TIMESTAMP_MILLIS)}} or
> {{INT64 (TIMESTAMP_MICROS)}}.
> Several projects (Impala, Hive, Spark, ...) support INT96.
> We need a clear spec of the replacement and the path to deprecation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
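A rough illustration of what nanosecond precision in INT64 implies for the representable date range. This is only a sketch: it assumes the value is a signed 64-bit count of nanoseconds since 1970-01-01 UTC, and the names `int64_second_bounds` and `nanos_range` are made up for this example rather than taken from any Parquet library.

```python
from datetime import datetime, timedelta, timezone

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def int64_second_bounds(units_per_second):
    """Whole-second offsets from the epoch reachable by a signed 64-bit tick count."""
    return INT64_MIN // units_per_second, INT64_MAX // units_per_second


def nanos_range():
    """Earliest and latest instants (truncated to seconds) for INT64 nanoseconds."""
    lo_s, hi_s = int64_second_bounds(1_000_000_000)
    return EPOCH + timedelta(seconds=lo_s), EPOCH + timedelta(seconds=hi_s)


if __name__ == "__main__":
    earliest, latest = nanos_range()
    print("INT64 nanoseconds cover", earliest.isoformat(), "to", latest.isoformat())
```

Under that assumption the range is roughly 292 years on either side of the epoch, which is the trade-off against the wider range of microsecond or millisecond units.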
[jira] [Commented] (PARQUET-323) INT96 should be marked as deprecated
[ https://issues.apache.org/jira/browse/PARQUET-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872626#comment-16872626 ]

Songzhi Liu commented on PARQUET-323:
-------------------------------------

[~lv] Hi Lars, is it the direction of the community to deprecate INT96 support? In that case, what will be used to support nanosec precision? In some financial services use cases, support for nanosecond precision is critical. Could you please advise on the deprecation plan?
[jira] [Commented] (PARQUET-323) INT96 should be marked as deprecated
[ https://issues.apache.org/jira/browse/PARQUET-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410496#comment-16410496 ]

ASF GitHub Bot commented on PARQUET-323:
----------------------------------------

lekv closed pull request #86: PARQUET-323: Mark INT96 as deprecated
URL: https://github.com/apache/parquet-format/pull/86

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/Encodings.md b/Encodings.md
index 28429be7..b8905bf4 100644
--- a/Encodings.md
+++ b/Encodings.md
@@ -34,7 +34,7 @@ stores the data in the following format:
  - BOOLEAN: [Bit Packed](#RLE), LSB first
  - INT32: 4 bytes little endian
  - INT64: 8 bytes little endian
- - INT96: 12 bytes little endian
+ - INT96: 12 bytes little endian (deprecated)
  - FLOAT: 4 bytes IEEE little endian
  - DOUBLE: 8 bytes IEEE little endian
  - BYTE_ARRAY: length in 4 bytes little endian followed by the bytes contained in the array
diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
index 195ff908..4d2e7001 100644
--- a/src/main/thrift/parquet.thrift
+++ b/src/main/thrift/parquet.thrift
@@ -33,7 +33,7 @@ enum Type {
   BOOLEAN = 0;
   INT32 = 1;
   INT64 = 2;
-  INT96 = 3;
+  INT96 = 3; // deprecated, only used by legacy implementations.
   FLOAT = 4;
   DOUBLE = 5;
   BYTE_ARRAY = 6;

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
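For readers migrating data, here is a hedged sketch of turning one 12-byte INT96 value into an INT64 microsecond timestamp. It assumes the layout commonly used by Impala-style writers (8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little endian); that layout is not spelled out in this thread, and the function names are hypothetical.

```python
import struct

# Assumption: Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588
MICROS_PER_DAY = 86_400 * 1_000_000


def int96_to_micros(raw: bytes) -> int:
    """Convert an INT96 timestamp (assumed layout) to microseconds since the epoch."""
    if len(raw) != 12:
        raise ValueError("INT96 values are exactly 12 bytes")
    # "<qi": little-endian int64 (nanos of day) then int32 (Julian day), no padding.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # Sub-microsecond digits are truncated, which is the precision lost by moving to micros.
    return days_since_epoch * MICROS_PER_DAY + nanos_of_day // 1_000


def micros_to_int96(micros: int) -> bytes:
    """Inverse helper, included only so the round trip can be checked."""
    days, micros_of_day = divmod(micros, MICROS_PER_DAY)
    return struct.pack("<qi", micros_of_day * 1_000, days + JULIAN_DAY_OF_UNIX_EPOCH)


if __name__ == "__main__":
    sample = micros_to_int96(1_500_000_000_123_456)
    print(len(sample), int96_to_micros(sample))
```

The truncation in `int96_to_micros` is deliberate: any value with non-zero digits below one microsecond cannot survive a move to TIMESTAMP_MICROS.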
[jira] [Commented] (PARQUET-323) INT96 should be marked as deprecated
[ https://issues.apache.org/jira/browse/PARQUET-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397196#comment-16397196 ]

ASF GitHub Bot commented on PARQUET-323:
----------------------------------------

rdblue commented on issue #86: PARQUET-323: Mark INT96 as deprecated
URL: https://github.com/apache/parquet-format/pull/86#issuecomment-372726185

+1
[jira] [Commented] (PARQUET-323) INT96 should be marked as deprecated
[ https://issues.apache.org/jira/browse/PARQUET-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16396315#comment-16396315 ]

ASF GitHub Bot commented on PARQUET-323:
----------------------------------------

lekv opened a new pull request #86: PARQUET-323: Mark INT96 as deprecated
URL: https://github.com/apache/parquet-format/pull/86

Closes #49
[jira] [Commented] (PARQUET-323) INT96 should be marked as deprecated
[ https://issues.apache.org/jira/browse/PARQUET-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15924999#comment-15924999 ]

Jeff Knupp commented on PARQUET-323:
------------------------------------

There is a PR against Spark that adds support for TIMESTAMP_MILLIS (https://github.com/apache/spark/pull/15332) but after a good deal of commentary and a working patch, it has sat untouched since October. Perhaps simply asking in a Spark JIRA ticket to re-review that PR would be the quickest way for Spark to support reading int64 TIMESTAMP_MILLIS?
[jira] [Commented] (PARQUET-323) INT96 should be marked as deprecated
[ https://issues.apache.org/jira/browse/PARQUET-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15904667#comment-15904667 ]

Uwe L. Korn commented on PARQUET-323:
-------------------------------------

The "offending code" that blocks the usage of {{TIMESTAMP_MILLIS}} in Spark is at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L161
[jira] [Commented] (PARQUET-323) INT96 should be marked as deprecated
[ https://issues.apache.org/jira/browse/PARQUET-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15904007#comment-15904007 ]

Jeff Knupp commented on PARQUET-323:
------------------------------------

I'm reasonably sure it's the only timestamp format that Spark currently supports (see https://github.com/apache/spark/pull/3820).
[jira] [Commented] (PARQUET-323) INT96 should be marked as deprecated
[ https://issues.apache.org/jira/browse/PARQUET-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902259#comment-15902259 ]

Lars Volker commented on PARQUET-323:
-------------------------------------

I created [IMPALA-5049|https://issues.cloudera.org/browse/IMPALA-5049] to track the required work on Impala's side. What other projects need to transition away from INT96? Who should create JIRAs for them?
[jira] [Commented] (PARQUET-323) INT96 should be marked as deprecated
[ https://issues.apache.org/jira/browse/PARQUET-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902218#comment-15902218 ]

Lars Volker commented on PARQUET-323:
-------------------------------------

We discussed this issue in today's Parquet sync and agreed to deprecate INT96. As a replacement for storing timestamps (the most common use for INT96) we will encourage all projects that currently use INT96 to switch to INT64 with either the TIMESTAMP_MILLIS or the TIMESTAMP_MICROS logical type. We will not fix the ordering issues around INT96 that resulted in parquet-mr writing wrong min/max statistics.
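To make the ordering problem more concrete, the sketch below shows one way a byte-wise comparison of INT96 values can disagree with chronological order. It does not reproduce parquet-mr's actual comparator; it assumes the Impala-style layout (8 bytes of nanoseconds within the day, then a 4-byte Julian day, both little endian), and all names are hypothetical.

```python
import struct


def pack_int96(julian_day: int, nanos_of_day: int) -> bytes:
    """Pack a timestamp in the assumed INT96 layout (nanos first, then day)."""
    return struct.pack("<qi", nanos_of_day, julian_day)


def chronological_key(raw: bytes):
    """Sort key that orders by day first, then by time within the day."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day, nanos_of_day)


# Last nanosecond of one day, and midnight of the following day.
earlier = pack_int96(2_457_800, 86_399_999_999_999)
later = pack_int96(2_457_801, 0)

# Python compares bytes lexicographically, so the low-order nanos byte decides first.
bytewise_max = max(earlier, later)
chronological_max = max(earlier, later, key=chronological_key)

if __name__ == "__main__":
    print("byte-wise max is the later timestamp:", bytewise_max == later)
    print("chronological max is the later timestamp:", chronological_max == later)
```

A max statistic computed the byte-wise way would exclude the later timestamp, so a reader pruning row groups on it could skip matching rows. INT64 timestamps avoid this because plain signed integer comparison already matches chronological order.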
[jira] [Commented] (PARQUET-323) INT96 should be marked as deprecated
[ https://issues.apache.org/jira/browse/PARQUET-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406334#comment-15406334 ]

Ryan Blue commented on PARQUET-323:
-----------------------------------

+1
[jira] [Commented] (PARQUET-323) INT96 should be marked as deprecated
[ https://issues.apache.org/jira/browse/PARQUET-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406331#comment-15406331 ]

Julien Le Dem commented on PARQUET-323:
---------------------------------------

I think we should deprecate it and discourage its use. For backward compatibility, it has to stay.
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md doesn't even refer to it.
[jira] [Commented] (PARQUET-323) INT96 should be marked as deprecated
[ https://issues.apache.org/jira/browse/PARQUET-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612297#comment-14612297 ]

Cheng Lian commented on PARQUET-323:
------------------------------------

AFAIK JVM doesn't have native support for INT96.
[jira] [Commented] (PARQUET-323) INT96 should be marked as deprecated
[ https://issues.apache.org/jira/browse/PARQUET-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610663#comment-14610663 ]

Ryan Blue commented on PARQUET-323:
-----------------------------------

While we don't want to encourage anyone to use int96 as a timestamp, I'm not sure if it is necessary to get rid of the type entirely. If int96 is a JVM and hardware-supported type, then it makes sense to keep it. But if everything jumps to int128 instead, then maybe we should go with that instead. Do we have enough information to make that decision yet?