[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038
[ https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726137#comment-13726137 ] Hudson commented on HIVE-4525: -- ABORTED: Integrated in Hive-trunk-hadoop2 #319 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/319/]) HIVE-4525 : Support timestamps earlier than 1970 and later than 2038 (Mikhail Bautin via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1508537) * /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManagerFactory.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java * /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io * /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampWritable.java Support timestamps earlier than 1970 and later than 2038 Key: HIVE-4525 URL: https://issues.apache.org/jira/browse/HIVE-4525 Project: Hive Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Fix For: 0.12.0 Attachments: D10755.1.patch, D10755.2.patch TimestampWritable currently serializes timestamps using the lower 31 bits of an int. This does not allow to store timestamps earlier than 1970 or later than a certain point in 2038. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038
[ https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725332#comment-13725332 ] Hudson commented on HIVE-4525: -- FAILURE: Integrated in Hive-trunk-h0.21 #2234 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2234/]) HIVE-4525 : Support timestamps earlier than 1970 and later than 2038 (Mikhail Bautin via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1508537) * /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManagerFactory.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java * /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io * /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampWritable.java Support timestamps earlier than 1970 and later than 2038 Key: HIVE-4525 URL: https://issues.apache.org/jira/browse/HIVE-4525 Project: Hive Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Fix For: 0.12.0 Attachments: D10755.1.patch, D10755.2.patch TimestampWritable currently serializes timestamps using the lower 31 bits of an int. This does not allow to store timestamps earlier than 1970 or later than a certain point in 2038. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038
[ https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724325#comment-13724325 ] Hudson commented on HIVE-4525: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #37 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/37/]) HIVE-4525 : Support timestamps earlier than 1970 and later than 2038 (Mikhail Bautin via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1508537) * /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManagerFactory.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java * /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io * /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampWritable.java Support timestamps earlier than 1970 and later than 2038 Key: HIVE-4525 URL: https://issues.apache.org/jira/browse/HIVE-4525 Project: Hive Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Fix For: 0.12.0 Attachments: D10755.1.patch, D10755.2.patch TimestampWritable currently serializes timestamps using the lower 31 bits of an int. This does not allow to store timestamps earlier than 1970 or later than a certain point in 2038. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038
[ https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724421#comment-13724421 ] Hudson commented on HIVE-4525: -- SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #109 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/109/]) HIVE-4525 : Support timestamps earlier than 1970 and later than 2038 (Mikhail Bautin via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1508537) * /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManagerFactory.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java * /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io * /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampWritable.java Support timestamps earlier than 1970 and later than 2038 Key: HIVE-4525 URL: https://issues.apache.org/jira/browse/HIVE-4525 Project: Hive Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Fix For: 0.12.0 Attachments: D10755.1.patch, D10755.2.patch TimestampWritable currently serializes timestamps using the lower 31 bits of an int. This does not allow to store timestamps earlier than 1970 or later than a certain point in 2038. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038
[ https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661870#comment-13661870 ] Mikhail Bautin commented on HIVE-4525: -- Test results with and without this patch differ only by a spurious failure of a ZK-related test that is not affected by the changes here. *** 3838,3843 --- 3838,3845 [junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0 [junit] Running org.apache.hadoop.hive.serde2.dynamic_type.TestDynamicSerDe [junit] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0 + [junit] Running org.apache.hadoop.hive.serde2.io.TestTimestampWritable + [junit] Tests run: 11, Failures: 0, Errors: 0, Skipped: 0 [junit] Running org.apache.hadoop.hive.serde2.lazy.TestLazyArrayMapStruct [junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0 [junit] Running org.apache.hadoop.hive.serde2.lazy.TestLazyPrimitive *** *** 3901,3906 [junit] Running org.apache.hcatalog.hbase.snapshot.TestZNodeSetUp [junit] Tests run: 0, Failures: 0, Errors: 2, Skipped: 0 [junit] Running org.apache.hcatalog.hbase.snapshot.lock.WriteLockTest ! [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0 [junit] Running org.apache.hcatalog.hbase.snapshot.lock.ZNodeNameTest [junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0 --- 3903,3908 [junit] Running org.apache.hcatalog.hbase.snapshot.TestZNodeSetUp [junit] Tests run: 0, Failures: 0, Errors: 2, Skipped: 0 [junit] Running org.apache.hcatalog.hbase.snapshot.lock.WriteLockTest ! [junit] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0 [junit] Running org.apache.hcatalog.hbase.snapshot.lock.ZNodeNameTest [junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0 + set +x Committers: could you please take a look and consider committing this? Cc [~ashutoshc], [~owen.omalley], [~cwsteinbach]. Thanks! Support timestamps earlier than 1970 and later than 2038 Key: HIVE-4525 URL: https://issues.apache.org/jira/browse/HIVE-4525 Project: Hive Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D10755.1.patch, D10755.2.patch TimestampWritable currently serializes timestamps using the lower 31 bits of an int. This does not allow to store timestamps earlier than 1970 or later than a certain point in 2038. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038
[ https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13659128#comment-13659128 ] Mikhail Bautin commented on HIVE-4525: -- I am not quite sure how to solve the backward compatibility issue in the writable part of {{TimestampWritable}} code ({{write}}/{{readFields}}) by switching to a unified nanosecond-timestamp-as-long format. If {{readFields}} is presented with eight bytes, would it interpret them as a four-byte int followed by a VInt or as a long nanosecond timestamp? Would it attempt to do the former and revert to the latter if there are inconsistencies? What if the bytes of a long nanosecond timestamp also happen to represent a valid legacy (int/VInt) timestamp? In my patch, I try to maintain backward compatibility as much as possible. If a timestamp is in the range that can be represented by the old format, it is serialized using the old format. The extended format I've proposed and implemented for the full timestamp range builds on top of the existing one and can be unambiguously distinguished from the old format by examining serialized bytes. In addition, the included test, {{TestTimestampWritable}}, tests both the old and the new (extended format), as well as double/BigDecimal conversion, getters/setters/constructors and everything else I could test in {{TimestampWritable}}. I am sure there is a way to handle vector optimizations for timestamps in a backward-compatible way, and I don't think this patch would make it much more complicated than it already is. However, vectorized computations are a performance optimization, while this issue is a correctness fix. Currently, timestamps outside of the ~1970-2038 range would be silently corrupted in some queries, and this patch successfully fixes that. It is also pretty small and immediately available. Support timestamps earlier than 1970 and later than 2038 Key: HIVE-4525 URL: https://issues.apache.org/jira/browse/HIVE-4525 Project: Hive Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D10755.1.patch TimestampWritable currently serializes timestamps using the lower 31 bits of an int. This does not allow to store timestamps earlier than 1970 or later than a certain point in 2038. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038
[ https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656141#comment-13656141 ] Mikhail Bautin commented on HIVE-4525: -- Correction to the design of this feature (I can't edit comments because of permissions, so adding another comment). In case the seconds field needs more than 31 bit, the first VInt is {{-1-reversedDecimal}} regardless of whether {{reversedDecimal}} is zero or not. Support timestamps earlier than 1970 and later than 2038 Key: HIVE-4525 URL: https://issues.apache.org/jira/browse/HIVE-4525 Project: Hive Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D10755.1.patch TimestampWritable currently serializes timestamps using the lower 31 bits of an int. This does not allow to store timestamps earlier than 1970 or later than a certain point in 2038. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038
[ https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656226#comment-13656226 ] Eric Hanson commented on HIVE-4525: --- For vectorized query execution (HIVE-4160), we are going to represent a timestamp value internally as a vector of 64 bit integers representing the number of nanos since the epoch (in 1970). Given your proposal to also support time values before 1970, I'd propose that for vectorized QE we extend this so a negative number of nanos is used to represent a value before 1970. This gives a range of 292 years before or after 1970, good enough for practical purposes. Data outside that range might first not be supported for vectorized QE, and then later might be supported but revert to a slower code path. We may want to consider that the storage layer (say ORC) store timestamps simply as a long, so it is not as expensive to flow this data into vectorized query execution. With compression, these long values will compress pretty well, so the storage layout becomes less of a concern and query execution speed becomes the more pressing issue. Support timestamps earlier than 1970 and later than 2038 Key: HIVE-4525 URL: https://issues.apache.org/jira/browse/HIVE-4525 Project: Hive Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D10755.1.patch TimestampWritable currently serializes timestamps using the lower 31 bits of an int. This does not allow to store timestamps earlier than 1970 or later than a certain point in 2038. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038
[ https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656271#comment-13656271 ] Mikhail Bautin commented on HIVE-4525: -- [~ehans]: switching to long nanosecond timestamps would definitely be a much nicer solution, but don't you think it would break backward-compatibility for timestamps serialized using the old format? Support timestamps earlier than 1970 and later than 2038 Key: HIVE-4525 URL: https://issues.apache.org/jira/browse/HIVE-4525 Project: Hive Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D10755.1.patch TimestampWritable currently serializes timestamps using the lower 31 bits of an int. This does not allow to store timestamps earlier than 1970 or later than a certain point in 2038. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038
[ https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656479#comment-13656479 ] Eric Hanson commented on HIVE-4525: --- Yes, so you'd have to support both at least for an extended period of time. It would be a performance enhancement and you'd need to maintain backward compatibility for older data. Support timestamps earlier than 1970 and later than 2038 Key: HIVE-4525 URL: https://issues.apache.org/jira/browse/HIVE-4525 Project: Hive Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D10755.1.patch TimestampWritable currently serializes timestamps using the lower 31 bits of an int. This does not allow to store timestamps earlier than 1970 or later than a certain point in 2038. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038
[ https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13652676#comment-13652676 ] Mikhail Bautin commented on HIVE-4525: -- h4. Design proposal We have to be able to read the current {{TimestampWritable}}-serializable format for backward-compatibility, and write the format recognizable by the current {{TimestampWritable}} implementation for timestamps within the currently supported range. We can use the negative range of the {{VInt}} in the binary representation of the timestamp that normally represents the reversed decimal part to indicate the presence of an additional {{VInt}} field that stores the remaining bits of the {{seconds}} number (i.e. {{seconds 31}}). The meaning of the 7th bit of the first byte then changes from has decimal to has decimal or 31 bits of seconds. The following table summarizes the four logical cases of timestamp serialization. The first two are backward-compatible. The second two cases are unsupported by the current format, so they will not be recognized by the current version. || Seconds need 31 bits || Has decimal || 7th bit of the first byte || First VInt || Second VInt || | No | No | {{0}} | N/A | N/A | | No | Yes | {{1}} | {{reversedDecimal}} | N/A | | Yes | No | {{1}} | {{-1}} | {{seconds 31}} | | Yes | Yes | {{1}} | {{-2 - reversedDecimal}} | {{seconds 31}} | Support timestamps earlier than 1970 and later than 2038 Key: HIVE-4525 URL: https://issues.apache.org/jira/browse/HIVE-4525 Project: Hive Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin TimestampWritable currently serializes timestamps using the lower 31 bits of an int. This does not allow to store timestamps earlier than 1970 or later than a certain point in 2038. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038
[ https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13652709#comment-13652709 ] Mikhail Bautin commented on HIVE-4525: -- Also, the binary-sortable representation of timestamps would have to change to accommodate additional high-order bits. If a 4-byte second-precision timestamp covers 68 years (or 136 if signed), by adding one most-significant byte we can cover 17408 (or 34816) years, which is good enough for all practical purposes. Support timestamps earlier than 1970 and later than 2038 Key: HIVE-4525 URL: https://issues.apache.org/jira/browse/HIVE-4525 Project: Hive Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin TimestampWritable currently serializes timestamps using the lower 31 bits of an int. This does not allow to store timestamps earlier than 1970 or later than a certain point in 2038. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira