[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038

2013-08-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726137#comment-13726137
 ] 

Hudson commented on HIVE-4525:
--

ABORTED: Integrated in Hive-trunk-hadoop2 #319 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/319/])
HIVE-4525 : Support timestamps earlier than 1970 and later than 2038 (Mikhail 
Bautin via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508537)
* 
/hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManagerFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io
* 
/hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampWritable.java


 Support timestamps earlier than 1970 and later than 2038
 

 Key: HIVE-4525
 URL: https://issues.apache.org/jira/browse/HIVE-4525
 Project: Hive
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Fix For: 0.12.0

 Attachments: D10755.1.patch, D10755.2.patch


 TimestampWritable currently serializes timestamps using the lower 31 bits of 
 an int. This does not allow storing timestamps earlier than 1970 or later 
 than a certain point in 2038.
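
To make the two limits concrete, here is a minimal sketch, assuming the 31 bits hold a non-negative count of seconds since the Unix epoch. The class and method names are hypothetical and are not part of Hive.

```java
import java.time.Instant;

// Illustrative only: this class is hypothetical and is not part of Hive.
final class Year2038Limit {
    // Largest seconds value that fits in the lower 31 bits of an int.
    static final long MAX_31_BIT_SECONDS = (1L << 31) - 1;

    // Earliest representable instant: zero seconds since the Unix epoch.
    static Instant earliest() {
        return Instant.ofEpochSecond(0);
    }

    // Latest representable instant: all 31 bits set.
    static Instant latest() {
        return Instant.ofEpochSecond(MAX_31_BIT_SECONDS);
    }

    public static void main(String[] args) {
        System.out.println(earliest()); // 1970-01-01T00:00:00Z
        System.out.println(latest());   // 2038-01-19T03:14:07Z
    }
}
```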

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038

2013-07-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725332#comment-13725332
 ] 

Hudson commented on HIVE-4525:
--

FAILURE: Integrated in Hive-trunk-h0.21 #2234 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2234/])
HIVE-4525 : Support timestamps earlier than 1970 and later than 2038 (Mikhail 
Bautin via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508537)
* 
/hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManagerFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io
* 
/hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampWritable.java




[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038

2013-07-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724325#comment-13724325
 ] 

Hudson commented on HIVE-4525:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #37 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/37/])
HIVE-4525 : Support timestamps earlier than 1970 and later than 2038 (Mikhail 
Bautin via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508537)
* 
/hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManagerFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io
* 
/hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampWritable.java




[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038

2013-07-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724421#comment-13724421
 ] 

Hudson commented on HIVE-4525:
--

SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #109 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/109/])
HIVE-4525 : Support timestamps earlier than 1970 and later than 2038 (Mikhail 
Bautin via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508537)
* 
/hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManagerFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io
* 
/hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampWritable.java




[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038

2013-05-20 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661870#comment-13661870
 ] 

Mikhail Bautin commented on HIVE-4525:
--

Test results with and without this patch differ only by a spurious failure of a 
ZK-related test that is not affected by the changes here. 

*** 3838,3843 ****
--- 3838,3845 ----
  [junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0
  [junit] Running org.apache.hadoop.hive.serde2.dynamic_type.TestDynamicSerDe
  [junit] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0
+ [junit] Running org.apache.hadoop.hive.serde2.io.TestTimestampWritable
+ [junit] Tests run: 11, Failures: 0, Errors: 0, Skipped: 0
  [junit] Running org.apache.hadoop.hive.serde2.lazy.TestLazyArrayMapStruct
  [junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0
  [junit] Running org.apache.hadoop.hive.serde2.lazy.TestLazyPrimitive
***************
*** 3901,3906 ****
  [junit] Running org.apache.hcatalog.hbase.snapshot.TestZNodeSetUp
  [junit] Tests run: 0, Failures: 0, Errors: 2, Skipped: 0
  [junit] Running org.apache.hcatalog.hbase.snapshot.lock.WriteLockTest
! [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
  [junit] Running org.apache.hcatalog.hbase.snapshot.lock.ZNodeNameTest
  [junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0
--- 3903,3908 ----
  [junit] Running org.apache.hcatalog.hbase.snapshot.TestZNodeSetUp
  [junit] Tests run: 0, Failures: 0, Errors: 2, Skipped: 0
  [junit] Running org.apache.hcatalog.hbase.snapshot.lock.WriteLockTest
! [junit] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
  [junit] Running org.apache.hcatalog.hbase.snapshot.lock.ZNodeNameTest
  [junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0
+ set +x


Committers: could you please take a look and consider committing this? Cc 
[~ashutoshc], [~owen.omalley], [~cwsteinbach]. Thanks!


 Support timestamps earlier than 1970 and later than 2038
 

 Key: HIVE-4525
 URL: https://issues.apache.org/jira/browse/HIVE-4525
 Project: Hive
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D10755.1.patch, D10755.2.patch


 TimestampWritable currently serializes timestamps using the lower 31 bits of 
 an int. This does not allow storing timestamps earlier than 1970 or later 
 than a certain point in 2038.



[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038

2013-05-15 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659128#comment-13659128
 ] 

Mikhail Bautin commented on HIVE-4525:
--

I am not quite sure how to solve the backward compatibility issue in the 
writable part of {{TimestampWritable}} code ({{write}}/{{readFields}}) by 
switching to a unified nanosecond-timestamp-as-long format. If {{readFields}} 
is presented with eight bytes, would it interpret them as a four-byte int 
followed by a VInt or as a long nanosecond timestamp? Would it attempt to do 
the former and revert to the latter if there are inconsistencies? What if the 
bytes of a long nanosecond timestamp also happen to represent a valid legacy 
(int/VInt) timestamp?

In my patch, I try to maintain backward compatibility as much as possible. If a 
timestamp is in the range that can be represented by the old format, it is 
serialized using the old format. The extended format I've proposed and 
implemented for the full timestamp range builds on top of the existing one and 
can be unambiguously distinguished from the old format by examining serialized 
bytes.
In addition, the included test, {{TestTimestampWritable}}, tests both the old 
and the new (extended) format, as well as double/BigDecimal conversion, 
getters/setters/constructors and everything else I could test in 
{{TimestampWritable}}.

I am sure there is a way to handle vector optimizations for timestamps in a 
backward-compatible way, and I don't think this patch would make it much more 
complicated than it already is. However, vectorized computations are a 
performance optimization, while this issue is a correctness fix. Currently, 
timestamps outside of the ~1970-2038 range would be silently corrupted in some 
queries, and this patch successfully fixes that. It is also pretty small and 
immediately available.
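
As a rough illustration of the kind of silent corruption described above, the sketch below keeps only the lower 31 bits of the seconds value, as the issue description says the current serialization does. Whether the unpatched code fails in exactly this way is an assumption; the class and helper are hypothetical and not taken from the patch.

```java
import java.time.Instant;

// Illustrative only: hypothetical class, not code from Hive or the patch.
final class TruncationSketch {
    static final long LOW_31_MASK = 0x7FFFFFFFL;

    // Hypothetical helper: keep only the lower 31 bits of the seconds value.
    static long keepLow31Bits(long epochSeconds) {
        return epochSeconds & LOW_31_MASK;
    }

    public static void main(String[] args) {
        // One second before the epoch has the epoch-second value -1.
        Instant original = Instant.parse("1969-12-31T23:59:59Z");
        Instant readBack =
            Instant.ofEpochSecond(keepLow31Bits(original.getEpochSecond()));
        // No error is raised; the value simply comes back as a different instant.
        System.out.println(original + " -> " + readBack);
    }
}
```

Values inside the 31-bit range pass through unchanged, which is why the problem only shows up for timestamps outside ~1970-2038.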



 Support timestamps earlier than 1970 and later than 2038
 

 Key: HIVE-4525
 URL: https://issues.apache.org/jira/browse/HIVE-4525
 Project: Hive
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D10755.1.patch


 TimestampWritable currently serializes timestamps using the lower 31 bits of 
 an int. This does not allow storing timestamps earlier than 1970 or later 
 than a certain point in 2038.



[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038

2013-05-13 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13656141#comment-13656141
 ] 

Mikhail Bautin commented on HIVE-4525:
--

Correction to the design of this feature (I can't edit comments because of 
permissions, so I am adding another comment). In case the seconds field needs 
more than 31 bits, the first VInt is {{-1-reversedDecimal}} regardless of 
whether {{reversedDecimal}} is zero or not.
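
The corrected rule can be sketched as follows. This is a minimal illustration of the value carried by the first VInt only, assuming {{reversedDecimal}} is a non-negative int; the class and method names are hypothetical and this is not the code from the patch.

```java
// Illustrative only: hypothetical class, not code from the patch.
final class FirstVIntSketch {
    // Value of the first VInt: negative exactly when seconds need more than 31 bits.
    static int encode(int reversedDecimal, boolean secondsNeedMoreThan31Bits) {
        return secondsNeedMoreThan31Bits ? -1 - reversedDecimal : reversedDecimal;
    }

    // A negative first VInt signals that a second VInt follows.
    static boolean hasSecondVInt(int firstVInt) {
        return firstVInt < 0;
    }

    // Recover reversedDecimal from either form of the first VInt.
    static int reversedDecimal(int firstVInt) {
        return firstVInt < 0 ? -1 - firstVInt : firstVInt;
    }

    public static void main(String[] args) {
        int first = encode(321, true);
        System.out.println(first);                  // -322
        System.out.println(hasSecondVInt(first));   // true
        System.out.println(reversedDecimal(first)); // 321
    }
}
```

Because {{-1 - x}} maps 0 to -1, a zero {{reversedDecimal}} needs no special case, and every negative value decodes to exactly one non-negative value.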



[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038

2013-05-13 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13656226#comment-13656226
 ] 

Eric Hanson commented on HIVE-4525:
---

For vectorized query execution (HIVE-4160), we are going to represent a 
timestamp value internally as a vector of 64 bit integers representing the 
number of nanos since the epoch (in 1970). Given your proposal to also support 
time values before 1970, I'd propose that for vectorized QE we extend this so a 
negative number of nanos is used to represent a value before 1970. This gives a 
range of 292 years before or after 1970, good enough for practical purposes. 
Data outside that range might first not be supported for vectorized QE, and 
then later might be supported but revert to a slower code path.
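
A minimal sketch of the signed-nanos representation described above, using only java.time. The class and method names are hypothetical, and this is not the vectorized execution code from HIVE-4160.

```java
import java.time.Instant;

// Illustrative only: hypothetical class, not code from HIVE-4160.
final class EpochNanosSketch {
    static final long NANOS_PER_SECOND = 1_000_000_000L;

    // Signed nanoseconds since 1970; negative for instants before the epoch.
    // Throws ArithmeticException if the instant is outside the long range.
    static long toEpochNanos(Instant t) {
        return Math.addExact(
            Math.multiplyExact(t.getEpochSecond(), NANOS_PER_SECOND),
            (long) t.getNano());
    }

    // Inverse conversion; floorDiv/floorMod keep the nano part non-negative.
    static Instant fromEpochNanos(long nanos) {
        return Instant.ofEpochSecond(
            Math.floorDiv(nanos, NANOS_PER_SECOND),
            Math.floorMod(nanos, NANOS_PER_SECOND));
    }

    // Approximate number of years a signed 64-bit nano count covers on each side of 1970.
    static double rangeYears() {
        return Long.MAX_VALUE / 1e9 / (365.25 * 24 * 60 * 60);
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("1969-12-31T23:59:59.5Z");
        System.out.println(toEpochNanos(t));                 // -500000000
        System.out.println(fromEpochNanos(toEpochNanos(t))); // 1969-12-31T23:59:59.500Z
        System.out.println(rangeYears());                    // roughly 292
    }
}
```

The exact arithmetic helpers make out-of-range values fail loudly rather than wrap, which matches the idea that such data would need a separate, slower code path.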

We may want to consider that the storage layer (say ORC) store timestamps 
simply as a long, so it is not as expensive to flow this data into vectorized 
query execution. With compression, these long values will compress pretty well, 
so the storage layout becomes less of a concern and query execution speed 
becomes the more pressing issue.



[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038

2013-05-13 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13656271#comment-13656271
 ] 

Mikhail Bautin commented on HIVE-4525:
--

[~ehans]: switching to long nanosecond timestamps would definitely be a much 
nicer solution, but don't you think it would break backward-compatibility for 
timestamps serialized using the old format?



[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038

2013-05-13 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13656479#comment-13656479
 ] 

Eric Hanson commented on HIVE-4525:
---

Yes, so you'd have to support both at least for an extended period of time. It 
would be a performance enhancement and you'd need to maintain backward 
compatibility for older data.



[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038

2013-05-08 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652676#comment-13652676
 ] 

Mikhail Bautin commented on HIVE-4525:
--

h4. Design proposal

We have to be able to read the current {{TimestampWritable}} serialization 
format for backward compatibility, and to write a format recognizable by the 
current {{TimestampWritable}} implementation for timestamps within the 
currently supported range. We can use the negative range of the {{VInt}} in the 
binary representation of the timestamp, which normally represents the reversed 
decimal part, to indicate the presence of an additional {{VInt}} field that 
stores the remaining bits of the {{seconds}} number (i.e. {{seconds >> 31}}). 
The meaning of the 7th bit of the first byte then changes from "has decimal" to 
"has decimal or >31 bits of seconds".

The following table summarizes the four logical cases of timestamp 
serialization. The first two are backward-compatible. The last two cases are 
unsupported by the current format, so they will not be recognized by the 
current version.

|| Seconds need >31 bits || Has decimal || 7th bit of the first byte || First VInt || Second VInt ||
| No | No | {{0}} | N/A | N/A |
| No | Yes | {{1}} | {{reversedDecimal}} | N/A |
| Yes | No | {{1}} | {{-1}} | {{seconds >> 31}} |
| Yes | Yes | {{1}} | {{-2 - reversedDecimal}} | {{seconds >> 31}} |
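
The split of the seconds value used in the table can be sketched as below. This is an illustration only: it assumes the lower 31 bits stay in the existing 4-byte field and the remaining bits come from an arithmetic right shift, and it leaves the first VInt out entirely. All names are hypothetical and none of this is code from the patch.

```java
// Illustrative only: hypothetical class, not code from the patch.
final class SecondsSplitSketch {
    static final long LOW_31_MASK = 0x7FFFFFFFL;

    // Lower 31 bits, which stay in the existing 4-byte field.
    static int low31(long seconds) {
        return (int) (seconds & LOW_31_MASK);
    }

    // Remaining bits for the second VInt; the arithmetic shift keeps the sign.
    static long highBits(long seconds) {
        return seconds >> 31;
    }

    // True for the "Seconds need >31 bits" rows of the table.
    static boolean needsMoreThan31Bits(long seconds) {
        return highBits(seconds) != 0;
    }

    // The flag bit: "has decimal or >31 bits of seconds".
    static boolean flagBit(long seconds, boolean hasDecimal) {
        return hasDecimal || needsMoreThan31Bits(seconds);
    }

    // Reassemble the original seconds value from its two parts.
    static long join(int low31, long highBits) {
        return (highBits << 31) | low31;
    }

    public static void main(String[] args) {
        long preEpoch = -1L; // one second before 1970
        System.out.println(needsMoreThan31Bits(preEpoch));                // true
        System.out.println(join(low31(preEpoch), highBits(preEpoch)));    // -1
        System.out.println(needsMoreThan31Bits(2147483647L));             // false
    }
}
```

With this split, any value in the old range has zero high bits, so the first two rows of the table need no second VInt.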




 Support timestamps earlier than 1970 and later than 2038
 

 Key: HIVE-4525
 URL: https://issues.apache.org/jira/browse/HIVE-4525
 Project: Hive
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin

 TimestampWritable currently serializes timestamps using the lower 31 bits of 
 an int. This does not allow storing timestamps earlier than 1970 or later 
 than a certain point in 2038.



[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038

2013-05-08 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652709#comment-13652709
 ] 

Mikhail Bautin commented on HIVE-4525:
--

Also, the binary-sortable representation of timestamps would have to change to 
accommodate additional high-order bits. If a 4-byte second-precision timestamp 
covers 68 years (or 136 if signed), by adding one most-significant byte we can 
cover 17408 (or 34816) years, which is good enough for all practical purposes.
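
The figures above can be checked with a short calculation. This sketch assumes a 365.25-day year; the class name and helper are hypothetical. The 17408 and 34816 values are the rounded 68 and 136 multiplied by 256, so computing directly from the bit width gives slightly larger numbers.

```java
// Illustrative only: hypothetical class for checking the arithmetic.
final class SortableRangeSketch {
    // Assumed average year length: 365.25 days.
    static final double SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60;

    // Years covered by a second-precision counter with the given number of bits.
    static double yearsCovered(int bits) {
        return Math.pow(2, bits) / SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        System.out.println(Math.round(yearsCovered(31))); // 68
        System.out.println(Math.round(yearsCovered(32))); // 136
        System.out.println(68 * 256);                     // 17408
        System.out.println(136 * 256);                    // 34816
        // One extra byte adds 8 bits: 39 and 40 bits respectively.
        System.out.println(yearsCovered(39));
        System.out.println(yearsCovered(40));
    }
}
```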
