[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2014-08-09 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091908#comment-14091908
 ] 

Lefty Leverenz commented on HIVE-4123:
--

Done, thanks [~prasanth_j].  Now the description for 
*hive.exec.orc.write.format* says:

{quote}
Define the version of the file to write. Possible values are 0.11 and 0.12. If 
this parameter is not defined, ORC will use the run length encoding (RLE) 
introduced in Hive 0.12. Any value other than 0.11 results in the 0.12 encoding.

Additional values may be introduced in the future (see HIVE-6002).
{quote}

HIVE-6586 (for HiveConf.java updates) has a comment about the new description.

* [Configuration Properties -- hive.exec.orc.write.format | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.orc.write.format]
* [HIVE-6586 comment about new description for hive.exec.orc.write.format | 
https://issues.apache.org/jira/browse/HIVE-6586?focusedCommentId=14091905&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14091905]

 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Prasanth J
  Labels: TODOC12, orcfile
 Fix For: 0.12.0

 Attachments: HIVE-4123-8.patch, HIVE-4123.1.git.patch.txt, 
 HIVE-4123.2.git.patch.txt, HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, 
 HIVE-4123.5.txt, HIVE-4123.6.txt, HIVE-4123.7.txt, HIVE-4123.8.txt, 
 HIVE-4123.8.txt, HIVE-4123.patch.txt, ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs
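
To make the three bullets concrete, here is a minimal Java sketch of why delta encoding combined with tighter bit packing can shrink a run of integers. It is an illustration only, not the code in RunLengthIntegerWriterV2: the class DeltaBitWidthSketch, its method names, and the use of a zigzag mapping for signed values are assumptions made for this example, and overflow of the long subtraction is ignored.

```java
import java.util.Arrays;

/** Hypothetical illustration; not taken from the HIVE-4123 patch. */
final class DeltaBitWidthSketch {

    /** Differences between consecutive values; the result has length n - 1. */
    static long[] deltas(long[] values) {
        long[] out = new long[Math.max(0, values.length - 1)];
        for (int i = 1; i < values.length; i++) {
            out[i - 1] = values[i] - values[i - 1];
        }
        return out;
    }

    /** Zigzag mapping (assumed here) so that small negative numbers stay small. */
    static long zigzag(long v) {
        return (v << 1) ^ (v >> 63);
    }

    /** Smallest fixed bit width that can hold every zigzag-mapped value (at least 1). */
    static int bitsNeeded(long[] values) {
        long combined = 0;
        for (long v : values) {
            combined |= zigzag(v); // the highest set bit of the OR is the widest value's
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(combined));
    }

    public static void main(String[] args) {
        long[] values = {100, 102, 104, 107, 109};
        long[] d = deltas(values);
        System.out.println("deltas          = " + Arrays.toString(d));
        System.out.println("bits per value  = " + bitsNeeded(values));
        System.out.println("bits per delta  = " + bitsNeeded(d));
    }
}
```

For the sample sequence 100, 102, 104, 107, 109, the raw values need 8 bits each under this scheme, while the deltas 2, 2, 3, 2 need only 3 bits each. That gap is the kind of saving the bullets above are aiming for.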



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2014-08-08 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090350#comment-14090350
 ] 

Prasanth J commented on HIVE-4123:
--

Please go ahead and update the original description. 
At this point the only valid values are 0.11 and 0.12. As you mentioned, if 
the parameter is not defined or is set to an invalid value, the default 0.12 
encoding is used. 

bq. Is that accurate? Can releases be specified as 0.12.0 or 0.13.1?
Yes, the description is accurate. Patch-level versions such as 0.12.1 are not 
recognized yet. HIVE-6002 was trying to add the patch number to the write 
version so that such values can be specified, but I don't think it will be 
committed until the next major change to the ORC writer.



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2014-08-07 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089866#comment-14089866
 ] 

Lefty Leverenz commented on HIVE-4123:
--

Doc note:  This issue added the configuration parameter 
*hive.exec.orc.write.format* with a default value of 0.11, which was changed 
to null by HIVE-5091 before the release.

*hive.exec.orc.write.format* is documented in the wiki here:

* [Configuration Properties -- hive.exec.orc.write.format | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.orc.write.format]



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2014-08-07 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090193#comment-14090193
 ] 

Lefty Leverenz commented on HIVE-4123:
--

Doc questions:  Would it be okay to restore part of the original description 
for *hive.exec.orc.write.format* in the wiki (and later in HiveConf.java)?

* The current description is just "Define the version of the file to write" -- 
that doesn't give any idea about possible values, since the default is null, 
and it isn't clear that "version of the file" means Hive version.
* The original description was "use 0.11 version of RLE encoding. if this conf 
is not defined or any other value specified, ORC will use the new RLE encoding".

So I'd like to add "Possible values are 0.11, 0.12, etc.  If this parameter is 
not defined, ORC will use the RLE encoding introduced in Hive 0.12.  Any value 
other than 0.11 results in the 0.12 encoding."

Is that accurate?  Can releases be specified as 0.12.0 or 0.13.1?





[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737891#comment-13737891
 ] 

Hudson commented on HIVE-4123:
--

ABORTED: Integrated in Hive-trunk-hadoop2 #354 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/354/])
HIVE-4123 Improved ORC integer RLE version 2. (Prasanth Jayachandran via 
omalley) (omalley: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513155)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/IntegerReader.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/IntegerWriter.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.orig
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReader.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReaderV2.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriter.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriterV2.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/SerializationUtils.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* /hive/trunk/ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestBitPack.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestIntegerCompressionReader.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestNewIntegerEncoding.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java
* /hive/trunk/ql/src/test/resources/orc-file-dump-dictionary-threshold.out
* /hive/trunk/ql/src/test/resources/orc-file-dump.out



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736773#comment-13736773
 ] 

Hive QA commented on HIVE-4123:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12597402/HIVE-4123.patch.txt

{color:green}SUCCESS:{color} +1 2848 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/400/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/400/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736931#comment-13736931
 ] 

Brock Noland commented on HIVE-4123:


[~owen.omalley], it looks like your comment was accidentally put in the Release 
Notes section.



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-12 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737060#comment-13737060
 ] 

Prasanth J commented on HIVE-4123:
--

Thanks, [~owen.omalley], for committing the patch!



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737282#comment-13737282
 ] 

Hudson commented on HIVE-4123:
--

SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #124 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/124/])
HIVE-4123 Improved ORC integer RLE version 2. (Prasanth Jayachandran via 
omalley) (omalley: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513155)




[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737398#comment-13737398
 ] 

Hudson commented on HIVE-4123:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #55 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/55/])
HIVE-4123 Improved ORC integer RLE version 2. (Prasanth Jayachandran via 
omalley) (omalley: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513155)




[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737695#comment-13737695
 ] 

Hudson commented on HIVE-4123:
--

SUCCESS: Integrated in Hive-trunk-h0.21 #2263 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2263/])
HIVE-4123 Improved ORC integer RLE version 2. (Prasanth Jayachandran via 
omalley) (omalley: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513155)




[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-10 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736097#comment-13736097
 ] 

Owen O'Malley commented on HIVE-4123:
-

+1, it looks good to me.



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-09 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735414#comment-13735414
 ] 

Owen O'Malley commented on HIVE-4123:
-

Thanks, Prasanth! This is looking good. I can't find any callers for 
WriterImpl.getWriteFormat. Is that dead code?



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-09 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735473#comment-13735473
 ] 

Prasanth J commented on HIVE-4123:
--

Yeah, it's not used anywhere. Sorry, I forgot to remove it. I have removed 
that method in this new patch.



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-08 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734231#comment-13734231
 ] 

Prasanth J commented on HIVE-4123:
--

Thanks for the review, Owen.

I have addressed the following issues with this patch:
- Date type handled for new encoding
- Better encoding check added by overriding checkEncoding() for valid types
- Created factories for reader and writer creation
- Indentation fix
- DIRECT_V2 encoding can be turned on or off with the 
hive.exec.orc.write.format configuration parameter. If the value is 0.11, the 
old RLE encoding is used. If the parameter is undefined or has any other 
value, the new RLE encoding is used.

Also, the HIVE-4324 patch is affected by this patch, so this new patch is 
generated on top of HIVE-4324.
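
The selection rule for hive.exec.orc.write.format described in this comment can be sketched in a few lines of Java. This is a hedged illustration of the rule as stated here, not the actual WriterImpl code: the class OrcWriteFormatSketch, the enum IntegerEncoding, and the method resolve() are hypothetical names, and the exact string comparison (no trimming or normalization) is an assumption.

```java
/** Hypothetical illustration of the rule; not the actual Hive implementation. */
final class OrcWriteFormatSketch {

    /** The two integer encodings discussed in this thread. */
    enum IntegerEncoding { OLD_RLE, NEW_RLE }

    /**
     * Maps a configured hive.exec.orc.write.format value to an encoding.
     * Only the exact value "0.11" selects the old RLE; an undefined (null)
     * parameter and every other value select the new RLE.
     */
    static IntegerEncoding resolve(String configuredValue) {
        // "0.11".equals(null) is false, so the undefined case needs no extra check.
        return "0.11".equals(configuredValue)
                ? IntegerEncoding.OLD_RLE
                : IntegerEncoding.NEW_RLE;
    }

    public static void main(String[] args) {
        String[] samples = {null, "0.11", "0.12", "0.13.1", "bogus"};
        for (String s : samples) {
            System.out.println(s + " -> " + resolve(s));
        }
    }
}
```

Keeping the check as a single equality test mirrors the "0.11, else new" wording above: a typo in the setting silently selects the new encoding rather than failing.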



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-07 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731794#comment-13731794
 ] 

Prasanth J commented on HIVE-4123:
--

Improved and fixed code comments, removed some redundant code, and added a 
few more changes. Long repeat runs now use DELTA encoding directly instead of 
calling the determineEncoding() function.



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-07 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732274#comment-13732274
 ] 

Eric Hanson commented on HIVE-4123:
---

This is a great addition. Are you going to update the vectorized reader as well 
to read the updated format?



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-07 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732578#comment-13732578
 ] 

Prasanth J commented on HIVE-4123:
--

[~ehans] Sure. I can take a look at the changes required for the vectorized 
reader to read these new encodings.

 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Prasanth J
  Labels: orcfile
 Fix For: 0.12.0

 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
 HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt, 
 HIVE-4123.6.txt, ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-07 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732806#comment-13732806
 ] 

Prasanth J commented on HIVE-4123:
--

Updated the Excel sheet. It compares the existing RLE (baseline) against the 
new RLE. The latest patch, after code review, shows a better compression ratio 
than both the old patch and the existing RLE. I have also added encoding and 
decoding times to the sheet. Those times are not very reliable, since they 
were measured over only 1 iteration. Separately, I ran encoding/decoding over 
a 25M-row file for 5 iterations and took the average of the last 3 iterations. 
HIVE-4123.2.git.patch.txt took 2072ms on average to encode the 25M-row file and 
920ms to decode the encoded file. HIVE-4123.6.txt, on the other hand, took 
1374ms on average to encode the 25M-row file and 874ms to decode the encoded 
file. 
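
The averaging described above (run several iterations, keep only the last few) can be sketched as follows. This is an illustrative helper, not code from the patch; the class and method names are hypothetical.

```java
/** Hypothetical helper for the "5 iterations, average of the last 3" measurement. */
final class WarmupAverage {
  /** Averages the last {@code keep} samples and discards the earlier ones as warm-up. */
  static double averageOfLast(long[] samplesMs, int keep) {
    if (keep <= 0 || keep > samplesMs.length) {
      throw new IllegalArgumentException("keep must be in 1.." + samplesMs.length);
    }
    long sum = 0;
    for (int i = samplesMs.length - keep; i < samplesMs.length; i++) {
      sum += samplesMs[i];
    }
    return (double) sum / keep;
  }

  /** Relative reduction in time, in percent, going from oldMs to newMs. */
  static double percentFaster(double oldMs, double newMs) {
    return 100.0 * (oldMs - newMs) / oldMs;
  }
}
```

Applied to the averages reported above, encoding time drops from 2072ms to 1374ms (about 33.7% less) and decoding time from 920ms to 874ms (5% less).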



 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Prasanth J
  Labels: orcfile
 Fix For: 0.12.0

 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
 HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt, 
 HIVE-4123.6.txt, ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-07 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733115#comment-13733115
 ] 

Owen O'Malley commented on HIVE-4123:
-

This is looking good, Prasanth.

A couple more comments:
* You need to handle the date type.
* You should update the checkEncoding to only accept the encodings that are 
appropriate for each type (direct for binary, boolean, struct, and byte; 
direct_v2, dictionary, or dictionary_v2 for string; and direct or direct_v2 for 
most of the rest)
* You should probably make a factory for creating the intreader so that you 
only have the code in one place.
* The formatting on some of the new classes seems to use 8 spaces for 
indentation.
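
A minimal sketch of the per-type encoding check suggested in the second bullet above. The enums and method names here are hypothetical stand-ins, not the actual ORC classes, and "most of the rest" is modeled simply as the default branch.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch of a per-type checkEncoding; not the patch's actual code. */
final class EncodingCheck {
  // Assumed subset of column kinds, for illustration only.
  enum Kind { BINARY, BOOLEAN, STRUCT, BYTE, STRING, INT, LONG, DATE }
  enum Encoding { DIRECT, DICTIONARY, DIRECT_V2, DICTIONARY_V2 }

  /** Encodings accepted for a column kind, following the rule in the comment above. */
  static Set<Encoding> allowed(Kind kind) {
    switch (kind) {
      case BINARY:
      case BOOLEAN:
      case STRUCT:
      case BYTE:
        return EnumSet.of(Encoding.DIRECT);
      case STRING:
        return EnumSet.of(Encoding.DIRECT_V2, Encoding.DICTIONARY, Encoding.DICTIONARY_V2);
      default:
        // "most of the rest": direct or direct_v2
        return EnumSet.of(Encoding.DIRECT, Encoding.DIRECT_V2);
    }
  }

  /** Rejects an encoding that is not appropriate for the column kind. */
  static void checkEncoding(Kind kind, Encoding encoding) {
    if (!allowed(kind).contains(encoding)) {
      throw new IllegalArgumentException("Unsupported encoding " + encoding + " for " + kind);
    }
  }
}
```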


 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Prasanth J
  Labels: orcfile
 Fix For: 0.12.0

 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
 HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt, 
 HIVE-4123.6.txt, ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-06 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731431#comment-13731431
 ] 

Owen O'Malley commented on HIVE-4123:
-

* Please remove the FIXME comment
* Use the encoding for the column that is passed into startStripe.

 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Prasanth J
  Labels: orcfile
 Fix For: 0.12.0

 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
 HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, 
 ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-06 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731580#comment-13731580
 ] 

Prasanth J commented on HIVE-4123:
--

The following fixes were added in this patch:
 - Removed FIXMEs
 - Added a new encoding type, DICTIONARY_V2, to identify the integer encoding 
(DIRECT/DIRECT_V2) used by dictionaries. DICTIONARY_V2 uses DIRECT_V2 encoding 
for the dictionary data and length streams. In the earlier patch, there was no 
way to determine whether dictionaries used DIRECT or DIRECT_V2 encoding. I am 
not sure if there is any other way to determine this without adding a new 
encoding type. 
 - Addressed the code review comment about using if/then/else in the flush() 
method of RunLengthIntegerWriterV2
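
To illustrate why the extra encoding type resolves the ambiguity, here is a small sketch of how a reader could pick the integer RLE version from the column encoding alone. The enum and method are hypothetical, not the patch's code.

```java
/** Hypothetical sketch: choosing the integer RLE version from the column encoding. */
final class IntegerEncodingVersion {
  enum Encoding { DIRECT, DICTIONARY, DIRECT_V2, DICTIONARY_V2 }

  /**
   * Returns true when the integer streams of a column (data, lengths) should be
   * read with the new run length encoding. With DICTIONARY_V2 available, a
   * dictionary column no longer leaves the integer encoding ambiguous.
   */
  static boolean usesRleV2(Encoding encoding) {
    return encoding == Encoding.DIRECT_V2 || encoding == Encoding.DICTIONARY_V2;
  }
}
```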


 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Prasanth J
  Labels: orcfile
 Fix For: 0.12.0

 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
 HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt, 
 ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729816#comment-13729816
 ] 

Hive QA commented on HIVE-4123:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12594072/HIVE-4123.4.patch.txt

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/308/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/308/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests failed with: NonZeroExitCodeException: Command 'bash 
/data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and 
output '+ [[ -n '' ]]
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-308/source-prep.txt
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'ant/src/org/apache/hadoop/hive/ant/antlib.xml'
Reverted 'hbase-handler/ivy.xml'
Reverted 
'hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestSetup.java'
Reverted 
'hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java'
Reverted 
'hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java'
Reverted 
'hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableOutputFormat.java'
Reverted 'hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java'
Reverted 
'hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java'
Reverted 'build.xml'
Reverted 'ivy/libraries.properties'
Reverted 'hcatalog/core/build.xml'
Reverted 'hcatalog/pom.xml'
Reverted 'hcatalog/build.properties'
Reverted 'hcatalog/build.xml'
Reverted 
'hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/snapshot/TestRevisionManager.java'
Reverted 
'hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/snapshot/TestRevisionManagerEndpoint.java'
Reverted 
'hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/ManyMiniCluster.java'
Reverted 
'hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestHBaseDirectOutputFormat.java'
Reverted 
'hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestHBaseBulkOutputFormat.java'
Reverted 
'hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestHBaseInputFormat.java'
Reverted 
'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/TableSnapshot.java'
Reverted 
'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManagerProtocol.java'
Reverted 
'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/Transaction.java'
Reverted 
'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManager.java'
Reverted 
'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManagerEndpointClient.java'
Reverted 
'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManagerEndpoint.java'
Reverted 
'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/ZKBasedRevisionManager.java'
Reverted 
'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/ImportSequenceFile.java'
Reverted 
'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HbaseSnapshotRecordReader.java'
Reverted 
'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseHCatStorageHandler.java'
Reverted 
'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseBaseOutputFormat.java'
Reverted 
'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseDirectOutputFormat.java'
Reverted 
'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseBulkOutputFormat.java'
Reverted 
'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseInputFormat.java'
Reverted 'hcatalog/storage-handlers/hbase/pom.xml'
Reverted 'hcatalog/build-support/ant/build-common.xml'
Reverted 'hcatalog/build-support/ant/deploy.xml'
Reverted 'hcatalog/build-support/ant/checkstyle.xml'
Reverted 
'hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hcatalog/pig/TestE2EScenarios.java'
Reverted 'build-common.xml'
Reverted '.gitignore'
Reverted 'ql/ivy.xml'
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf build ant/src/org/apache/hadoop/hive/ant/SetSystemProperty.java 
hbase-handler/src/java/org/apache/hadoop/hive/hbase/PutWritable.java 

[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-07-24 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13718825#comment-13718825
 ] 

Prasanth J commented on HIVE-4123:
--

{quote}Comments:
merge Utils into SerializationUtils.
use the zigzag encode/decode in the SerializationUtils.read/writeVslong
move Utils.nextLong to the test code
Utils.getTotalBytesRequired should just use long math. (n * numBits + 7) / 8 
should work
Rename IntegerCompressionReader/Writer to RunLengthIntegerReader/WriterV2 
{quote}

Done.
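
For reference, the zigzag mapping and the byte-count formula mentioned in the quoted comments can be sketched as below. The class name is hypothetical; the formulas are the standard zigzag transform and the (n * numBits + 7) / 8 rounding from the review comment.

```java
/** Hypothetical sketch of the helpers discussed above; not the SerializationUtils code. */
final class SerializationSketch {
  /** Maps signed values to unsigned ones so small magnitudes stay small: 0, -1, 1, -2 -> 0, 1, 2, 3. */
  static long zigzagEncode(long value) {
    return (value << 1) ^ (value >> 63);
  }

  /** Inverse of zigzagEncode. */
  static long zigzagDecode(long encoded) {
    return (encoded >>> 1) ^ -(encoded & 1);
  }

  /** Bytes needed to hold n values of numBits each, rounded up, using long math only. */
  static long totalBytesRequired(long n, int numBits) {
    return (n * numBits + 7) / 8;
  }
}
```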

{quote}
Create an interface IntegerReader that has:
seek
next
skip
{quote}

Added hasNext() to interface as well.

{quote}
Make RunLengthIntegerReader and RunLengthIntegerReaderV2 implement IntegerReader
The TreeReaders should declare the fields as IntegerReader.
Each of the startStripe should use the encoding to create the right 
implementation of IntegerReader.
We should do the same with an IntegerWriter interface.
Replace fixedBitSizes with static methods in SerializationUtils:
static int encodeBitWidth(int n)
static int decodeBitWidth(int n)
{quote}

Done.
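
A rough sketch of what such a shared reader interface might look like. The method signatures are assumptions made for illustration (for example, seek here takes a plain index), and ArrayIntegerReader is a hypothetical stand-in for the two run length readers.

```java
/** Hypothetical shape of the shared interface; signatures are assumed, not from the patch. */
interface IntegerReader {
  boolean hasNext();
  long next();
  void skip(long numValues);
  void seek(long index);
}

/** Toy implementation over an in-memory array, used only to exercise the interface. */
final class ArrayIntegerReader implements IntegerReader {
  private final long[] values;
  private int position;

  ArrayIntegerReader(long[] values) {
    this.values = values;
  }

  @Override public boolean hasNext() { return position < values.length; }

  @Override public long next() { return values[position++]; }

  @Override public void skip(long numValues) { position += (int) numValues; }

  @Override public void seek(long index) { position = (int) index; }
}
```

Declaring the TreeReader fields as the interface type would let startStripe choose the implementation from the column encoding without the callers changing.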

{quote}
Finding the percentiles seems expensive, we should look at an alternative
{quote}

Done.

{quote}
Why is the delta blob zigzag encoded? The sign should always be positive or 
negative for the entire run.
{quote}

Made the delta base field mandatory, blob is now directly bit packed.

{quote}
Maybe we could create an enum in the Writer that is the version to write that 
would look like enum OrcVersion { V0_11, V0_12 }
and the StreamFactory could provide the version to the TreeWriters.
{quote}

Not done (as per your last comment about passing factory object)

{quote}
I don't see why bitpack reader/writer are more than static methods that 
read/write to the underlying stream. So I would have expected a method like 
writeInts(long[] data, int offset, int length, int numBits, OutputStream 
stream) and the corresponding one for reading.
{quote}

Added as a separate static method. Can we reuse BitFieldReader/BitFieldWriter 
which essentially does the same thing (except it deals with ints)?
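
A minimal sketch of a static bit-packing method with the signature suggested in the quoted comment. It packs values big-endian, most significant bit first; that bit order, the pack convenience wrapper, and the class name are assumptions for illustration and not necessarily what the patch does.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch of a static bit-packing writer; bit order is an assumption. */
final class BitPackSketch {
  /** Writes the low numBits of each value, most significant bit first. */
  static void writeInts(long[] data, int offset, int length, int numBits,
                        OutputStream stream) throws IOException {
    int current = 0;   // byte being assembled
    int bitsLeft = 8;  // free bits remaining in current
    for (int i = offset; i < offset + length; i++) {
      long value = data[i];
      int bitsToWrite = numBits;
      // Fill and flush whole bytes while the value does not fit in the free bits.
      while (bitsToWrite > bitsLeft) {
        current |= (int) ((value >>> (bitsToWrite - bitsLeft)) & ((1 << bitsLeft) - 1));
        bitsToWrite -= bitsLeft;
        stream.write(current);
        current = 0;
        bitsLeft = 8;
      }
      // The remaining low bits fit in the current byte.
      bitsLeft -= bitsToWrite;
      current |= (int) ((value & ((1L << bitsToWrite) - 1)) << bitsLeft);
      if (bitsLeft == 0) {
        stream.write(current);
        current = 0;
        bitsLeft = 8;
      }
    }
    if (bitsLeft != 8) {
      stream.write(current);  // flush the final partial byte, zero padded
    }
  }

  /** Convenience wrapper that packs a whole array into a byte array. */
  static byte[] pack(long[] data, int numBits) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try {
      writeInts(data, 0, data.length, numBits, out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toByteArray();
  }
}
```

With 3-bit packing, the eight values 1, 2, 3, 4, 5, 6, 7, 0 occupy 24 bits and therefore exactly 3 bytes.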

{quote}
Utils.bytesToLongBE should take an input stream rather than a byte[].
{quote}

Done.

{quote}
In IntegerCompressionReader:
I'd write a method to translate the int into an opcode rather than use ordinal.
It is probably worth remembering that you are in a repeat, so that you don't 
need to copy the value N times in short repeat.
{quote}

Done.

{quote}
It may be easier to loop through the base values and then run through the 
patches. You might even do three loops: unpack the main values, unpack the 
patches, add the base to each value.
{quote}

My initial implementation ran through 3 loops, but I later refactored it to 
use a single loop. I think the current patch removes some complexity (it 
removed zigzag and changed bit packing).

{quote}
For patched base, only the base is zigzag encoded. The rest of the values are 
always positive.
For delta, only the base and base delta are zigzag encoded.
{quote}

Good catch. Updated the patch.

{quote}
In IntegerCompressionWriter:
You should give more comments about the patched base encoding.
Instead of sorting for the percentiles, you could keep a count of how many 
values use each number of bits.
{quote}

Done. Nice idea!
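
The counting idea can be sketched as follows: keep one counter per bit width, then walk the counters until enough values are covered. This is an illustrative version that assumes non-negative inputs; the names are hypothetical and the patch may differ.

```java
/** Hypothetical sketch of percentile bit widths via a histogram instead of sorting. */
final class BitWidthHistogram {
  /** Bits needed for a non-negative value (at least 1). */
  static int bitsNeeded(long value) {
    return Math.max(1, 64 - Long.numberOfLeadingZeros(value));
  }

  /** Smallest bit width that is enough for at least fraction p of the values. */
  static int percentileBits(long[] values, int offset, int length, double p) {
    int[] counts = new int[65];  // counts[b] = number of values needing exactly b bits
    for (int i = offset; i < offset + length; i++) {
      counts[bitsNeeded(values[i])]++;
    }
    int target = (int) Math.ceil(p * length);  // number of values that must fit
    int covered = 0;
    for (int bits = 1; bits <= 64; bits++) {
      covered += counts[bits];
      if (covered >= target) {
        return bits;
      }
    }
    return 64;
  }
}
```

This is a single pass over the values plus a walk over at most 64 counters, rather than a sort of the buffered values.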

{quote}
Replace the commented out printlns with LOG.debug surrounded by 
LOG.isDebugEnabled
flush should use if/then/else to prevent writing the data twice
the constructor should probably call clear rather than risk having the default 
values be different
in write, just copy the data with System.arraycopy instead of cloning the array
{quote}

Done. 

{quote}
write should track whether the values are monotonically increasing or 
decreasing so that we know if delta applies
there is a lot of duplication of effort in determine encoding
{quote}

write primarily deals with cutting the runs (determining the scope). There was 
some redundancy that I removed in the current patch. Tracking of min/max was 
also wrong in the earlier patch and is fixed in the new one. Earlier, min/max 
were updated as each value was buffered, but this led to wrong output in some 
cases. For example, the sequence 2 3 4 5 6 1 1 1 has a min value of 1, but this 
1 is part of the short repeat sequence. The same min value was also used for 
the initial delta run.

min/max/monotonicity/delta computation/percentile are determined while 
iterating through the buffered values.
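
A sketch of computing these properties in one pass over only the values that belong to the run, which avoids the min/max problem in the example above. The class is hypothetical and ignores overflow when subtracting adjacent values.

```java
/** Hypothetical single-pass analysis of a buffered run; not the patch's code. */
final class RunStats {
  final long min;
  final long max;
  final boolean increasing;  // no value is smaller than its predecessor
  final boolean decreasing;  // no value is larger than its predecessor
  final boolean fixedDelta;  // all adjacent differences are equal

  private RunStats(long min, long max, boolean inc, boolean dec, boolean fixed) {
    this.min = min;
    this.max = max;
    this.increasing = inc;
    this.decreasing = dec;
    this.fixedDelta = fixed;
  }

  /** Analyzes values[0..length); assumes differences do not overflow a long. */
  static RunStats analyze(long[] values, int length) {
    if (length < 1) {
      throw new IllegalArgumentException("empty run");
    }
    long min = values[0];
    long max = values[0];
    boolean inc = true;
    boolean dec = true;
    boolean fixed = true;
    long firstDelta = length > 1 ? values[1] - values[0] : 0;
    for (int i = 1; i < length; i++) {
      long delta = values[i] - values[i - 1];
      min = Math.min(min, values[i]);
      max = Math.max(max, values[i]);
      inc &= delta >= 0;
      dec &= delta <= 0;
      fixed &= delta == firstDelta;
    }
    return new RunStats(min, max, inc, dec, fixed);
  }
}
```

For the sequence 2 3 4 5 6 1 1 1, analyzing only the first five values gives a min of 2 and a fixed delta of 1, while analyzing all eight gives a min of 1 and no monotonic direction.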

{quote}
if the sequence is both increasing and decreasing, it is constant and we should 
either use short literal or delta depending on the length
delta encoding should return before doing the percentile work
{quote}

Currently, delta encoding returns before percentile computation. Short repeats 
are determined when buffering values. All other encodings are determined in 
determineEncoding(). 

{quote}
How much unit test coverage do you have of the new code?
{quote}

I have unit 

[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-07-24 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13718830#comment-13718830
 ] 

Prasanth J commented on HIVE-4123:
--

Just noticed. Please ignore the formatting changes that slipped through in 
SerializationUtils.java. I will fix that in next version of patch. 

 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Prasanth J
 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
 HIVE-4123.3.patch.txt, ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-07-24 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719083#comment-13719083
 ] 

Prasanth J commented on HIVE-4123:
--

Updated the patch with a bug fix in patched base encoding. Formatting changes 
are fixed in this patch. Added more test cases for patched base encoding that 
cover more edge cases. 

The changes to TestFileDump have also been removed, since the memory manager 
chooses the stripe size based on available JVM memory, which I vary for 
different test cases. 

 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Prasanth J
 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
 HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, 
 ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-07-23 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717596#comment-13717596
 ] 

Owen O'Malley commented on HIVE-4123:
-

{quote}
1) In the current implementation, I kept the delta base field as optional (used 
only for fixed delta runs) and zigzag encoded the delta blob so that we don't 
have to deal with sign of the deltas.
I can change delta base field to mandatory field to store the base (absolute 
min) value of delta values and zigzag encode it. With base value and delta base 
value, we should be able to identify if the sequence is monotonically 
increasing or decreasing and also we can identify the sign of the delta values. 
I hope this is what you are looking for. Please correct me if my understanding 
is wrong.
{quote}

I think it will be worthwhile always having the delta base and keeping the 
additional delta as an unsigned remainder.

{quote}
2) is there any way we can reuse ORC's MAJOR and MINOR version as supported 
in HIVE-4724 to figure out if we need to use the new integer encoding or the 
old integer encoding?
{quote}
Yeah, I need to add more framework for that code. I'm leaning toward passing in 
a factory object that creates the right integer encoder.

 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Prasanth J
 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
 ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-07-21 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13714795#comment-13714795
 ] 

Owen O'Malley commented on HIVE-4123:
-

More comments:
* I don't see why bitpack reader/writer are more than static methods that 
read/write to the underlying stream. So I would have expected a method like 
writeInts(long[] data, int offset, int length, int numBits, OutputStream 
stream) and the corresponding one for reading.
* Utils.bytesToLongBE should take an input stream rather than a byte[].
* In IntegerCompressionReader:
** I'd write a method to translate the int into an opcode rather than use 
ordinal.
** It is probably worth remembering that you are in a repeat, so that you don't 
need to copy the value N times in short repeat.
** It may be easier to loop through the base values and then run through the 
patches. You might even do three loops: unpack the main values, unpack the 
patches, add the base to each value.
** For patched base, only the base is zigzag encoded. The rest of the values 
are always positive.
** For delta, only the base and base delta are zigzag encoded. 
* In IntegerCompressionWriter:
** You should give more comments about the patched base encoding.
** Instead of sorting for the percentiles, you could keep a count of how many 
values use each number of bits.
** Replace the commented out printlns with LOG.debug surrounded by 
LOG.isDebugEnabled
** flush should use if/then/else to prevent writing the data twice
** the constructor should probably call clear rather than risk having the 
default values be different
** in write, just copy the data with System.arraycopy instead of cloning the 
array
** write should track whether the values are monotonically increasing or 
decreasing so that we know if delta applies
** there is a lot of duplication of effort in determine encoding
** if the sequence is both increasing and decreasing, it is constant and we 
should either use short literal or delta depending on the length
** delta encoding should return before doing the percentile work
* How much unit test coverage do you have of the new code?
* Have you run the encoder/decoder round trip over the github data to test it?



 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Prasanth J
 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
 ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-07-19 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13714282#comment-13714282
 ] 

Prasanth J commented on HIVE-4123:
--

Thanks Owen for the review comments. There are a few things I want to confirm 
before submitting the next version of the patch.

1) In the current implementation, I kept the delta base field as optional (used 
only for fixed delta runs) and zigzag encoded the delta blob so that we don't 
have to deal with sign of the deltas. 

I can change delta base field to mandatory field to store the base (absolute 
min) value of delta values and zigzag encode it. With base value and delta base 
value, we should be able to identify if the sequence is monotonically 
increasing or decreasing and also we can identify the sign of the delta values. 
I hope this is what you are looking for. Please correct me if my understanding 
is wrong. 

2) is there any way we can reuse ORC's MAJOR and MINOR version as supported 
in HIVE-4724 to figure out if we need to use the new integer encoding or the 
old integer encoding?


 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Prasanth J
 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
 ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-07-18 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712916#comment-13712916
 ] 

Owen O'Malley commented on HIVE-4123:
-

Comments:
* merge Utils into SerializationUtils.
* use the zigzag encode/decode in the SerializationUtils.read/writeVslong
* move Utils.nextLong to the test code
* Utils.getTotalBytesRequired should just use long math. (n * numBits + 7) / 8 
should work
* Rename IntegerCompressionReader/Writer to RunLengthIntegerReader/WriterV2
* Create an interface IntegerReader that has:
** seek
** next
** skip
* Make RunLengthIntegerReader and RunLengthIntegerReaderV2 implement 
IntegerReader
* The TreeReaders should declare the fields as IntegerReader.
* Each of the startStripe should use the encoding to create the right 
implementation of IntegerReader.
* We should do the same with an IntegerWriter interface.
* Replace fixedBitSizes with static methods in SerializationUtils:
** static int encodeBitWidth(int n)
** static int decodeBitWidth(int n)
* Finding the percentiles seems expensive, we should look at an alternative
* Why is the delta blob zigzag encoded? The sign should always be positive or 
negative for the entire run.
* Maybe we could create an enum in the Writer that is the version to write that 
would look like enum OrcVersion { V0_11, V0_12 } and the StreamFactory could 
provide the version to the TreeWriters.



 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Prasanth J
 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
 ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-07-16 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13710342#comment-13710342
 ] 

Prasanth J commented on HIVE-4123:
--

This patch improves upon the existing run length encoding for integers. As 
mentioned in the description, it uses bit packing for tighter compression, 
improves run length and delta encoding, and supports longer runs. 

This patch supports the following lightweight compression techniques:

*SHORT_REPEAT*
*DIRECT*
*PATCHED_BASE*
*DELTA*


The description and format of each type are as follows:

*SHORT_REPEAT:* Used for short repeated integer sequences.
* 1 byte header
** 2 bits for encoding type
** 3 bits for bytes required for repeating value
** 3 bits for repeat count (MIN_REPEAT + run length)
* Blob - repeat value (fixed bytes)
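
The one-byte SHORT_REPEAT header above could be assembled as in this sketch. The opcode value (0), MIN_REPEAT = 3, and storing the byte count as "bytes minus one" are assumptions made for illustration; the comment above does not pin them down.

```java
/** Hypothetical sketch of the 1-byte SHORT_REPEAT header; constants are assumed. */
final class ShortRepeatHeader {
  static final int OPCODE = 0;      // assumed 2-bit encoding type for SHORT_REPEAT
  static final int MIN_REPEAT = 3;  // assumed minimum run length

  /** Packs the header: 2 bits type, 3 bits (valueBytes - 1), 3 bits (repeatCount - MIN_REPEAT). */
  static int encode(int valueBytes, int repeatCount) {
    if (valueBytes < 1 || valueBytes > 8) {
      throw new IllegalArgumentException("valueBytes must be 1..8");
    }
    if (repeatCount < MIN_REPEAT || repeatCount > MIN_REPEAT + 7) {
      throw new IllegalArgumentException("repeatCount out of range");
    }
    return (OPCODE << 6) | ((valueBytes - 1) << 3) | (repeatCount - MIN_REPEAT);
  }

  /** Number of bytes used by the repeated value. */
  static int valueBytes(int header) {
    return ((header >>> 3) & 0x07) + 1;
  }

  /** Number of times the value repeats. */
  static int repeatCount(int header) {
    return (header & 0x07) + MIN_REPEAT;
  }
}
```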

*DIRECT:* Used for random integer sequences whose bit-width requirements do 
not vary much.
* 2 bytes header
** 1st byte
*** 2 bits for encoding type
*** 5 bits for fixed bit width of values in blob
*** 1 bit for storing MSB of run length
** 2nd byte
*** 8 bits for lower run length bits
* Blob - fixed width * run length bits long
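
A sketch of packing and unpacking the two header bytes described above, which give the run length 9 bits in total. The names are hypothetical; the 5-bit width field is treated as an opaque code and the 9-bit run length field is stored as given, since the comment above does not say how widths or lengths are mapped into those fields.

```java
/** Hypothetical sketch of the 2-byte header layout (type, width code, 9-bit run length). */
final class TwoByteHeader {
  /** First byte: 2 bits encoding type, 5 bits width code, 1 bit MSB of the run length field. */
  static int firstByte(int opcode, int widthCode, int runLengthField) {
    return ((opcode & 0x03) << 6) | ((widthCode & 0x1f) << 1) | ((runLengthField >>> 8) & 0x01);
  }

  /** Second byte: the lower 8 bits of the run length field. */
  static int secondByte(int runLengthField) {
    return runLengthField & 0xff;
  }

  static int opcode(int first) {
    return (first >>> 6) & 0x03;
  }

  static int widthCode(int first) {
    return (first >>> 1) & 0x1f;
  }

  /** Reassembles the 9-bit run length field from both header bytes. */
  static int runLengthField(int first, int second) {
    return ((first & 0x01) << 8) | (second & 0xff);
  }
}
```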

*PATCHED_BASE:* Used for random integer sequences whose bit-width requirements 
vary beyond a threshold.
* 4 bytes header
** 1st byte
*** 2 bits for encoding type
*** 5 bits for fixed bit width of values in blob
*** 1 bit for storing MSB of run length
** 2nd byte
*** 8 bits for lower run length bits
** 3rd byte
*** 3 bits for bytes required for base value
*** 5 bits for patch width
** 4th byte
*** 3 bits for patch gap width
*** 5 bits for patch length
* Base value - base width * 8 bits
* Data blob - fixed width * run length
* Patch blob - (patch width + patch gap width) * patch length
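
As a rough size estimate for a PATCHED_BASE run, the pieces listed above can simply be added up. This sketch assumes each blob is padded to whole bytes on its own, which is an assumption rather than something the description states; the class and parameter names are hypothetical.

```java
/** Hypothetical size estimate for a PATCHED_BASE run, built from the layout above. */
final class PatchedBaseSize {
  /**
   * @param baseBytes     bytes used by the base value
   * @param fixedWidth    bits per value in the data blob
   * @param runLength     number of values in the run
   * @param patchWidth    bits per patch value
   * @param patchGapWidth bits per patch gap
   * @param patchLength   number of patch entries
   * @return estimated bytes, assuming each blob is rounded up to whole bytes
   */
  static long encodedSize(int baseBytes, int fixedWidth, int runLength,
                          int patchWidth, int patchGapWidth, int patchLength) {
    long header = 4;
    long dataBlob = ((long) fixedWidth * runLength + 7) / 8;
    long patchBlob = ((long) (patchWidth + patchGapWidth) * patchLength + 7) / 8;
    return header + baseBytes + dataBlob + patchBlob;
  }
}
```

For example, 20 values at 8 bits each with a 2-byte base and a single 14-bit patch entry come to 4 + 2 + 20 + 2 = 28 bytes under these assumptions.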

*DELTA:* Used for monotonically increasing or decreasing sequences, sequences 
with fixed delta values or long repeated sequences.
* 2 bytes header
** 1st byte
*** 2 bits for encoding type
*** 5 bits for fixed bit width of values in blob
*** 1 bit for storing MSB of run length
** 2nd byte
*** 8 bits for lower run length bits
* Base value - encoded as varint
* Delta base (only long fixed delta runs) - zigzag encoded
* Delta blob (variable delta runs) - zigzag encoded
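
For the fixed delta case, a reader only needs the base value, the delta base, and the run length to rebuild the sequence. A minimal sketch, with hypothetical names and no overflow handling:

```java
/** Hypothetical sketch of expanding a fixed-delta run from its base and delta base. */
final class FixedDeltaRun {
  /** Rebuilds runLength values where values[i] = base + i * deltaBase. */
  static long[] expand(long base, long deltaBase, int runLength) {
    long[] values = new long[runLength];
    for (int i = 0; i < runLength; i++) {
      values[i] = base + i * deltaBase;
    }
    return values;
  }
}
```

A delta base of 0 covers the long repeated sequences mentioned above, and a negative delta base covers decreasing runs.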

I have compared this new implementation against the current one. The attached 
Excel sheet shows the compression ratios of both for various real-world 
datasets. As the comparison sheet shows, the new implementation gives a 
significant improvement in compression ratio over the existing implementation 
in most cases. 

NOTE: This patch is generated against trunk after applying the HIVE-4724 patch. 

[~owen.omalley] can you please review this patch and let me know your review 
comments? Also let me know if I need to upload this patch to phabricator.



 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley

 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-07-16 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13710448#comment-13710448
 ] 

Prasanth J commented on HIVE-4123:
--

The earlier patch included a .orig file generated while patching HIVE-4724. 
Removed the .orig file in this new patch. 

 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Prasanth J
 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
 ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs
