[GitHub] orc issue #273: ORC-343 Enable C++ writer to support RleV2

2018-06-03 Thread majetideepak
Github user majetideepak commented on the issue:

https://github.com/apache/orc/pull/273
  
Overall, the PR looks good to me, apart from one minor change I have 
requested. This is an important patch for aligning the C++ and Java 
implementations. Thanks again for working on this!


---


[GitHub] orc issue #275: ORC-371: [C++] Disable Libhdfspp build when Cyrus SASL is no...

2018-06-03 Thread majetideepak
Github user majetideepak commented on the issue:

https://github.com/apache/orc/pull/275
  
Since some investigation is still needed here, I am going to merge this 
patch. We can enable the `NO_SASL` build in a later patch. Right now, this 
is causing the default build to fail.


---


[GitHub] orc pull request #273: ORC-343 Enable C++ writer to support RleV2

2018-06-03 Thread majetideepak
Github user majetideepak commented on a diff in the pull request:

https://github.com/apache/orc/pull/273#discussion_r192593861
  
--- Diff: c++/src/RLEv2.hh ---
@@ -25,13 +25,89 @@
 
 #include 
 
+#define MIN_REPEAT 3
+#define HIST_LEN 32
 namespace orc {
 
-class RleDecoderV2 : public RleDecoder {
+struct FixedBitSizes {
+enum FBS {
+ONE = 0, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, TEN, ELEVEN, TWELVE,
+THIRTEEN, FOURTEEN, FIFTEEN, SIXTEEN, SEVENTEEN, EIGHTEEN, NINETEEN,
+TWENTY, TWENTYONE, TWENTYTWO, TWENTYTHREE, TWENTYFOUR, TWENTYSIX,
+TWENTYEIGHT, THIRTY, THIRTYTWO, FORTY, FORTYEIGHT, FIFTYSIX, SIXTYFOUR, SIZE
+};
+};
+
+enum EncodingType { SHORT_REPEAT=0, DIRECT=1, PATCHED_BASE=2, DELTA=3 };
+
+struct EncodingOption {
+  EncodingType encoding;
+  int64_t fixedDelta;
+  int64_t gapVsPatchListCount;
+  int64_t zigzagLiteralsCount;
+  int64_t baseRedLiteralsCount;
+  int64_t adjDeltasCount;
+  uint32_t zzBits90p;
+  uint32_t zzBits100p;
+  uint32_t brBits95p;
+  uint32_t brBits100p;
+  uint32_t bitsDeltaMax;
+  uint32_t patchWidth;
+  uint32_t patchGapWidth;
+  uint32_t patchLength;
+  int64_t min;
+  bool isFixedDelta;
+};
+
+class RleEncoderV2 : public RleEncoder {
 public:
+RleEncoderV2(std::unique_ptr outStream, bool hasSigned, bool alignBitPacking = true);
--- End diff --

`alignBitPacking` is always true. Should we add a WriterOption to 
enable/disable it?
Java chooses this based on the encoding strategy. C++ currently has no 
equivalent.
```
// java/core/src/java/org/apache/orc/impl/writer/TreeWriterBase.java:144
if (writer.getEncodingStrategy().equals(OrcFile.EncodingStrategy.SPEED)) {
  alignedBitpacking = true;
}
```
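
A minimal C++ sketch of what such an option could look like. Every name 
here (`EncodingStrategy`, `WriterOptionsSketch`, `chooseBitWidth`, and the 
two rounding helpers) is hypothetical and not part of the ORC C++ API. The 
rounding rules are an assumption: the unaligned one follows the widths 
listed in `FixedBitSizes`, and the aligned one rounds up to 1, 2, 4 or a 
whole number of bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names: nothing below is part of the ORC C++ API.
enum class EncodingStrategy { SPEED, COMPRESSION };

// Round a bit width up to the nearest width listed in FixedBitSizes
// (1..24, then 26, 28, 30, 32, 40, 48, 56, 64).
inline uint32_t closestFixedBits(uint32_t n) {
  if (n == 0) return 1;
  if (n <= 24) return n;
  if (n <= 26) return 26;
  if (n <= 28) return 28;
  if (n <= 30) return 30;
  if (n <= 32) return 32;
  if (n <= 40) return 40;
  if (n <= 48) return 48;
  if (n <= 56) return 56;
  return 64;
}

// Round a bit width up to 1, 2, 4, or a whole number of bytes (8..64).
// This is the assumed meaning of "aligned" bit packing in this sketch.
inline uint32_t closestAlignedFixedBits(uint32_t n) {
  if (n <= 1) return 1;
  if (n <= 2) return 2;
  if (n <= 4) return 4;
  if (n >= 57) return 64;
  return ((n + 7) / 8) * 8;
}

// Hypothetical writer option: the strategy decides whether the encoder
// is constructed with alignBitPacking = true.
struct WriterOptionsSketch {
  EncodingStrategy strategy = EncodingStrategy::SPEED;
  bool alignedBitPacking() const {
    return strategy == EncodingStrategy::SPEED;
  }
};

// Pick the packed width for values that need n bits.
inline uint32_t chooseBitWidth(const WriterOptionsSketch& options,
                               uint32_t n) {
  return options.alignedBitPacking() ? closestAlignedFixedBits(n)
                                     : closestFixedBits(n);
}

// Small demonstration of the trade-off for 25-bit values.
inline void demo() {
  WriterOptionsSketch fast;   // SPEED is the default in this sketch only
  WriterOptionsSketch small;
  small.strategy = EncodingStrategy::COMPRESSION;
  assert(chooseBitWidth(fast, 25) == 32);   // byte-aligned, faster to unpack
  assert(chooseBitWidth(small, 25) == 26);  // tighter, saves 6 bits per value
}
```

With an option like this, the writer would pass 
`options.alignedBitPacking()` as the third constructor argument instead of 
relying on the default of `true`.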


---


[GitHub] orc pull request #275: ORC-371: [C++] Disable Libhdfspp build when Cyrus SAS...

2018-06-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/orc/pull/275


---


[GitHub] orc pull request #277: ORC-372: Enable valgrind for C++ travis-ci tests

2018-06-03 Thread majetideepak
GitHub user majetideepak opened a pull request:

https://github.com/apache/orc/pull/277

ORC-372: Enable valgrind for C++ travis-ci tests



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/majetideepak/orc ORC-372

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/orc/pull/277.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #277


commit 621f6467ace049c90b3044e67c274f9d276b3a0d
Author: Deepak Majeti 
Date:   2018-06-03T16:52:39Z

ORC-372: Enable valgrind for C++ travis-ci tests




---


[GitHub] orc pull request #278: ORC-251: Extend InStream and OutStream to support enc...

2018-06-03 Thread omalley
GitHub user omalley opened a pull request:

https://github.com/apache/orc/pull/278

ORC-251: Extend InStream and OutStream to support encryption.

This patch:
* Adds a method to Codec to get the CompressionKind.
* Creates StreamOptions for both InStream and OutStream to gather together 
the parameters they need.
* Extends InStream and OutStream to handle encryption.
* Changes InStream to use DiskRangeList instead of List.
* Creates CryptoUtils with a method to create an IV based on the stream 
name.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/omalley/orc orc-251

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/orc/pull/278.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #278


commit ff620c3faccbabc7e011e245dfb3bcdbfef41b7a
Author: Owen O'Malley 
Date:   2018-05-09T16:36:28Z

ORC-251: Extend InStream and OutStream to support encryption.




---


[GitHub] orc issue #277: ORC-372: Enable valgrind for C++ travis-ci tests

2018-06-03 Thread majetideepak
Github user majetideepak commented on the issue:

https://github.com/apache/orc/pull/277
  
There are indeed valgrind failures. I will push a follow-up patch to fix 
them.


---


[jira] [Created] (ORC-374) Possible to reduce size of release tarballs?

2018-06-03 Thread Wes McKinney (JIRA)
Wes McKinney created ORC-374:


 Summary: Possible to reduce size of release tarballs?
 Key: ORC-374
 URL: https://issues.apache.org/jira/browse/ORC-374
 Project: ORC
  Issue Type: Improvement
Reporter: Wes McKinney


We are building the Apache ORC C++ library as a dependency of Apache Arrow. I 
have noticed that the latest release tarball for ORC is about 13 MB. 

It looks like this is caused by a combination of:

* Data files used for testing
* Generated Javadoc

Here's the {{du}} output

{code}
$ du -d 2 -h .
14M ./examples/expected
23M ./examples
12K ./proto
48K ./cmake_modules
40K ./site/develop
12K ./site/security
18M ./site/api
24K ./site/_layouts
16K ./site/_data
16K ./site/js
468K ./site/img
8.0K ./site/help
116K ./site/specification
16K ./site/news
8.0K ./site/talks
520K ./site/fonts
24K ./site/_sass
88K ./site/_includes
120K ./site/_posts
108K ./site/_docs
32K ./site/css
20M ./site
8.0K ./docker/centos7
8.0K ./docker/centos6
8.0K ./docker/ubuntu16-clang5
8.0K ./docker/ubuntu12
8.0K ./docker/debian8
8.0K ./docker/debian7
8.0K ./docker/ubuntu14
8.0K ./docker/ubuntu16
76K ./docker
256K ./tools/test
56K ./tools/src
320K ./tools
8.0K ./.git/info
28K ./.git/refs
52K ./.git/hooks
32K ./.git/logs
4.0K ./.git/branches
22M ./.git/objects
22M ./.git
64K ./java/examples
260K ./java/mapreduce
2.3M ./java/core
472K ./java/tools
128K ./java/shims
356K ./java/bench
3.6M ./java
708K ./c++/test
104K ./c++/include
664K ./c++/src
948K ./c++/libs
2.5M ./c++
71M .
{code}





[GitHub] orc pull request #273: ORC-343 Enable C++ writer to support RleV2

2018-06-03 Thread yuruiz
Github user yuruiz commented on a diff in the pull request:

https://github.com/apache/orc/pull/273#discussion_r192614780
  

Done


---