[GitHub] orc issue #273: ORC-343 Enable C++ writer to support RleV2
Github user majetideepak commented on the issue: https://github.com/apache/orc/pull/273 The PR looks overall good to me apart from a minor change requested. This is an important patch to align the C++ and Java implementations. Thanks again for working on this! ---
[GitHub] orc issue #275: ORC-371: [C++] Disable Libhdfspp build when Cyrus SASL is no...
Github user majetideepak commented on the issue: https://github.com/apache/orc/pull/275 Since some investigation is still needed here, I am going to merge this patch. We can enable the `NO_SASL` build in a later patch. Right now, this is causing a build failure by default. ---
[GitHub] orc pull request #273: ORC-343 Enable C++ writer to support RleV2
Github user majetideepak commented on a diff in the pull request:

    https://github.com/apache/orc/pull/273#discussion_r192593861

    --- Diff: c++/src/RLEv2.hh ---
    @@ -25,13 +25,89 @@
     #include
    +#define MIN_REPEAT 3
    +#define HIST_LEN 32
     namespace orc {
    -class RleDecoderV2 : public RleDecoder {
    +struct FixedBitSizes {
    +enum FBS {
    +ONE = 0, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, TEN, ELEVEN, TWELVE,
    +THIRTEEN, FOURTEEN, FIFTEEN, SIXTEEN, SEVENTEEN, EIGHTEEN, NINETEEN,
    +TWENTY, TWENTYONE, TWENTYTWO, TWENTYTHREE, TWENTYFOUR, TWENTYSIX,
    +TWENTYEIGHT, THIRTY, THIRTYTWO, FORTY, FORTYEIGHT, FIFTYSIX, SIXTYFOUR, SIZE
    +};
    +};
    +
    +enum EncodingType { SHORT_REPEAT=0, DIRECT=1, PATCHED_BASE=2, DELTA=3 };
    +
    +struct EncodingOption {
    +  EncodingType encoding;
    +  int64_t fixedDelta;
    +  int64_t gapVsPatchListCount;
    +  int64_t zigzagLiteralsCount;
    +  int64_t baseRedLiteralsCount;
    +  int64_t adjDeltasCount;
    +  uint32_t zzBits90p;
    +  uint32_t zzBits100p;
    +  uint32_t brBits95p;
    +  uint32_t brBits100p;
    +  uint32_t bitsDeltaMax;
    +  uint32_t patchWidth;
    +  uint32_t patchGapWidth;
    +  uint32_t patchLength;
    +  int64_t min;
    +  bool isFixedDelta;
    +};
    +
    +class RleEncoderV2 : public RleEncoder {
     public:
    +RleEncoderV2(std::unique_ptr outStream, bool hasSigned, bool alignBitPacking = true);
    --- End diff --

    `alignedBitPacking` is always true. Should we add a WriterOption to enable/disable it?
    Java uses the Encoding Strategy to choose this. C++ currently does not have this.
    ```
    java/core/src/java/org/apache/orc/impl/writer/TreeWriterBase.java:144
        if (writer.getEncodingStrategy().equals(OrcFile.EncodingStrategy.SPEED)) {
          alignedBitpacking = true;
        }
    ```

---
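To make the review question above concrete, here is a minimal C++ sketch of how an encoding strategy could drive the aligned-bit-packing flag, mirroring the Java check quoted in the comment. Everything named here (`EncodingStrategy`, `RleWriterOptions`, `closestFixedBits`, `closestAlignedFixedBits`, `chooseBitWidth`) is hypothetical and is not part of the ORC C++ API discussed in this thread. The unaligned widths are taken from the `FixedBitSizes` enum in the diff; the aligned widths (1, 2, 4, then multiples of 8) are an assumption based on how the Java writer is understood to behave.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical option type; the C++ writer had no such setting at the time.
enum class EncodingStrategy { SPEED, COMPRESSION };

struct RleWriterOptions {
  EncodingStrategy strategy = EncodingStrategy::SPEED;
  // Mirrors the quoted Java check: aligned packing only for SPEED.
  bool alignedBitPacking() const {
    return strategy == EncodingStrategy::SPEED;
  }
};

// Round a bit width up to the nearest width listed in FixedBitSizes:
// 1..24, 26, 28, 30, 32, 40, 48, 56, 64.
uint32_t closestFixedBits(uint32_t n) {
  assert(n <= 64);
  if (n == 0) return 1;
  if (n <= 24) return n;
  if (n <= 26) return 26;
  if (n <= 28) return 28;
  if (n <= 30) return 30;
  if (n <= 32) return 32;
  return ((n + 7) / 8) * 8;  // 33..64 rounds up to 40, 48, 56 or 64
}

// Assumed aligned variant: widths restricted to 1, 2, 4 and multiples of 8.
uint32_t closestAlignedFixedBits(uint32_t n) {
  assert(n <= 64);
  if (n <= 1) return 1;
  if (n <= 2) return 2;
  if (n <= 4) return 4;
  return ((n + 7) / 8) * 8;  // 5..64 rounds up to 8, 16, ..., 64
}

// Pick the packing width for values that need `bits` bits each.
uint32_t chooseBitWidth(uint32_t bits, const RleWriterOptions& options) {
  return options.alignedBitPacking() ? closestAlignedFixedBits(bits)
                                     : closestFixedBits(bits);
}
```

With these definitions, a 25-bit value would be packed at 32 bits under `SPEED` and at 26 bits under `COMPRESSION`, which is the space-versus-speed trade-off that a writer option would expose.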
[GitHub] orc pull request #275: ORC-371: [C++] Disable Libhdfspp build when Cyrus SAS...
Github user asfgit closed the pull request at: https://github.com/apache/orc/pull/275 ---
[GitHub] orc pull request #277: ORC-372: Enable valgrind for C++ travis-ci tests
GitHub user majetideepak opened a pull request:

    https://github.com/apache/orc/pull/277

    ORC-372: Enable valgrind for C++ travis-ci tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/majetideepak/orc ORC-372

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/orc/pull/277.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #277

commit 621f6467ace049c90b3044e67c274f9d276b3a0d
Author: Deepak Majeti
Date:   2018-06-03T16:52:39Z

    ORC-372: Enable valgrind for C++ travis-ci tests

---
[GitHub] orc pull request #278: ORC-251: Extend InStream and OutStream to support enc...
GitHub user omalley opened a pull request:

    https://github.com/apache/orc/pull/278

    ORC-251: Extend InStream and OutStream to support encryption.

    This patch:
    * Adds a method to Codec to get the CompressionKind.
    * Creates StreamOptions for both InStream and OutStream to gather together
      the parameters they need.
    * Extends InStream and OutStream to handle encryption.
    * Changes InStream to use DiskRangeList instead of List.
    * Creates CryptoUtils with a method to create an IV based on the stream name.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/omalley/orc orc-251

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/orc/pull/278.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #278

commit ff620c3faccbabc7e011e245dfb3bcdbfef41b7a
Author: Owen O'Malley
Date:   2018-05-09T16:36:28Z

    ORC-251: Extend InStream and OutStream to support encryption.

---
[GitHub] orc issue #277: ORC-372: Enable valgrind for C++ travis-ci tests
Github user majetideepak commented on the issue: https://github.com/apache/orc/pull/277 There are indeed valgrind failures. I will push a followup patch to fix these. ---
[jira] [Created] (ORC-374) Possible to reduce size of release tarballs?
Wes McKinney created ORC-374:

             Summary: Possible to reduce size of release tarballs?
                 Key: ORC-374
                 URL: https://issues.apache.org/jira/browse/ORC-374
             Project: ORC
          Issue Type: Improvement
            Reporter: Wes McKinney

We are building the Apache ORC C++ library as a dependency of Apache Arrow. I have noticed that the latest release tarball for ORC is about 13 MB. It looks like this is caused by a combination of

* Data files used for testing
* Generated Javadoc

Here's the {{du}} output

{code}
$ du -d 2 -h .
14M     ./examples/expected
23M     ./examples
12K     ./proto
48K     ./cmake_modules
40K     ./site/develop
12K     ./site/security
18M     ./site/api
24K     ./site/_layouts
16K     ./site/_data
16K     ./site/js
468K    ./site/img
8.0K    ./site/help
116K    ./site/specification
16K     ./site/news
8.0K    ./site/talks
520K    ./site/fonts
24K     ./site/_sass
88K     ./site/_includes
120K    ./site/_posts
108K    ./site/_docs
32K     ./site/css
20M     ./site
8.0K    ./docker/centos7
8.0K    ./docker/centos6
8.0K    ./docker/ubuntu16-clang5
8.0K    ./docker/ubuntu12
8.0K    ./docker/debian8
8.0K    ./docker/debian7
8.0K    ./docker/ubuntu14
8.0K    ./docker/ubuntu16
76K     ./docker
256K    ./tools/test
56K     ./tools/src
320K    ./tools
8.0K    ./.git/info
28K     ./.git/refs
52K     ./.git/hooks
32K     ./.git/logs
4.0K    ./.git/branches
22M     ./.git/objects
22M     ./.git
64K     ./java/examples
260K    ./java/mapreduce
2.3M    ./java/core
472K    ./java/tools
128K    ./java/shims
356K    ./java/bench
3.6M    ./java
708K    ./c++/test
104K    ./c++/include
664K    ./c++/src
948K    ./c++/libs
2.5M    ./c++
71M     .
{code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] orc pull request #273: ORC-343 Enable C++ writer to support RleV2
Github user yuruiz commented on a diff in the pull request:

    https://github.com/apache/orc/pull/273#discussion_r192614780

    --- Diff: c++/src/RLEv2.hh ---
    @@ -25,13 +25,89 @@
     #include
    +#define MIN_REPEAT 3
    +#define HIST_LEN 32
     namespace orc {
    -class RleDecoderV2 : public RleDecoder {
    +struct FixedBitSizes {
    +enum FBS {
    +ONE = 0, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, TEN, ELEVEN, TWELVE,
    +THIRTEEN, FOURTEEN, FIFTEEN, SIXTEEN, SEVENTEEN, EIGHTEEN, NINETEEN,
    +TWENTY, TWENTYONE, TWENTYTWO, TWENTYTHREE, TWENTYFOUR, TWENTYSIX,
    +TWENTYEIGHT, THIRTY, THIRTYTWO, FORTY, FORTYEIGHT, FIFTYSIX, SIXTYFOUR, SIZE
    +};
    +};
    +
    +enum EncodingType { SHORT_REPEAT=0, DIRECT=1, PATCHED_BASE=2, DELTA=3 };
    +
    +struct EncodingOption {
    +  EncodingType encoding;
    +  int64_t fixedDelta;
    +  int64_t gapVsPatchListCount;
    +  int64_t zigzagLiteralsCount;
    +  int64_t baseRedLiteralsCount;
    +  int64_t adjDeltasCount;
    +  uint32_t zzBits90p;
    +  uint32_t zzBits100p;
    +  uint32_t brBits95p;
    +  uint32_t brBits100p;
    +  uint32_t bitsDeltaMax;
    +  uint32_t patchWidth;
    +  uint32_t patchGapWidth;
    +  uint32_t patchLength;
    +  int64_t min;
    +  bool isFixedDelta;
    +};
    +
    +class RleEncoderV2 : public RleEncoder {
     public:
    +RleEncoderV2(std::unique_ptr outStream, bool hasSigned, bool alignBitPacking = true);
    --- End diff --

    Done

---