[ https://issues.apache.org/jira/browse/IMPALA-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong resolved IMPALA-6039.
-----------------------------------
    Resolution: Fixed
 Fix Version/s: Impala 2.11.0

IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding

Switch the decoders to using more batch-oriented interfaces.

As an intermediate step, this makes only the lower-level utility classes batch-oriented, not the interfaces of LevelDecoder or DictDecoder. The next step would be to make those interfaces batch-oriented as well and apply the corresponding optimisations in the Parquet scanner. That could deliver much larger perf improvements than the current patch.

The high-level changes are:
* BitReader -> BatchedBitReader, which is built to unpack runs of 32 bit-packed values efficiently.
* RleDecoder -> RleBatchDecoder, which exposes the repeated and literal runs to the caller and uses BatchedBitReader to unpack literal runs efficiently.
* Dict decoding uses RleBatchDecoder to decode repeated runs efficiently and uses the BitPacking utilities to unpack and decode in a single step.

Also removes an older benchmark that is no longer very interesting, since the batch-oriented approach to encoding and decoding is so much faster than the value-by-value approach.

Testing:
* Ran core tests.
* Updated unit tests to exercise the new code.
* Added test coverage for the deprecated bit-packed level encoding to check that it still works (there was no coverage previously).

Perf:
Single-node benchmarks showed a performance gain of a few percent. 16-node cluster benchmarks only showed a gain for TPC-H nested.
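To make the run-oriented interface concrete, here is a minimal C++ sketch of decoding the Parquet RLE/bit-packed hybrid encoding one run at a time. The class `SimpleRleBatchDecoder`, its methods, and the `DecodeBatch` caller are hypothetical names for illustration; they are not Impala's `RleBatchDecoder` or `BatchedBitReader` APIs. The sketch assumes the standard Parquet layout (a ULEB128 run header whose low bit selects either a bit-packed run of 8-value groups or a repeated run) and bit widths of at most 32. Literal values are unpacked with a plain bit-at-a-time loop for clarity, not the specialised 32-value unpacking the commit message describes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical, simplified decoder for the Parquet RLE/bit-packed hybrid
// encoding. NOT Impala's RleBatchDecoder: it only mirrors the idea of
// exposing whole repeated and literal runs to the caller.
class SimpleRleBatchDecoder {
 public:
  SimpleRleBatchDecoder(const uint8_t* data, size_t len, int bit_width)
      : data_(data), len_(len), bit_width_(bit_width) {
    if (bit_width < 0 || bit_width > 32) throw std::invalid_argument("bit_width must be in [0, 32]");
  }

  // Loads run headers until a non-empty run is available; false at end of input.
  bool EnsureRun() {
    while (repeats_left_ == 0 && literals_left_ == 0) {
      if (!NextRun()) return false;
    }
    return true;
  }

  size_t repeats_left() const { return repeats_left_; }
  size_t literals_left() const { return literals_left_; }

  // Consumes n values of the current repeated run and returns the repeated value.
  uint32_t TakeRepeated(size_t n) {
    assert(n <= repeats_left_);
    repeats_left_ -= n;
    return repeat_value_;
  }

  // Unpacks n values of the current literal (bit-packed) run into out.
  void TakeLiterals(uint32_t* out, size_t n) {
    assert(n <= literals_left_);
    for (size_t i = 0; i < n; ++i) out[i] = ReadBits();
    literals_left_ -= n;
  }

 private:
  // Reads one bit_width-bit value, least-significant bit first.
  uint32_t ReadBits() {
    uint64_t v = 0;
    for (int i = 0; i < bit_width_; ++i) {
      size_t pos = bit_pos_ + static_cast<size_t>(i);
      v |= static_cast<uint64_t>((data_[pos / 8] >> (pos % 8)) & 1u) << i;
    }
    bit_pos_ += static_cast<size_t>(bit_width_);
    return static_cast<uint32_t>(v);
  }

  // Parses the ULEB128 run header at the current byte-aligned position.
  bool NextRun() {
    size_t byte_pos = bit_pos_ / 8;
    if (byte_pos >= len_) return false;
    uint32_t header = 0;
    for (int shift = 0;; shift += 7) {
      if (byte_pos >= len_ || shift > 28) throw std::runtime_error("bad run header");
      uint8_t b = data_[byte_pos++];
      header |= static_cast<uint32_t>(b & 0x7F) << shift;
      if ((b & 0x80) == 0) break;
    }
    if (header & 1) {
      // Bit-packed run: (header >> 1) groups of 8 values = bit_width bytes per group.
      size_t groups = header >> 1;
      if (byte_pos + groups * static_cast<size_t>(bit_width_) > len_) throw std::runtime_error("truncated literal run");
      literals_left_ = groups * 8;
    } else {
      // Repeated run: count, then the value in ceil(bit_width / 8) little-endian bytes.
      size_t value_bytes = static_cast<size_t>(bit_width_ + 7) / 8;
      if (byte_pos + value_bytes > len_) throw std::runtime_error("truncated repeated run");
      repeat_value_ = 0;
      for (size_t i = 0; i < value_bytes; ++i)
        repeat_value_ |= static_cast<uint32_t>(data_[byte_pos + i]) << (8 * i);
      byte_pos += value_bytes;
      repeats_left_ = header >> 1;
    }
    bit_pos_ = byte_pos * 8;
    return true;
  }

  const uint8_t* data_;
  size_t len_;
  int bit_width_;
  size_t bit_pos_ = 0;  // absolute bit offset into data_
  size_t repeats_left_ = 0;
  size_t literals_left_ = 0;
  uint32_t repeat_value_ = 0;
};

// Example caller: handles a whole run (or the part of it that fits) per step.
size_t DecodeBatch(SimpleRleBatchDecoder& dec, uint32_t* out, size_t num_values) {
  size_t decoded = 0;
  while (decoded < num_values && dec.EnsureRun()) {
    size_t want = num_values - decoded;
    if (dec.repeats_left() > 0) {
      size_t n = std::min(want, dec.repeats_left());
      uint32_t v = dec.TakeRepeated(n);  // one value for the whole run
      std::fill(out + decoded, out + decoded + n, v);
      decoded += n;
    } else {
      size_t n = std::min(want, dec.literals_left());
      dec.TakeLiterals(out + decoded, n);
      decoded += n;
    }
  }
  return decoded;
}
```

Because the caller sees a repeated run as a single (value, count) pair, a dictionary decoder built on this shape could look up the dictionary entry once per run instead of once per value, which is presumably the kind of saving the commit message refers to for dict decoding.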
Change-Id: I35de0cf80c86f501c4a39270afc8fb8111552ac6
Reviewed-on: http://gerrit.cloudera.org:8080/8267
Reviewed-by: Tim Armstrong <tarmstr...@cloudera.com>
Tested-by: Impala Public Jenkins

---

> BitReader::GetAligned() doesn't zero out trailing bytes
> -------------------------------------------------------
>
>                 Key: IMPALA-6039
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6039
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.11.0
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Minor
>             Fix For: Impala 2.11.0
>
>
> BitReader::GetAligned() only sets the initial bytes of the output value,
> leaving the remaining bytes set to whatever they were previously. It isn't
> clear if this is intentional and undocumented or a latent bug.
> The problem is non-obvious because the current callsites either call it with
> a single byte at a time (GetVlqInt()), or initialize the output value to zero
> and always call GetAligned() with the same num_bytes value (RleDecoder).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
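The GetAligned() behaviour described in the issue can be reproduced with a minimal C++ sketch, assuming a memcpy-based read of num_bytes little-endian bytes on a little-endian host (e.g. x86-64). `GetAlignedNoZero` and `GetAlignedZeroed` are hypothetical helpers, not Impala's `BitReader::GetAligned()`; the second shows one possible fix (zeroing the output first), not necessarily the one Impala adopted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helpers, not Impala's BitReader API. Both copy num_bytes
// bytes from src into the first bytes of *v, which on a little-endian host
// are the low-order bytes of an integer.

// Mirrors the behaviour described in IMPALA-6039: bytes of *v beyond
// num_bytes keep whatever value they held before the call.
template <typename T>
void GetAlignedNoZero(const uint8_t* src, int num_bytes, T* v) {
  static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
  assert(num_bytes >= 0 && static_cast<size_t>(num_bytes) <= sizeof(T));
  std::memcpy(v, src, static_cast<size_t>(num_bytes));
}

// Zeroes the whole output first, so trailing bytes are always 0 regardless
// of what the caller left in *v.
template <typename T>
void GetAlignedZeroed(const uint8_t* src, int num_bytes, T* v) {
  static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
  assert(num_bytes >= 0 && static_cast<size_t>(num_bytes) <= sizeof(T));
  std::memset(v, 0, sizeof(T));
  std::memcpy(v, src, static_cast<size_t>(num_bytes));
}
```

A caller that zero-initialises `v` and always passes the same `num_bytes`, like the RleDecoder callsite the issue mentions, never observes stale bytes with either helper; the difference only shows up when `v` already holds an earlier value or `num_bytes` shrinks between calls.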