Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9761 )

Change subject: IMPALA-6324: Support reading RLE-encoded boolean values in Parquet scanner
......................................................................

IMPALA-6324: Support reading RLE-encoded boolean values in Parquet scanner

Impala already supported RLE encoding for levels and dictionary pages, so
the only task was to integrate it into BoolColumnReader.
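
For readers unfamiliar with the encoding, the following is a minimal sketch of
decoding Parquet's RLE/bit-packed hybrid format at bit width 1. It is not the
Impala implementation: DecodeRleBools is a hypothetical name, the 4-byte length
prefix that precedes the encoded data is assumed to have been consumed already,
and rejecting zero-length runs is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Impala code): decodes up to num_values booleans from
// an RLE/bit-packed hybrid buffer with bit width 1 and appends them to *out.
// Returns false if the buffer is truncated or malformed.
bool DecodeRleBools(const uint8_t* buf, size_t len, size_t num_values,
                    std::vector<bool>* out) {
  assert(out != nullptr);
  size_t pos = 0;
  while (out->size() < num_values) {
    // Each run starts with a ULEB128-encoded header.
    uint64_t header = 0;
    int shift = 0;
    while (true) {
      if (pos >= len || shift > 63) return false;
      uint8_t byte = buf[pos++];
      header |= static_cast<uint64_t>(byte & 0x7F) << shift;
      if ((byte & 0x80) == 0) break;
      shift += 7;
    }
    size_t remaining = num_values - out->size();
    if ((header & 1) == 0) {
      // RLE run: the header holds the run length, followed by one value byte.
      uint64_t run_len = header >> 1;
      if (run_len == 0 || pos >= len) return false;
      bool value = (buf[pos++] & 1) != 0;
      size_t n = run_len < remaining ? static_cast<size_t>(run_len) : remaining;
      out->insert(out->end(), n, value);
    } else {
      // Bit-packed run: the header holds the number of 8-value groups. At bit
      // width 1 each group is one byte, least significant bit first.
      uint64_t num_groups = header >> 1;
      if (num_groups == 0) return false;
      for (uint64_t g = 0; g < num_groups && remaining > 0; ++g) {
        if (pos >= len) return false;
        uint8_t byte = buf[pos++];
        for (int bit = 0; bit < 8 && remaining > 0; ++bit, --remaining) {
          out->push_back(((byte >> bit) & 1) != 0);
        }
      }
    }
  }
  return true;
}
```

For example, the bytes 0x0A 0x01 0x03 0x05 describe an RLE run of five true
values followed by one bit-packed group whose first three values are true,
false, true.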

A new benchmark, rle-benchmark.cc, is added to measure the speed of RLE
decoding for different bit widths and run lengths.

There might be a small performance impact on PLAIN-encoded booleans
because of the additional branch taken when the cache of BoolColumnReader
is filled. As the cache size is 128, the branch runs at most once per 128
values, so I considered this to be outside the "hot loop".

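The reasoning about the branch can be illustrated with a small sketch. This is
not the BoolColumnReader code: CachedBoolReader, BatchDecoder and Refill are
hypothetical names, and the two decoders are passed in as callbacks only to
keep the example self-contained. The point is that the encoding is checked once
per refill of up to 128 values, while the per-value path has no encoding branch.

```cpp
#include <cassert>
#include <functional>
#include <utility>

enum class Encoding { PLAIN, RLE };

constexpr int kCacheSize = 128;  // cache size mentioned in the commit message

// Hypothetical reader (not Impala code) that serves values from a small cache.
class CachedBoolReader {
 public:
  // A decoder writes up to max_values bools into dst and returns the count.
  using BatchDecoder = std::function<int(bool* dst, int max_values)>;

  CachedBoolReader(Encoding encoding, BatchDecoder plain, BatchDecoder rle)
      : encoding_(encoding), plain_(std::move(plain)), rle_(std::move(rle)) {}

  // Per-value path: only a cache-exhaustion check, no encoding branch.
  bool Next(bool* value) {
    if (pos_ == filled_ && !Refill()) return false;
    *value = cache_[pos_++];
    return true;
  }

  int refills() const { return refills_; }

 private:
  // Refill path: the encoding branch is taken once per batch.
  bool Refill() {
    filled_ = (encoding_ == Encoding::RLE) ? rle_(cache_, kCacheSize)
                                           : plain_(cache_, kCacheSize);
    pos_ = 0;
    ++refills_;
    return filled_ > 0;
  }

  Encoding encoding_;
  BatchDecoder plain_;
  BatchDecoder rle_;
  bool cache_[kCacheSize] = {};
  int pos_ = 0;
  int filled_ = 0;
  int refills_ = 0;
};
```

Reading 300 values through such a reader takes three refills, so the encoding
check runs three times rather than 300.
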
Testing:

As Impala cannot write RLE-encoded bool columns at the moment, parquet-mr
was used to create a test file, testdata/data/rle_encoded_bool.parquet.

tests/query_test/test_scanners.py#test_rle_encoded_bools creates a table
backed by this file and queries it.

Reviewed-on: http://gerrit.cloudera.org:8080/9403
Reviewed-by: Tim Armstrong <[email protected]>
Tested-by: Impala Public Jenkins
(cherry picked from commit 588e1d46e9bd88d17676c47a0bf1237d3ebb11da)

Conflicts:
 - be/src/exec/parquet-column-readers.cc - Required fixup due to
   interaction with IMPALA-6077 (which doesn't exist on 2.x)

Change-Id: I683c1f97ec99774bc9bd6ecec6f3f7abcdc8f615
Reviewed-on: http://gerrit.cloudera.org:8080/9761
Reviewed-by: Tim Armstrong <[email protected]>
Tested-by: Impala Public Jenkins
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/rle-benchmark.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/util/rle-encoding.h
M be/src/util/rle-test.cc
M testdata/data/README
A testdata/data/rle_encoded_bool.parquet
A testdata/workloads/functional-query/queries/QueryTest/parquet-rle-encoded-bool.test
M tests/query_test/test_scanners.py
10 files changed, 331 insertions(+), 53 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9761
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: merged
Gerrit-Change-Id: I683c1f97ec99774bc9bd6ecec6f3f7abcdc8f615
Gerrit-Change-Number: 9761
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <[email protected]>
