Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9761 )
Change subject: IMPALA-6324: Support reading RLE-encoded boolean values in Parquet scanner
......................................................................

IMPALA-6324: Support reading RLE-encoded boolean values in Parquet scanner

Impala already supported RLE encoding for levels and dictionary pages, so
the only task was to integrate it into BoolColumnReader.

A new benchmark, rle-benchmark.cc, is added to test the speed of RLE
decoding for different bit widths and run lengths.

There might be a small performance impact on PLAIN encoded booleans
because of the additional branch when the cache of BoolColumnReader is
filled. As the cache size is 128, I considered this to be outside the
"hot loop".

Testing:
As Impala cannot write RLE encoded bool columns at the moment, parquet-mr
was used to create a test file, testdata/data/rle_encoded_bool.parquet.

tests/query_test/test_scanners.py#test_rle_encoded_bools creates a table
that uses this file and tries to query from it.

Reviewed-on: http://gerrit.cloudera.org:8080/9403
Reviewed-by: Tim Armstrong <[email protected]>
Tested-by: Impala Public Jenkins

(cherry picked from commit 588e1d46e9bd88d17676c47a0bf1237d3ebb11da)

Conflicts:
- be/src/exec/parquet-column-readers.cc
- Required fixup due to interaction with IMPALA-6077 (which doesn't exist on 2.x)

Change-Id: I683c1f97ec99774bc9bd6ecec6f3f7abcdc8f615
Reviewed-on: http://gerrit.cloudera.org:8080/9761
Reviewed-by: Tim Armstrong <[email protected]>
Tested-by: Impala Public Jenkins
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/rle-benchmark.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/util/rle-encoding.h
M be/src/util/rle-test.cc
M testdata/data/README
A testdata/data/rle_encoded_bool.parquet
A testdata/workloads/functional-query/queries/QueryTest/parquet-rle-encoded-bool.test
M tests/query_test/test_scanners.py
10 files changed, 331 insertions(+), 53 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9761
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: merged
Gerrit-Change-Id: I683c1f97ec99774bc9bd6ecec6f3f7abcdc8f615
Gerrit-Change-Number: 9761
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <[email protected]>
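
For readers unfamiliar with the encoding named in the change subject, here is a minimal sketch of decoding a Parquet RLE/bit-packed hybrid stream at bit width 1 into booleans. It is not the code from this change: DecodeRleBools is a hypothetical standalone helper, it does not use the decoder in be/src/util/rle-encoding.h, and it assumes the caller has already stripped any length prefix that precedes the run data in a data page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Impala's decoder): decodes up to num_values
// booleans from a Parquet RLE/bit-packed hybrid stream of bit width 1.
// Returns false if the input is truncated or contains an empty run.
bool DecodeRleBools(const std::vector<uint8_t>& buf, size_t num_values,
                    std::vector<bool>* out) {
  assert(out != nullptr);
  out->clear();
  size_t pos = 0;
  while (out->size() < num_values) {
    // Each run starts with a ULEB128 varint header.
    uint32_t header = 0;
    int shift = 0;
    while (true) {
      if (pos >= buf.size() || shift > 28) return false;
      const uint8_t byte = buf[pos++];
      header |= static_cast<uint32_t>(byte & 0x7F) << shift;
      if ((byte & 0x80) == 0) break;
      shift += 7;
    }
    size_t remaining = num_values - out->size();
    const uint32_t count = header >> 1;
    if (count == 0) return false;
    if (header & 1) {
      // Bit-packed run: `count` groups of 8 values. At bit width 1 each
      // group is one byte, with values packed from the least significant bit.
      for (uint32_t g = 0; g < count; ++g) {
        if (pos >= buf.size()) return false;
        const uint8_t packed = buf[pos++];
        for (int bit = 0; bit < 8 && remaining > 0; ++bit, --remaining) {
          out->push_back(((packed >> bit) & 1) != 0);
        }
      }
    } else {
      // RLE run: one value byte repeated `count` times.
      if (pos >= buf.size()) return false;
      const bool value = (buf[pos++] & 1) != 0;
      const size_t n = count < remaining ? count : remaining;
      out->insert(out->end(), n, value);
    }
  }
  return true;
}
```

In this sketch the bytes 0x0A 0x01 form an RLE run of five true values, and 0x03 0x05 form one bit-packed group whose first three values are true, false, true. A reader that refills a fixed-size cache, as the commit message describes for BoolColumnReader, would call a routine like this once per refill rather than once per value, which is why the extra branch on encoding stays out of the per-value loop.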
