[Impala-ASF-CR] IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width
Tim Armstrong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10683 )

Change subject: IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width
..

IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width

Reading dictionary-encoded Parquet data pages where the bit width is
larger than the encoded type's size (e.g. encoding an 8-bit TINYINT with
16-bit dictionary indices) led to a DCHECK failure in debug builds.
Impala does not create such Parquet files (an N-bit type can have at most
2^N distinct values, so N-bit dictionary indices are enough for a
dictionary that contains every possible value), but the Parquet standard
does not forbid them.

These DCHECKs were probably introduced by a copy-paste error (similar
checks exist in the non-dictionary-encoded bit reader functions, where
they are valid).

Testing:
- Added a new test to check that these data pages can be decoded correctly.

Change-Id: I9ff3b00cbcab09dec11b3607d7d9a9c2c0025e1a
Reviewed-on: http://gerrit.cloudera.org:8080/10683
Reviewed-by: Tim Armstrong
Tested-by: Impala Public Jenkins
---
M be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.inline.h
M testdata/data/README
A testdata/data/dict_encoding_with_large_bit_width.parquet
M tests/query_test/test_scanners.py
5 files changed, 21 insertions(+), 3 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/10683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9ff3b00cbcab09dec11b3607d7d9a9c2c0025e1a
Gerrit-Change-Number: 10683
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10683 )

Change subject: IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width
..

Patch Set 1: Verified+1

--
To view, visit http://gerrit.cloudera.org:8080/10683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ff3b00cbcab09dec11b3607d7d9a9c2c0025e1a
Gerrit-Change-Number: 10683
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong
Gerrit-Comment-Date: Mon, 11 Jun 2018 23:19:57 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10683 )

Change subject: IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width
..

Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2632/

--
To view, visit http://gerrit.cloudera.org:8080/10683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ff3b00cbcab09dec11b3607d7d9a9c2c0025e1a
Gerrit-Change-Number: 10683
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong
Gerrit-Comment-Date: Mon, 11 Jun 2018 20:09:14 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10683 )

Change subject: IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width
..

Patch Set 1: Code-Review+2

--
To view, visit http://gerrit.cloudera.org:8080/10683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ff3b00cbcab09dec11b3607d7d9a9c2c0025e1a
Gerrit-Change-Number: 10683
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer
Gerrit-Reviewer: Tim Armstrong
Gerrit-Comment-Date: Mon, 11 Jun 2018 16:12:23 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width
Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10683 )

Change subject: IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width
..

IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width

Reading dictionary-encoded Parquet data pages where the bit width is
larger than the encoded type's size (e.g. encoding an 8-bit TINYINT with
16-bit dictionary indices) led to a DCHECK failure in debug builds.
Impala does not create such Parquet files (an N-bit type can have at most
2^N distinct values, so N-bit dictionary indices are enough for a
dictionary that contains every possible value), but the Parquet standard
does not forbid them.

These DCHECKs were probably introduced by a copy-paste error (similar
checks exist in the non-dictionary-encoded bit reader functions, where
they are valid).

Testing:
- Added a new test to check that these data pages can be decoded correctly.

Change-Id: I9ff3b00cbcab09dec11b3607d7d9a9c2c0025e1a
---
M be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.inline.h
M testdata/data/README
A testdata/data/dict_encoding_with_large_bit_width.parquet
M tests/query_test/test_scanners.py
5 files changed, 21 insertions(+), 3 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/10683/1

--
To view, visit http://gerrit.cloudera.org:8080/10683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9ff3b00cbcab09dec11b3607d7d9a9c2c0025e1a
Gerrit-Change-Number: 10683
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer
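
For readers following the reasoning in the commit message, here is a minimal Python sketch of why a dictionary index bit width larger than the value type's width is harmless to a decoder. It is not Impala's code: the helper names (`pack_bits`, `unpack_bits`, `decode_dict_page`) are hypothetical, and it assumes plain LSB-first bit packing with no RLE runs or page headers, which is a simplification of the real Parquet encoding. The point it illustrates is that the index width only determines how indices are read from the page; the decoded values always come from the dictionary, so the only constraint worth checking is that each index is in range.

```python
def pack_bits(values, bit_width):
    """Pack non-negative ints LSB-first into bytes (hypothetical helper)."""
    buf, nbits, out = 0, 0, bytearray()
    for v in values:
        if v < 0 or v >> bit_width:
            raise ValueError("value %d does not fit in %d bits" % (v, bit_width))
        buf |= v << nbits
        nbits += bit_width
        while nbits >= 8:  # flush every completed byte
            out.append(buf & 0xFF)
            buf >>= 8
            nbits -= 8
    if nbits:  # flush the final partial byte, zero-padded
        out.append(buf & 0xFF)
    return bytes(out)


def unpack_bits(data, bit_width, count):
    """Read `count` LSB-first values of `bit_width` bits each."""
    buf = int.from_bytes(data, "little")
    mask = (1 << bit_width) - 1
    return [(buf >> (i * bit_width)) & mask for i in range(count)]


def decode_dict_page(data, bit_width, count, dictionary):
    """Decode indices, then look each one up in the dictionary.

    Note there is no check relating bit_width to the width of the
    dictionary's value type; only the index range is validated.
    """
    decoded = []
    for idx in unpack_bits(data, bit_width, count):
        if idx >= len(dictionary):
            raise ValueError("dictionary index %d out of range" % idx)
        decoded.append(dictionary[idx])
    return decoded


# A TINYINT (8-bit) dictionary and the same indices at two bit widths.
tinyint_dict = [-128, 0, 127]
indices = [0, 2, 1, 2]
narrow_page = pack_bits(indices, 2)   # minimal width for 3 entries
wide_page = pack_bits(indices, 16)    # wider than the 8-bit value type
```

Both `decode_dict_page(narrow_page, 2, 4, tinyint_dict)` and `decode_dict_page(wide_page, 16, 4, tinyint_dict)` yield the same values; the wide page merely wastes space.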