[Impala-ASF-CR] IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width

2018-06-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has submitted this change and it was merged. (http://gerrit.cloudera.org:8080/10683)

Change subject: IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width
..

IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width

Reading dictionary-encoded Parquet data pages where the bit width of
the dictionary indices is larger than the size of the encoded type
(e.g. encoding the 8-bit TINYINT type with 16-bit dictionary indices)
led to a DCHECK failure in debug builds. Impala does not create such
Parquet files (an N-bit type can have at most 2^N distinct values, so
N-bit dictionary indices are enough for a dictionary that contains
every possible value), but the Parquet standard does not forbid them.
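As a worked example of the arithmetic in the paragraph above: a dictionary with k entries only needs enough bits to represent its largest index, k - 1. The C++ sketch below computes that minimum; `MinIndexBitWidth` is a hypothetical helper for illustration, not an Impala function. For TINYINT (N = 8) a dictionary holds at most 256 entries, so 8 bits always suffice. The Parquet format, however, stores the index bit width in each data page (up to 32 bits), so a writer may choose a wider one, such as 16.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Impala): the smallest bit width that can
// represent every index 0 .. dict_size - 1 of a dictionary.
int MinIndexBitWidth(uint64_t dict_size) {
  if (dict_size <= 1) return 0;  // a single index (0) needs no bits
  uint64_t max_index = dict_size - 1;
  int width = 0;
  // Count the bits needed to represent the largest index.
  while (max_index != 0) {
    ++width;
    max_index >>= 1;
  }
  return width;
}
```

For example, `MinIndexBitWidth(256)` is 8 (every possible TINYINT value) and `MinIndexBitWidth(65536)` is 16. A 16-bit index width for a TINYINT column is therefore wider than necessary, but it is still a valid encoding.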

These DCHECKs were probably introduced by a copy-paste error: similar
checks exist in the bit reader functions for non-dictionary-encoded
data, where they are valid.
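The difference between the two kinds of check can be sketched as follows. This is a simplified, hypothetical model and not Impala's actual bit-packing or bit reader code. `ReadPacked`, `UnpackPlain` and `UnpackAndDecode` are illustrative names, `assert` stands in for `DCHECK`, and values are assumed to be packed LSB-first as in Parquet's bit-packed runs. When values are unpacked directly into a type `T`, the bit width must fit in `T`. When dictionary indices are unpacked, only the index (at most 32 bits) has to fit, and the type of the decoded value does not constrain the bit width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads the i-th value of `bit_width` bits from an LSB-first bit-packed
// buffer. Illustrative only: assumes 1 <= bit_width <= 32 and that `buf`
// is long enough (no bounds checking).
uint32_t ReadPacked(const std::vector<uint8_t>& buf, size_t i, int bit_width) {
  uint64_t value = 0;
  size_t bit_pos = i * static_cast<size_t>(bit_width);
  for (int b = 0; b < bit_width; ++b, ++bit_pos) {
    uint64_t bit = (buf[bit_pos / 8] >> (bit_pos % 8)) & 1;
    value |= bit << b;
  }
  return static_cast<uint32_t>(value);
}

// Plain (non-dictionary) values are stored directly in T, so this check
// (the analogue of the valid DCHECK) is a genuine precondition.
template <typename T>
T UnpackPlain(const std::vector<uint8_t>& buf, size_t i, int bit_width) {
  assert(bit_width <= 8 * static_cast<int>(sizeof(T)));
  return static_cast<T>(ReadPacked(buf, i, bit_width));
}

// Dictionary-encoded values: the packed integer is an index, and only the
// dictionary entry has type T. Requiring bit_width <= 8 * sizeof(T) here
// would wrongly reject valid pages, e.g. TINYINT with 16-bit indices.
template <typename T>
bool UnpackAndDecode(const std::vector<uint8_t>& buf, size_t i, int bit_width,
                     const std::vector<T>& dict, T* out) {
  assert(bit_width <= 32);  // indices are at most 32 bits wide
  uint32_t index = ReadPacked(buf, i, bit_width);
  if (index >= dict.size()) return false;  // corrupt index, not a width issue
  *out = dict[index];
  return true;
}
```

With 16-bit indices and an `int8_t` dictionary, `UnpackAndDecode` succeeds. A `bit_width <= 8 * sizeof(T)` check on that path would have rejected the same page in debug builds, which is the behavior this change removes.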

Testing:
- Added a new test that checks that these data pages are decoded
  correctly.

Change-Id: I9ff3b00cbcab09dec11b3607d7d9a9c2c0025e1a
Reviewed-on: http://gerrit.cloudera.org:8080/10683
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 
---
M be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.inline.h
M testdata/data/README
A testdata/data/dict_encoding_with_large_bit_width.parquet
M tests/query_test/test_scanners.py
5 files changed, 21 insertions(+), 3 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/10683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9ff3b00cbcab09dec11b3607d7d9a9c2c0025e1a
Gerrit-Change-Number: 10683
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width

2018-06-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. (http://gerrit.cloudera.org:8080/10683)

Change subject: IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width
..


Patch Set 1: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ff3b00cbcab09dec11b3607d7d9a9c2c0025e1a
Gerrit-Change-Number: 10683
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 11 Jun 2018 23:19:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width

2018-06-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. (http://gerrit.cloudera.org:8080/10683)

Change subject: IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2632/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ff3b00cbcab09dec11b3607d7d9a9c2c0025e1a
Gerrit-Change-Number: 10683
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 11 Jun 2018 20:09:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width

2018-06-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. (http://gerrit.cloudera.org:8080/10683)

Change subject: IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width
..


Patch Set 1: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ff3b00cbcab09dec11b3607d7d9a9c2c0025e1a
Gerrit-Change-Number: 10683
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 11 Jun 2018 16:12:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width

2018-06-11 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has uploaded this change for review. (http://gerrit.cloudera.org:8080/10683)

Change subject: IMPALA-7417: Remove DCHECKs with unnecessary constraint on dictionary encoding bit width
..

Change-Id: I9ff3b00cbcab09dec11b3607d7d9a9c2c0025e1a
---
M be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.inline.h
M testdata/data/README
A testdata/data/dict_encoding_with_large_bit_width.parquet
M tests/query_test/test_scanners.py
5 files changed, 21 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/10683/1

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9ff3b00cbcab09dec11b3607d7d9a9c2c0025e1a
Gerrit-Change-Number: 10683
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer