Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11582 )

Change subject: IMPALA-6658: improve Parquet RLE for low bit widths
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/11582/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11582/2//COMMIT_MSG@20
PS2, Line 20: By default RleEncoder will now use run length encoding for runs of
            : length 24 for single bit values, and of length 16 for 2 bit wide 
values.
            : All other bit widths will use the existing length 8 runs.
About the solution in general:
If I do not misunderstand something then it will lead to worse encoding if 
<24/16 repeated runs alternate with larger repeated runs.

bit width 1:
repeated lengths:               24 16 24 16 24
bytes needed with old solution: 2  2  2  2  2
bytes needed with new solution: 2  3  2  3  2

bit width 2:
repeated lengths:               16 8  16 8  16
bytes needed with old solution: 2  2  2  2  2
bytes needed with new solution: 2  3  2  3  2

This could be avoided by using smaller min_repeated_run_length if the last run 
was a repeated run: 16 for 1 bit width, 8 for 2 bit width (and all other bit 
widths).


http://gerrit.cloudera.org:8080/#/c/11582/2/be/src/util/rle-encoding.h
File be/src/util/rle-encoding.h:

http://gerrit.cloudera.org:8080/#/c/11582/2/be/src/util/rle-encoding.h@211
PS2, Line 211: mu;tiple
typo: multiple


http://gerrit.cloudera.org:8080/#/c/11582/2/be/src/util/rle-encoding.h@475
PS2, Line 475: // updates num_buffered_values
This comment does not seam true here.


http://gerrit.cloudera.org:8080/#/c/11582/2/be/src/util/rle-encoding.h@549
PS2, Line 549: we
typo



--
To view, visit http://gerrit.cloudera.org:8080/11582
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I191a581d3f699b6669e48ac9dc39c76ed77c4a76
Gerrit-Change-Number: 11582
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <thomasmarsh...@cmu.edu>
Gerrit-Comment-Date: Wed, 17 Oct 2018 22:24:17 +0000
Gerrit-HasComments: Yes

Reply via email to