Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9403 )

Change subject: IMPALA-6324: Support reading RLE-encoded boolean values in Parquet scanner
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9403/4/be/src/benchmarks/rle-benchmark.cc
File be/src/benchmarks/rle-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/9403/4/be/src/benchmarks/rle-benchmark.cc@39
PS4, Line 39: //           for loop / run length: 10                0.4      0.4      0.4     0.487X     0.487X     0.486X
            : //             memset / run length: 10              0.396      0.4      0.4     0.482X     0.487X     0.486X
Some thoughts about this performance degradation compared to the run_length=1 
case:

If bit_width=1, then 8 values encoded as a repeated run can use more space than 
if they were encoded as a literal run. RleEncoder currently always uses 
repeated runs if it finds 8 repeated values - it may be good to change this to 
a higher number (16?) if bit_width=1.
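
To make the space argument concrete, here is a rough sketch of the byte counts. It assumes the layout described in the Parquet RLE/bit-packing hybrid format (a varint header per run, a repeated value padded to whole bytes, literal values packed in groups of 8). The function names are hypothetical and are not taken from RleEncoder.

```cpp
#include <cstdint>

// Bytes needed to store v as an unsigned LEB128 varint (assumed header encoding).
constexpr int VarintBytes(uint64_t v) {
  return v < 0x80 ? 1 : 1 + VarintBytes(v >> 7);
}

// Repeated run: header varint(run_length << 1), then one value padded to whole bytes.
constexpr int RepeatedRunBytes(uint64_t run_length, int bit_width) {
  return VarintBytes(run_length << 1) + (bit_width + 7) / 8;
}

// Standalone literal run of num_groups groups of 8 values:
// header varint((num_groups << 1) | 1), then bit_width bytes per group.
constexpr int LiteralRunBytes(uint64_t num_groups, int bit_width) {
  return VarintBytes((num_groups << 1) | 1) + static_cast<int>(num_groups) * bit_width;
}

// Cost of appending one more group of 8 values to an already open literal run,
// ignoring any growth of the header.
constexpr int LiteralGroupBytes(int bit_width) { return bit_width; }

static_assert(RepeatedRunBytes(8, 1) == 2, "8 repeated 1-bit values: header + value byte");
static_assert(LiteralRunBytes(1, 1) == 2, "8 literal 1-bit values as a standalone run");
static_assert(LiteralGroupBytes(1) == 1, "8 literal 1-bit values inside an open literal run");
static_assert(RepeatedRunBytes(16, 1) == 2 * LiteralGroupBytes(1), "break-even at 16 values");
```

Under these assumptions a standalone run of 8 values costs 2 bytes either way, but 8 values appended to an open literal run add only 1 byte, while switching to a repeated run costs 2. At 16 values the two encodings break even.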

The slowdown seems to be similar when bit_width>1, so the space inefficiency 
above is probably not the real cause. I suspect that the problem with short 
repeated runs is that BatchedBitReader is optimized for 32*N batches, and 
shorter literal runs are buffered by RleBatchDecoder. It would be possible to 
avoid buffering for 8*N batches too, which could improve the performance 
of shorter runs.
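
As a toy model of the batching idea (not Impala's actual code), the split below shows how many values of a literal run could be unpacked directly and how many would go through a buffer, for a given batch granularity. The function names and the granularity parameter are assumptions made for illustration only.

```cpp
#include <cstdint>

// Number of values that can be unpacked directly when the fast path
// only handles multiples of `granularity` values (hypothetical model).
constexpr int64_t DirectlyDecodable(int64_t num_values, int64_t granularity) {
  return num_values - num_values % granularity;
}

// Remaining values that would have to be buffered in this model.
constexpr int64_t Buffered(int64_t num_values, int64_t granularity) {
  return num_values % granularity;
}

// A literal run of 40 values: 32 direct + 8 buffered with 32*N batches,
// but all 40 direct with 8*N batches.
static_assert(DirectlyDecodable(40, 32) == 32 && Buffered(40, 32) == 8, "32*N fast path");
static_assert(DirectlyDecodable(40, 8) == 40 && Buffered(40, 8) == 0, "8*N fast path");
// A short literal run of 8 values is buffered entirely with 32*N batches.
static_assert(DirectlyDecodable(8, 32) == 0 && Buffered(8, 32) == 8, "short run");
```

In this model, every literal run shorter than 32 values takes the buffered path, which is the case that short alternating runs would hit most often.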



--
To view, visit http://gerrit.cloudera.org:8080/9403
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4644bf8cf5d2b7238b05076407fbf78ab5d2c14f
Gerrit-Change-Number: 9403
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Tue, 13 Mar 2018 15:14:53 +0000
Gerrit-HasComments: Yes
