Zach Amsden has posted comments on this change.

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
......................................................................


Patch Set 16:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6726/16/be/src/exec/hdfs-parquet-scanner-ir.cc
File be/src/exec/hdfs-parquet-scanner-ir.cc:

Line 89:     idx = scratch_batch_.NextValidTuple(idx);
Surprising, despite apparently being costlier, the bitmap scan appears to be 
fast, especially when predicates are selective.


http://gerrit.cloudera.org:8080/#/c/6726/16/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 420:               LIKELY(dictionary_results_.num_bits() > 0)) {
Not liking this branch, but it is unavoidable without pre-computation overhead. 
 When we have a partially dictionary encoding on a filtered column that spills 
to plain encoding, we don't pre-evaluate the dictionary against predicates.  We 
could do so anyway, but it may end up being an unnecessary up-front cost.  
However, the number of dictionary entries before spilling to plain is bounded, 
so the cost should be fixed.

Maybe we can afford to do so if the predicates are "inexpensive", but then we 
are still left with this check for expensive predicate (and have no way to 
determine cost here without another check).


http://gerrit.cloudera.org:8080/#/c/6726/16/be/src/exec/parquet-scratch-tuple-batch.h
File be/src/exec/parquet-scratch-tuple-batch.h:

Line 172:     // pointer.  We don't need to transfer the extra filter byte.
Comment obsolete with latest diff.


http://gerrit.cloudera.org:8080/#/c/6726/16/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

Line 94:   NullIndicatorOffset(int byte_offset = 0, int bit_offset = -1)
This fixes an apparent bug, where byte_offset = -1 (passed in constructor for 
ParquetColumnReader as NullIndicatorOffset(-1, -1) ends up converting into code 
which will execute something like:

tuple_mem[-1] |= 0

Which despite not modifying memory, could be an out of bounds memory access.


http://gerrit.cloudera.org:8080/#/c/6726/16/be/src/util/bitmap-test.cc
File be/src/util/bitmap-test.cc:

Line 81:   for (index = bm.NextSetBit(); index < size; index = 
bm.NextSetBit(index)) {
test needs updating for behavior change: index -> index + 1


-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zach Amsden <[email protected]>
Gerrit-HasComments: Yes

Reply via email to