Zach Amsden has posted comments on this change.

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
......................................................................
Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6726/16/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 420: LIKELY(dictionary_results_.num_bits() > 0)) {
> I think the predicate evaluation on 40,000 values is probably cheap enough

We can certainly try it. I was worried that the pre-computation might be expensive if the predicates involve, say, string manipulation rather than simple, inexpensive comparisons. Still, even with the same number of predicate evaluations, those evaluations go through the unoptimized EvalConjuncts() path rather than the codegen'd path.

As for IS_FILTERED, it is set when the column reader is created, whereas IS_DICT_ENCODED is determined per page. That leaves no way to remove the dictionary_results_.num_bits() check from the per-row path: we can't unset IS_FILTERED, and IS_DICT_ENCODED may be true even if the dictionary encoding did not cover all values.

I'll try precomputing the results for all dictionary values and see what happens.

--
To view, visit http://gerrit.cloudera.org:8080/6726

Gerrit-MessageType: comment
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zach Amsden <[email protected]>
Gerrit-HasComments: Yes
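The trade-off discussed in this comment can be illustrated with a minimal sketch, assuming a single predicate on one string column: the predicate is evaluated once per dictionary entry into a bitmap, and each row is then filtered by a bit lookup on its dictionary index. A per-row check, analogous to `dictionary_results_.num_bits() > 0`, falls back to evaluating the predicate directly when the bitmap cannot be used (for example, on a plain-encoded value). All names here (`Predicate`, `PrecomputeDictionaryResults`, `Row`, `FilterRows`) are hypothetical and do not reflect Impala's actual classes; the direct-evaluation fallback merely stands in for the `EvalConjuncts()` path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch only; not Impala's actual API.
using Predicate = std::function<bool(const std::string&)>;

// Evaluates `pred` once per dictionary entry and stores each result as one
// bit (std::vector<bool> is bit-packed), so rows can later be filtered by
// dictionary index alone.
std::vector<bool> PrecomputeDictionaryResults(
    const std::vector<std::string>& dictionary, const Predicate& pred) {
  std::vector<bool> results;
  results.reserve(dictionary.size());
  for (const std::string& value : dictionary) results.push_back(pred(value));
  return results;
}

// One value as the reader sees it: a dictionary index when the page is
// dictionary-encoded, otherwise the plain value.
struct Row {
  bool dict_encoded;
  std::uint32_t dict_index;
  std::string plain_value;
};

// Returns the positions of rows that pass the predicate.
std::vector<std::size_t> FilterRows(const std::vector<Row>& rows,
                                    const std::vector<std::string>& dictionary,
                                    const std::vector<bool>& dict_results,
                                    const Predicate& pred) {
  std::vector<std::size_t> selected;
  for (std::size_t i = 0; i < rows.size(); ++i) {
    const Row& row = rows[i];
    if (row.dict_encoded) assert(row.dict_index < dictionary.size());
    bool pass;
    // Per-row check: the bitmap is usable only for dictionary-encoded
    // values and only if it was actually computed.
    if (row.dict_encoded && !dict_results.empty()) {
      pass = dict_results[row.dict_index];  // bit lookup, no evaluation
    } else {
      // Fallback: evaluate the predicate on the materialized value.
      pass = pred(row.dict_encoded ? dictionary[row.dict_index]
                                   : row.plain_value);
    }
    if (pass) selected.push_back(i);
  }
  return selected;
}
```

In this layout the up-front cost is one predicate evaluation per dictionary entry, independent of the number of rows, which is the cost being weighed above when predicates are expensive (such as string manipulation) rather than simple comparisons.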
