Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9570 )
Change subject: IMPALA-6621: Improve set lookup performance for in-predicate evaluation
......................................................................

IMPALA-6621: Improve set lookup performance for in-predicate evaluation

Currently, when using the SET_LOOKUP strategy for in-predicates, Impala
uses a std::set object to check membership. This patch takes a hybrid
approach based on benchmarking results: it uses boost::flat_set for the
int, big int, and float data types, and boost::unordered_set for the rest
(tiny int, small int, double, string, timestamp, and decimal).

The intent of this change is to fix a performance regression seen when
upgrading the toolchain to LLVM 5.0.1 (IMPALA-5980).

Performance:
Ran a query for each data type with a large in-predicate containing 500
elements on a single node with mt_dop set to 1.

+-----------+---------------+----------+---------------+----------+
| Data Type | LLVM 3 hybrid | LLVM 3   | LLVM 5 hybrid | LLVM 5   |
+-----------+---------------+----------+---------------+----------+
| Table used: tpch100_parquet.lineitem                            |
+-----------+---------------+----------+---------------+----------+
| big int   | 17s782ms      | 13s941ms | 13s201ms      | 25s604ms |
| string    | 40s750ms      | 64s      | 40s723ms      | 73s      |
| decimal   | 13s929ms      | 22s272ms | 13s710ms      | 34s338ms |
| int       | 19s368ms      | 11s308ms | 9s169ms       | 15s254ms |
+-----------+---------------+----------+---------------+----------+
| Table used: alltypes with 33638400 rows                         |
+-----------+---------------+----------+---------------+----------+
| double    | 5s726ms       | 5s894ms  | 5s595ms       | 6s592ms  |
| small int | 4s776ms       | 5s057ms  | 4s740ms       | 5s358ms  |
| float     | 7s223ms       | 6s397ms  | 6s287ms       | 6s926ms  |
+-----------+---------------+----------+---------------+----------+

Also added a targeted perf query that uses a large in-predicate over a
decimal column.

Testing:
- Ran expr-test and test_exprs successfully.
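A minimal sketch of the hybrid idea, written in standard C++ only. It makes two assumptions. First, a sorted, de-duplicated std::vector probed with std::binary_search stands in for boost::flat_set, which likewise keeps its elements in a sorted contiguous array. Second, std::unordered_set stands in for boost::unordered_set. The names SortedVectorSet, HashSet and InSet are hypothetical and are not taken from be/src/exprs/in-predicate.h. Mapping Impala INT/BIGINT to int32_t/int64_t is likewise an illustrative assumption.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <type_traits>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for boost::flat_set: values kept in a sorted,
// de-duplicated std::vector and probed with binary search.
template <typename T>
class SortedVectorSet {
 public:
  explicit SortedVectorSet(std::vector<T> values) : values_(std::move(values)) {
    std::sort(values_.begin(), values_.end());
    values_.erase(std::unique(values_.begin(), values_.end()), values_.end());
  }

  // O(log n) lookup over contiguous memory.
  bool contains(const T& v) const {
    return std::binary_search(values_.begin(), values_.end(), v);
  }

 private:
  std::vector<T> values_;
};

// Hypothetical stand-in for boost::unordered_set: average O(1) hashed lookup.
template <typename T>
class HashSet {
 public:
  explicit HashSet(const std::vector<T>& values)
      : set_(values.begin(), values.end()) {}

  bool contains(const T& v) const { return set_.count(v) != 0; }

 private:
  std::unordered_set<T> set_;
};

// Compile-time choice mirroring the split described in the commit message:
// int (int32_t), big int (int64_t) and float use the sorted vector; every
// other type falls back to the hash set.
template <typename T>
using InSet = typename std::conditional<
    std::is_same<T, int32_t>::value || std::is_same<T, int64_t>::value ||
        std::is_same<T, float>::value,
    SortedVectorSet<T>, HashSet<T>>::type;
```

The sorted-vector variant trades average O(1) hashing for an O(log n) binary search over contiguous memory. For small fixed-width keys, that cache-friendly layout is one plausible reason a benchmark could favor it. The actual crossover depends on the type and on the size of the IN list.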
Change-Id: Ifd1627d779d10a16468cc3c2d0bc26a497e048df
Reviewed-on: http://gerrit.cloudera.org:8080/9570
Reviewed-by: Bikramjeet Vig <bikramjeet....@cloudera.com>
Reviewed-by: Dan Hecht <dhe...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/benchmarks/in-predicate-benchmark.cc
M be/src/exprs/in-predicate.h
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M testdata/workloads/targeted-perf/queries/primitive_filter_in_predicate.test
4 files changed, 700 insertions(+), 192 deletions(-)

Approvals:
  Bikramjeet Vig: Looks good to me, but someone else must approve
  Dan Hecht: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9570

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ifd1627d779d10a16468cc3c2d0bc26a497e048df
Gerrit-Change-Number: 9570
Gerrit-PatchSet: 4
Gerrit-Owner: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>