Hello Tim Armstrong, Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/9570
to look at the new patch set (#3).
Change subject: IMPALA-6621: Improve set lookup performance for in-predicate
evaluation
......................................................................
IMPALA-6621: Improve set lookup performance for in-predicate evaluation
Currently, when using the SET_LOOKUP strategy for in-predicates, Impala
uses a std::set object for checking membership. This patch takes a
hybrid approach based on benchmarking results: it uses
boost::container::flat_set for the int, big int, and float data types
and boost::unordered_set for the rest (tiny int, small int, double,
string, timestamp, decimal).
The intent of this change is to fix a performance regression introduced
by upgrading the toolchain to LLVM 5.0.1 (IMPALA-5980).
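The per-type container choice described above can be sketched as a compile-time selection. This is only an illustration, not the code in in-predicate.h: the names SortedVectorSet, HashSet, UseSortedVector, and InListSet are hypothetical, and a sorted std::vector with binary search stands in for boost::container::flat_set (which is likewise backed by a sorted contiguous array) so that the sketch compiles without Boost. std::unordered_set stands in for boost::unordered_set.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>
#include <unordered_set>
#include <utility>
#include <vector>

// Stand-in for boost::container::flat_set: a sorted, de-duplicated vector
// that answers membership queries with binary search (O(log n), cache friendly).
template <typename T>
class SortedVectorSet {
 public:
  explicit SortedVectorSet(std::vector<T> values) : values_(std::move(values)) {
    std::sort(values_.begin(), values_.end());
    values_.erase(std::unique(values_.begin(), values_.end()), values_.end());
  }
  bool Contains(const T& v) const {
    return std::binary_search(values_.begin(), values_.end(), v);
  }
  std::size_t size() const { return values_.size(); }

 private:
  std::vector<T> values_;
};

// Stand-in for boost::unordered_set: hash-based membership (O(1) on average).
template <typename T>
class HashSet {
 public:
  explicit HashSet(const std::vector<T>& values)
    : values_(values.begin(), values.end()) {}
  bool Contains(const T& v) const { return values_.find(v) != values_.end(); }
  std::size_t size() const { return values_.size(); }

 private:
  std::unordered_set<T> values_;
};

// Hypothetical trait: true for the types the commit message assigns to
// flat_set (int, big int, float); every other type falls back to the hash set.
template <typename T> struct UseSortedVector : std::false_type {};
template <> struct UseSortedVector<int32_t> : std::true_type {};
template <> struct UseSortedVector<int64_t> : std::true_type {};
template <> struct UseSortedVector<float> : std::true_type {};

// Picks the container for a given value type at compile time.
template <typename T>
using InListSet = typename std::conditional<UseSortedVector<T>::value,
    SortedVectorSet<T>, HashSet<T>>::type;

// Small self-check showing how the alias would be used.
inline void InListSetExample() {
  std::vector<int32_t> ints = {5, 1, 3, 3};
  InListSet<int32_t> int_set(ints);  // resolves to SortedVectorSet<int32_t>
  assert(int_set.Contains(3));
  assert(!int_set.Contains(4));
}
```

With this shape, evaluating `x IN (...)` for a row reduces to one `Contains(x)` call on a set built once from the in-list constants. Which container wins for a given type is an empirical question, which is why the choice in this patch is driven by the benchmark results.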
Performance:
Ran a query for each data type with a large in-predicate containing
500 elements on a single node with mt_dop set to 1.
+-----------+---------------+----------+---------------+----------+
| Data Type | LLVM 3 hybrid | LLVM 3   | LLVM 5 hybrid | LLVM 5   |
+-----------+---------------+----------+---------------+----------+
| Table used: tpch100_parquet.lineitem                            |
+-----------+---------------+----------+---------------+----------+
| big int   | 17s782ms      | 13s941ms | 13s201ms      | 25s604ms |
| string    | 40s750ms      | 64s      | 40s723ms      | 73s      |
| decimal   | 13s929ms      | 22s272ms | 13s710ms      | 34s338ms |
| int       | 19s368ms      | 11s308ms | 9s169ms       | 15s254ms |
+-----------+---------------+----------+---------------+----------+
| Table used: alltypes with 33638400 rows                         |
+-----------+---------------+----------+---------------+----------+
| double    | 5s726ms       | 5s894ms  | 5s595ms       | 6s592ms  |
| small int | 4s776ms       | 5s057ms  | 4s740ms       | 5s358ms  |
| float     | 7s223ms       | 6s397ms  | 6s287ms       | 6s926ms  |
+-----------+---------------+----------+---------------+----------+
Also added a targeted perf query that uses a large in-predicate
over a decimal column.
Testing:
- Ran expr-test and test_exprs successfully.
Change-Id: Ifd1627d779d10a16468cc3c2d0bc26a497e048df
---
M be/src/benchmarks/in-predicate-benchmark.cc
M be/src/exprs/in-predicate.h
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M testdata/workloads/targeted-perf/queries/primitive_filter_in_predicate.test
4 files changed, 700 insertions(+), 192 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/9570/3
--
To view, visit http://gerrit.cloudera.org:8080/9570
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifd1627d779d10a16468cc3c2d0bc26a497e048df
Gerrit-Change-Number: 9570
Gerrit-PatchSet: 3
Gerrit-Owner: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>