Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9570 )

Change subject: IMPALA-6621: Improve set lookup performance for in-predicate 
evaluation
......................................................................

IMPALA-6621: Improve set lookup performance for in-predicate evaluation

Currently, when using the SET_LOOKUP strategy for in-predicates, Impala
uses a std::set object to check membership. This patch takes a hybrid
approach based on benchmarking results: it uses boost::flat_set for the
int, big int, and float data types and boost::unordered_set for the
rest (tiny int, small int, double, string, timestamp, decimal).
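
The hybrid selection described above can be sketched roughly as follows.
This is an illustration only, not the code in in-predicate.h: to stay
standard-library only it substitutes a sorted std::vector with
std::binary_search for boost::flat_set (the same contiguous, sorted
layout idea) and std::unordered_set for boost::unordered_set. The names
SortedVectorSet, HashedSet and InListSet are hypothetical, and timestamp
and decimal are left out because they would need custom hash functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a flat set: values are kept sorted in
// contiguous memory and a lookup is a binary search.
template <typename T>
class SortedVectorSet {
 public:
  explicit SortedVectorSet(std::vector<T> values) : values_(std::move(values)) {
    std::sort(values_.begin(), values_.end());
    // Drop duplicates so the container behaves like a set.
    values_.erase(std::unique(values_.begin(), values_.end()), values_.end());
    assert(std::is_sorted(values_.begin(), values_.end()));
  }
  bool Contains(const T& v) const {
    return std::binary_search(values_.begin(), values_.end(), v);
  }

 private:
  std::vector<T> values_;
};

// Hypothetical stand-in for the hash-based set used for the other types.
template <typename T>
class HashedSet {
 public:
  explicit HashedSet(const std::vector<T>& values)
      : values_(values.begin(), values.end()) {}
  bool Contains(const T& v) const { return values_.count(v) > 0; }

 private:
  std::unordered_set<T> values_;
};

// Compile-time choice mirroring the split in the commit message:
// int, big int and float use the sorted layout, everything else hashes.
template <typename T>
using InListSet = typename std::conditional<
    std::is_same<T, int32_t>::value || std::is_same<T, int64_t>::value ||
        std::is_same<T, float>::value,
    SortedVectorSet<T>, HashedSet<T>>::type;
```

With this sketch, InListSet<int32_t> resolves to the sorted-vector set
and InListSet<std::string> to the hash set, and both expose the same
Contains() call, so the caller does not need to know which one was
picked.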

The intent of this change is to fix a performance regression seen when
upgrading the toolchain to LLVM 5.0.1 (IMPALA-5980).

Performance:
Ran a query for each data type with a large in-predicate containing
500 elements on a single node with mt_dop set to 1.

+-----------+---------------+----------+---------------+----------+
| Data Type | LLVM 3 hybrid |  LLVM 3  | LLVM 5 hybrid |  LLVM 5  |
+-----------+---------------+----------+---------------+----------+
|           Table used: tpch100_parquet.lineitem                  |
+-----------+---------------+----------+---------------+----------+
| big int   | 17s782ms      | 13s941ms | 13s201ms      | 25s604ms |
| string    | 40s750ms      | 64s      | 40s723ms      | 73s      |
| decimal   | 13s929ms      | 22s272ms | 13s710ms      | 34s338ms |
| int       | 19s368ms      | 11s308ms | 9s169ms       | 15s254ms |
+-----------+---------------+----------+---------------+----------+
|           Table used: alltypes with 33638400 rows               |
+-----------+---------------+----------+---------------+----------+
| double    | 5s726ms       | 5s894ms  | 5s595ms       | 6s592ms  |
| small int | 4s776ms       | 5s057ms  | 4s740ms       | 5s358ms  |
| float     | 7s223ms       | 6s397ms  | 6s287ms       | 6s926ms  |
+-----------+---------------+----------+---------------+----------+

Also added a targeted perf query that uses a large in-predicate
over a decimal column.

Testing:
- Ran expr-test and test_exprs successfully.

Change-Id: Ifd1627d779d10a16468cc3c2d0bc26a497e048df
Reviewed-on: http://gerrit.cloudera.org:8080/9570
Reviewed-by: Bikramjeet Vig <bikramjeet....@cloudera.com>
Reviewed-by: Dan Hecht <dhe...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/benchmarks/in-predicate-benchmark.cc
M be/src/exprs/in-predicate.h
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M testdata/workloads/targeted-perf/queries/primitive_filter_in_predicate.test
4 files changed, 700 insertions(+), 192 deletions(-)

Approvals:
  Bikramjeet Vig: Looks good to me, but someone else must approve
  Dan Hecht: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9570
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ifd1627d779d10a16468cc3c2d0bc26a497e048df
Gerrit-Change-Number: 9570
Gerrit-PatchSet: 4
Gerrit-Owner: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
