Grant Henke has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13635 )
Change subject: KUDU-2846: optimize predicate evaluation for primitives
......................................................................
KUDU-2846: optimize predicate evaluation for primitives
This switches predicate evaluation for primitive columns to an
optimized implementation that is unrolled by 8.
Performance is improved by up to 7.2x depending on the particular
predicate, type, and nullability (average around 4.8x). Branches are
reduced by about 6.5x and branch-misses by about 22x.
Hand-coded SIMD might improve on this slightly, but it is likely not
worth the effort.
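As a rough illustration of what "unrolled-by-8" evaluation can look like, here is a minimal sketch. It is not the code from this patch: the function name ApplyPredicateUnrolled, the callable predicate, and the one-bit-per-row selection bitmap layout are all assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Kudu's actual API). Clears the selection bit of
// every row whose value fails `pred`. Assumed layout: bit (i % 8) of
// sel[i / 8] corresponds to row i.
template <typename T, typename Pred>
void ApplyPredicateUnrolled(const T* data, size_t n_rows, uint8_t* sel,
                            Pred pred) {
  const size_t n_chunks = n_rows / 8;
  for (size_t chunk = 0; chunk < n_chunks; ++chunk) {
    const T* d = data + chunk * 8;
    // Eight independent comparisons packed into one byte. There is no
    // data-dependent branch in this loop body.
    const unsigned mask =
        (static_cast<unsigned>(pred(d[0])) << 0) |
        (static_cast<unsigned>(pred(d[1])) << 1) |
        (static_cast<unsigned>(pred(d[2])) << 2) |
        (static_cast<unsigned>(pred(d[3])) << 3) |
        (static_cast<unsigned>(pred(d[4])) << 4) |
        (static_cast<unsigned>(pred(d[5])) << 5) |
        (static_cast<unsigned>(pred(d[6])) << 6) |
        (static_cast<unsigned>(pred(d[7])) << 7);
    // One byte-wide AND updates the selection state of all eight rows.
    sel[chunk] &= static_cast<uint8_t>(mask);
  }
  // Tail: rows left over when n_rows is not a multiple of 8.
  for (size_t i = n_chunks * 8; i < n_rows; ++i) {
    if (!pred(data[i])) {
      sel[i / 8] &= static_cast<uint8_t>(~(1u << (i % 8)));
    }
  }
}
```

In this sketch the main loop evaluates eight rows per iteration without branching on the data and writes the result with a single byte-wide AND. That style of loop is the kind of change that could account for the lower branch and branch-miss counts reported here.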
perf-stat before:
 Performance counter stats for 'build/latest/bin/column_predicate-test --gtest_filter=*Bench*':
       73905.379627  task-clock (msec)  # 0.997 CPUs utilized
    272,810,081,028  cycles             # 3.691 GHz
    938,488,388,743  instructions       # 3.44 insn per cycle
    148,052,698,322  branches           # 2003.274 M/sec
        882,311,138  branch-misses      # 0.60% of all branches
perf-stat after:
 Performance counter stats for 'build/latest/bin/column_predicate-test --gtest_filter=*Bench*':
       15354.077654  task-clock (msec)  # 0.992 CPUs utilized
     56,850,629,856  cycles             # 3.703 GHz
    181,599,095,960  instructions       # 3.19 insn per cycle
     22,496,453,160  branches           # 1465.178 M/sec
         38,662,626  branch-misses      # 0.17% of all branches
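The summary ratios quoted in the description can be cross-checked against the two perf-stat runs. The sketch below copies the counter values from the output above; the struct and function names are made up for this example.

```cpp
#include <cstdio>

// Counter values copied from the perf-stat output in this message.
struct PerfCounters {
  double task_clock_msec;
  double cycles;
  double instructions;
  double branches;
  double branch_misses;
};

constexpr PerfCounters kBefore = {73905.379627, 272810081028.0,
                                  938488388743.0, 148052698322.0,
                                  882311138.0};
constexpr PerfCounters kAfter = {15354.077654, 56850629856.0,
                                 181599095960.0, 22496453160.0, 38662626.0};

// Reduction factor: how many times smaller the counter is after the change.
constexpr double Ratio(double before, double after) { return before / after; }

// Prints one reduction factor per counter.
void PrintRatios() {
  std::printf("task-clock:    %.2fx\n",
              Ratio(kBefore.task_clock_msec, kAfter.task_clock_msec));
  std::printf("cycles:        %.2fx\n", Ratio(kBefore.cycles, kAfter.cycles));
  std::printf("instructions:  %.2fx\n",
              Ratio(kBefore.instructions, kAfter.instructions));
  std::printf("branches:      %.2fx\n",
              Ratio(kBefore.branches, kAfter.branches));
  std::printf("branch-misses: %.2fx\n",
              Ratio(kBefore.branch_misses, kAfter.branch_misses));
}
```

The task-clock ratio comes out at roughly 4.8x, the branch ratio at roughly 6.6x, and the branch-miss ratio at roughly 22.8x. These agree with the approximate figures given in the description for the benchmark run as a whole.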
Detailed results before:
int8 NOT NULL (c = 0) 632.1M evals/sec 4.44 cycles/eval
int8 NULL (c = 0) 515.6M evals/sec 5.48 cycles/eval
int8 NOT NULL (c >= 0) 630.8M evals/sec 4.45 cycles/eval
int8 NULL (c >= 0) 426.8M evals/sec 6.64 cycles/eval
int8 NOT NULL (c >= 0 AND c < 2) 632.6M evals/sec 4.44 cycles/eval
int8 NULL (c >= 0 AND c < 2) 384.7M evals/sec 7.38 cycles/eval
int16 NOT NULL (c = 0) 644.4M evals/sec 4.34 cycles/eval
int16 NULL (c = 0) 524.6M evals/sec 5.37 cycles/eval
int16 NOT NULL (c >= 0) 638.4M evals/sec 4.37 cycles/eval
int16 NULL (c >= 0) 458.8M evals/sec 6.17 cycles/eval
int16 NOT NULL (c >= 0 AND c < 2) 635.3M evals/sec 4.40 cycles/eval
int16 NULL (c >= 0 AND c < 2) 335.1M evals/sec 8.50 cycles/eval
int32 NOT NULL (c = 0) 645.2M evals/sec 4.34 cycles/eval
int32 NULL (c = 0) 492.6M evals/sec 5.77 cycles/eval
int32 NOT NULL (c >= 0) 608.6M evals/sec 4.64 cycles/eval
int32 NULL (c >= 0) 440.7M evals/sec 6.48 cycles/eval
int32 NOT NULL (c >= 0 AND c < 2) 637.8M evals/sec 4.43 cycles/eval
int32 NULL (c >= 0 AND c < 2) 348.0M evals/sec 8.22 cycles/eval
int64 NOT NULL (c = 0) 642.7M evals/sec 4.36 cycles/eval
int64 NULL (c = 0) 505.3M evals/sec 5.60 cycles/eval
int64 NOT NULL (c >= 0) 643.5M evals/sec 4.34 cycles/eval
int64 NULL (c >= 0) 472.8M evals/sec 6.00 cycles/eval
int64 NOT NULL (c >= 0 AND c < 2) 634.2M evals/sec 4.43 cycles/eval
int64 NULL (c >= 0 AND c < 2) 396.7M evals/sec 7.21 cycles/eval
float NOT NULL (c = 0) 604.6M evals/sec 4.63 cycles/eval
float NULL (c = 0) 406.7M evals/sec 7.05 cycles/eval
float NOT NULL (c >= 0) 545.3M evals/sec 5.20 cycles/eval
float NULL (c >= 0) 384.4M evals/sec 7.39 cycles/eval
float NOT NULL (c >= 0 AND c < 2) 583.2M evals/sec 4.80 cycles/eval
float NULL (c >= 0 AND c < 2) 312.2M evals/sec 9.12 cycles/eval
double NOT NULL (c = 0) 614.0M evals/sec 4.56 cycles/eval
double NULL (c = 0) 471.5M evals/sec 5.99 cycles/eval
double NOT NULL (c >= 0) 623.0M evals/sec 4.48 cycles/eval
double NULL (c >= 0) 379.9M evals/sec 7.47 cycles/eval
double NOT NULL (c >= 0 AND c < 2) 599.5M evals/sec 4.67 cycles/eval
double NULL (c >= 0 AND c < 2) 415.2M evals/sec 6.82 cycles/eval
Detailed results after:
int8 NOT NULL (c = 0) 3660.3M evals/sec 0.76 cycles/eval
int8 NULL (c = 0) 3657.1M evals/sec 0.76 cycles/eval
int8 NOT NULL (c >= 0) 3712.0M evals/sec 0.75 cycles/eval
int8 NULL (c >= 0) 3618.9M evals/sec 0.78 cycles/eval
int8 NOT NULL (c >= 0 AND c < 2) 1661.9M evals/sec 1.73 cycles/eval
int8 NULL (c >= 0 AND c < 2) 1663.4M evals/sec 1.77 cycles/eval
int16 NOT NULL (c = 0) 3781.4M evals/sec 0.73 cycles/eval
int16 NULL (c = 0) 3738.3M evals/sec 0.74 cycles/eval
int16 NOT NULL (c >= 0) 3672.9M evals/sec 0.76 cycles/eval
int16 NULL (c >= 0) 3767.4M evals/sec 0.75 cycles/eval
int16 NOT NULL (c >= 0 AND c < 2) 1654.3M evals/sec 1.77 cycles/eval
int16 NULL (c >= 0 AND c < 2) 1651.6M evals/sec 1.72 cycles/eval
int32 NOT NULL (c = 0) 2925.1M evals/sec 0.97 cycles/eval
int32 NULL (c = 0) 2844.4M evals/sec 0.97 cycles/eval
int32 NOT NULL (c >= 0) 2942.7M evals/sec 0.95 cycles/eval
int32 NULL (c >= 0) 2900.8M evals/sec 0.98 cycles/eval
int32 NOT NULL (c >= 0 AND c < 2) 1641.1M evals/sec 1.73 cycles/eval
int32 NULL (c >= 0 AND c < 2) 1638.8M evals/sec 1.75 cycles/eval
int64 NOT NULL (c = 0) 3878.6M evals/sec 0.71 cycles/eval
int64 NULL (c = 0) 3763.9M evals/sec 0.76 cycles/eval
int64 NOT NULL (c >= 0) 2784.4M evals/sec 1.01 cycles/eval
int64 NULL (c >= 0) 2782.6M evals/sec 1.01 cycles/eval
int64 NOT NULL (c >= 0 AND c < 2) 1671.4M evals/sec 1.71 cycles/eval
int64 NULL (c >= 0 AND c < 2) 1741.5M evals/sec 1.64 cycles/eval
float NOT NULL (c = 0) 3940.8M evals/sec 0.72 cycles/eval
float NULL (c = 0) 3820.9M evals/sec 0.72 cycles/eval
float NOT NULL (c >= 0) 4571.4M evals/sec 0.60 cycles/eval
float NULL (c >= 0) 4741.3M evals/sec 0.58 cycles/eval
float NOT NULL (c >= 0 AND c < 2) 1318.0M evals/sec 2.18 cycles/eval
float NULL (c >= 0 AND c < 2) 1262.3M evals/sec 2.28 cycles/eval
double NOT NULL (c = 0) 2813.4M evals/sec 1.01 cycles/eval
double NULL (c = 0) 2664.6M evals/sec 1.06 cycles/eval
double NOT NULL (c >= 0) 3620.8M evals/sec 0.77 cycles/eval
double NULL (c >= 0) 3657.2M evals/sec 0.76 cycles/eval
double NOT NULL (c >= 0 AND c < 2) 1248.8M evals/sec 2.30 cycles/eval
double NULL (c >= 0 AND c < 2) 1253.7M evals/sec 2.28 cycles/eval
Change-Id: I9dd062961a3cd2c892997d6aba12684e603628a1
Reviewed-on: http://gerrit.cloudera.org:8080/13591
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <[email protected]>
(cherry picked from commit 349aeaab33d33ba1ed323a6a4ff1bd6eee971d85)
---
M src/kudu/common/CMakeLists.txt
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/column_predicate.cc
3 files changed, 147 insertions(+), 13 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/35/13635/1
--
To view, visit http://gerrit.cloudera.org:8080/13635
Gerrit-Project: kudu
Gerrit-Branch: branch-1.10.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9dd062961a3cd2c892997d6aba12684e603628a1
Gerrit-Change-Number: 13635
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>