Yuqi Du has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/19794


Change subject: [client] Reduce space complexity and speed up hash partition 
pruning for in-list predicate
......................................................................

[client] Reduce space complexity and speed up hash partition pruning for 
in-list predicate

This patch's background is in https://gerrit.cloudera.org/c/19568/. As
that patch said, logic of pruning hash partitions for in-list predicate
in Kudu cpp client has also a high space complexity and slow. Old
algorithm must keep all intermedium objects because they are incomplete
until they are completed and can be compute hash.

This patch fixes the problems and provides a recursive algorithm the
same as java client in patch https://gerrit.cloudera.org/c/19568/.

Unlike java client, this problem in cpp client is not critical.
Old algorithm in cpp client does rare go out of memory because it
swap the intermedium objects. This optimization has good benifit too.
PartitionPrunerTest::TestMultiColumnInListHashPruningManyValues can show
some benifits. The benifits depend on the in-list length, for example,
in the test TestMultiColumnInListHashPruningManyValues, set kMaxInListLength
50, old algorithm memory cost can reach 300MB, while new algorithm's memory
cost can be ignored(it only need one object and a few stacks for context).
At the same time, new algorithm has a good speedup, below shows some effect:

combination_count: 5554006920000, old algorithm cost: 428238us, new algorithm 
cost: 713us, speedup: 600.61430575035058x
combination_count: 89083783664568, old algorithm cost: 2764924us, new algorithm 
cost: 1145us, speedup: 2414.780786026201x
combination_count: 27194091724800, old algorithm cost: 1610475us, new algorithm 
cost: 1151us, speedup: 1399.1963509991313x
combination_count: 7116622216704, old algorithm cost: 34544289us, new algorithm 
cost: 375us, speedup: 92118.104x
combination_count: 37570734489600, old algorithm cost: 1733205us, new algorithm 
cost: 901us, speedup: 1923.6459489456161x

Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 228 insertions(+), 18 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/94/19794/1
--
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 1
Gerrit-Owner: Yuqi Du <[email protected]>

Reply via email to