Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16095 )
Change subject: IMPALA-9633: Implement ds_hll_union()
......................................................................
IMPALA-9633: Implement ds_hll_union()
This function receives a set of sketches produced by ds_hll_sketch()
and merges them into a single sketch.
An example usage is to create a sketch for each partition of a table
and write these sketches to a separate table. Depending on which
partitions the user is interested in, the relevant sketches can then
be union-ed together to get an estimate. E.g.:
SELECT
ds_hll_estimate(ds_hll_union(sketch_col))
FROM sketch_tbl
WHERE partition_col=1 OR partition_col=5;
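To illustrate why merging per-partition sketches gives an estimate for the combined data, here is a minimal toy HyperLogLog in Python. It is only a sketch under stated assumptions: it is not the Apache DataSketches code that ds_hll_union() relies on, and the names (new_sketch, update, union, estimate), the precision P and the SHA-1 hash are all hypothetical choices made for this illustration.

```python
import hashlib
import math

P = 12          # assumed number of index bits for this toy sketch
M = 1 << P      # number of registers


def _hash64(value):
    # Hypothetical hash choice: first 8 bytes of SHA-1 as a 64-bit integer.
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def new_sketch():
    return [0] * M


def update(sketch, value):
    h = _hash64(value)
    idx = h >> (64 - P)                       # top P bits pick a register
    rest = h & ((1 << (64 - P)) - 1)          # remaining 64 - P bits
    rank = (64 - P) - rest.bit_length() + 1   # leading zeros in rest, plus 1
    sketch[idx] = max(sketch[idx], rank)


def union(sketches):
    # Merging is a register-wise maximum, so the result equals the sketch
    # that would have been built from all input values at once.
    merged = new_sketch()
    for s in sketches:
        merged = [max(a, b) for a, b in zip(merged, s)]
    return merged


def estimate(sketch):
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in sketch)
    zeros = sketch.count(0)
    if raw <= 2.5 * M and zeros:
        # Small-range correction (linear counting).
        return M * math.log(M / zeros)
    return raw


# One sketch per "partition", then a union of the two partitions of interest.
partition_1, partition_5 = new_sketch(), new_sketch()
for v in range(0, 3000):
    update(partition_1, v)
for v in range(2000, 6000):
    update(partition_5, v)
approx_distinct = estimate(union([partition_1, partition_5]))
```

Because the merge is a register-wise maximum, values that appear in both partitions are not double counted, which is the property the query above depends on.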
Testing:
- Apart from the automated tests added in this patch, I also
tested ds_hll_union() on a bigger dataset to check that the
serialization, deserialization and merging steps work well. I
took TPCH25.lineitem, created a number of sketches grouped
by l_shipdate, and called ds_hll_union() on those sketches.
Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
A be/src/exprs/datasketches-common.cc
A be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
8 files changed, 243 insertions(+), 34 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/16095/1
--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab <[email protected]>