Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17088 )
Change subject: IMPALA-10520: Implement ds_theta_intersect() function
......................................................................
IMPALA-10520: Implement ds_theta_intersect() function
This function receives a set of serialized Apache DataSketches Theta
sketches produced by ds_theta_sketch() and intersects them into a
single sketch.
An example usage is to create a sketch for each partition of a table,
write these sketches to a separate table and intersect the sketches of
the partitions the user is interested in to get an estimate of the
number of distinct values common to those partitions. E.g.:
SELECT
ds_theta_estimate(ds_theta_intersect(sketch_col))
FROM sketch_tbl
WHERE partition_col=1 OR partition_col=5;
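For readers unfamiliar with Theta sketches, the following is a minimal
toy sketch of what intersecting them means. It is not the code in this
patch and not the Apache DataSketches implementation. The names
ToyThetaSketch, hash64 and intersect are hypothetical, and the hashing
and retention rules are simplified assumptions chosen for illustration.

```python
import hashlib

MAX_HASH = 2 ** 64  # hashes are treated as integers in [0, 2^64)


def hash64(value):
    """Hypothetical helper: map a value to a deterministic 64-bit integer."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ToyThetaSketch:
    """Toy Theta-style sketch: keeps at most k hashes, all below theta."""

    def __init__(self, k=1024, theta=MAX_HASH, hashes=None):
        self.k = k
        self.theta = theta
        self.hashes = set(hashes or ())

    def update(self, value):
        h = hash64(value)
        if h >= self.theta:
            return  # outside the sampled range, ignore
        self.hashes.add(h)
        if len(self.hashes) > self.k:
            # Too many entries: evict the largest hash and lower theta to it,
            # so every retained hash stays strictly below theta.
            largest = max(self.hashes)
            self.hashes.remove(largest)
            self.theta = largest

    def estimate(self):
        # Retained count scaled up by the sampled fraction theta / 2^64.
        return len(self.hashes) * MAX_HASH / self.theta


def intersect(sketches):
    """Intersect one or more toy sketches into a single toy sketch."""
    sketches = list(sketches)
    if not sketches:
        raise ValueError("need at least one sketch to intersect")
    # The result can only be as precise as the most heavily sampled input.
    theta = min(s.theta for s in sketches)
    common = set.intersection(*(s.hashes for s in sketches))
    kept = {h for h in common if h < theta}
    return ToyThetaSketch(k=max(s.k for s in sketches), theta=theta, hashes=kept)


# Two small "partitions" that share the values 50..99.
part_a = ToyThetaSketch()
part_b = ToyThetaSketch()
for v in range(0, 100):
    part_a.update(v)
for v in range(50, 150):
    part_b.update(v)

# Both inputs hold fewer than k entries, so the toy sketch is exact here.
print(intersect([part_a, part_b]).estimate())  # prints 50.0
```

In this toy model the intersection keeps only the hashes present in every
input and uses the smallest theta, which mirrors the general idea behind
Theta sketch set operations under the assumptions stated above.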
Testing:
- Apart from the automated tests I added to this patch I also
tested ds_theta_intersect() on a bigger dataset to check that
serialization, deserialization and merging steps work well. I
took TPCH25.lineitem, created a number of sketches with grouping
by l_shipdate and called ds_theta_intersect() on those sketches.
Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
4 files changed, 161 insertions(+), 0 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/17088/2
--
To view, visit http://gerrit.cloudera.org:8080/17088
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I80e68c2151c4604f0386d3dfb004c82b10293f97
Gerrit-Change-Number: 17088
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>