Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16095

to look at the new patch set (#6).

Change subject: IMPALA-9633: Implement ds_hll_union()
......................................................................

IMPALA-9633: Implement ds_hll_union()

This function receives a set of sketches produced by ds_hll_sketch()
and merges them into a single sketch.

An example usage is to create a sketch for each partition of a table,
write these sketches to a separate table and based on which partition
the user is interested of the relevant sketches can be union-ed
together to get an estimate. E.g.:
  SELECT
      ds_hll_estimate(ds_hll_union(sketch_col))
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;

Note, currently there is a known limitation of unioning string types
where some input sketches come from Impala and some from Hive. In
this case if there is an overlap in the input data used by Impala and
by Hive this overlapping data is still counted twice due to some
string representation difference between Impala and Hive.
For more details see:
https://issues.apache.org/jira/browse/IMPALA-9939

Testing:
  - Apart from the automated tests I added to this patch I also
    tested ds_hll_union() on a bigger dataset to check that
    serialization, deserialization and merging steps work well. I
    took TPCH25.linelitem, created a number of sketches with grouping
    by l_shipdate and called ds_hll_union() on those sketches.

Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
A be/src/exprs/datasketches-common.cc
A be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
M tests/query_test/test_datasketches.py
10 files changed, 271 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/16095/6
--
To view, visit http://gerrit.cloudera.org:8080/16095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09
Gerrit-Change-Number: 16095
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>

Reply via email to