Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16235 )

Change subject: IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions
......................................................................

IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions

ds_kll_sketch() is an aggregate function that receives a float
parameter (e.g. a float column of a table) and returns a serialized
Apache DataSketches KLL sketch of the input data set wrapped into
STRING type. This sketch can be saved into a table or view and later
used for quantile approximations.

ds_kll_quantile() receives two parameters: a STRING parameter that
contains a serialized KLL sketch and a DOUBLE that represents the rank
of the quantile in the range of [0,1]. E.g. rank=0.1 means the
approximate value in the sketch where 10% of the sketched items are
less than or equal to this value.

Testing:
  - Added automated tests on small data sets to check the basic
    functionality of sketching and getting a quantile approximation.
  - Tested on TPCH25_parquet.lineitem to check that sketching and
    approximating also work at a larger scale, where the
    serialize/merge phases are required as well. On this scale the
    error of the quantile approximation is within 1-1.5%.

Change-Id: I11de5fe10bb5d0dd42fb4ee45c4f21cb31963e52
Reviewed-on: http://gerrit.cloudera.org:8080/16235
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/kll_sketches_from_hive.parquet
A testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test
M tests/query_test/test_datasketches.py
12 files changed, 333 insertions(+), 22 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I11de5fe10bb5d0dd42fb4ee45c4f21cb31963e52
Gerrit-Change-Number: 16235
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
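
Illustration of the rank semantics described in the commit message for ds_kll_quantile(): a minimal Python sketch that computes the exact quantile of a small list, so the meaning of rank=0.1 can be checked by hand. It is not the KLL algorithm and not Impala's implementation. The function name exact_quantile and the index rule used are assumptions made for this example only. A real KLL sketch returns an approximation of this value from a compact summary.

```python
import math


def exact_quantile(values, rank):
    """Return the smallest item v such that at least `rank` of the
    items are less than or equal to v (exact, not sketched).

    Hypothetical helper for illustration; not part of Impala or
    Apache DataSketches.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= rank <= 1.0:
        raise ValueError("rank must be in the range [0, 1]")

    ordered = sorted(values)
    n = len(ordered)
    # ceil(rank * n) items must be <= the result; convert that count
    # to a zero-based index and clamp it so rank=0 maps to the minimum.
    index = max(0, min(n - 1, math.ceil(rank * n) - 1))
    return ordered[index]


data = list(range(1, 11))          # 1, 2, ..., 10
print(exact_quantile(data, 0.1))   # 10% of the items are <= this value
print(exact_quantile(data, 0.5))   # median under this index rule
```

With ten items, rank=0.1 selects the value that exactly one item (10% of the input) is less than or equal to, which matches the wording of the commit message.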