Qifan Chen has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/15997 )

Change subject: [WIP] IMPALA-2658: Extend the NDV function to accept a precision
......................................................................

[WIP] IMPALA-2658: Extend the NDV function to accept a precision

This work addresses a current limitation of the NDV function by
extending it to accept a second, integer-typed argument, which must
be an abstract value in the range 1 to 10. This abstract value maps
to the actual precision used by the HLL algorithm in the function.

Front end work:
1. Add a new template ndv function to the builtin db that takes two
   arguments;
2. Verify that the 2nd argument of an NDV() call is an integer literal
   in [1,10];
3. Add a new method that maps the abstract value to the HLL precision
   (the real work is TBD);
4. The length of the intermediate data type is computed based on the
   actual hll precision. When the 2nd argument is missing, the length
   is 1024 as in the current implementation;
5. The 2nd argument, if present, will be carried over all the way to
   the BE.
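
A minimal sketch of front end steps 2 to 4, written in Python for
brevity rather than in the Java used by the FE. It assumes the
intermediate length is 2^precision, which is consistent with the
default length of 1024 for the previously hardcoded precision of 10.
The names intermediate_length and to_precision are hypothetical, and
the mapping from the abstract value to the HLL precision is passed in
by the caller because the real mapping is still TBD.

```python
from typing import Callable, Optional

# Length used when the 2nd argument is missing (stated in this change).
DEFAULT_HLL_LEN = 1024


def intermediate_length(scale: Optional[int],
                        to_precision: Callable[[int], int]) -> int:
    """Return the intermediate data length for ndv(expr[, scale]).

    `to_precision` is a hypothetical stand-in for the abstract-value to
    HLL-precision mapping, which this change leaves as TBD.
    """
    if scale is None:
        # One-argument form: keep the current behavior.
        return DEFAULT_HLL_LEN
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(scale, bool) or not isinstance(scale, int):
        raise TypeError("2nd argument of NDV() must be an integer literal")
    if not 1 <= scale <= 10:
        raise ValueError("2nd argument of NDV() must be in [1,10]")
    precision = to_precision(scale)
    # Assumption: the length is 2^precision, so log2(length) recovers it.
    return 1 << precision
```

For example, intermediate_length(None, f) returns 1024 for any f, and
intermediate_length(11, f) raises ValueError before f is consulted.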

Back end work:
1. Remove the hardcoded precision (10) from these functions:
     AggregateFunctions::HllInit(),
     AggregateFunctions::HllUpdate(),
     AggregateFunctions::HllMerge(),
     AggregateFunctions::HllFinalEstimate(),
     AggregateFunctions::HllFinalize(),
     HllEstimateBias();
2. Instead, the actual precision is computed from the
   length of the intermediate data type as log2(hll_len);
3. Verify that the length of the intermediate data type is
   correct according to the 2nd argument (if present).
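
A minimal sketch of back end steps 2 and 3, written in Python for
brevity rather than in the C++ used by the BE. The function names are
hypothetical, and the power-of-two check is an assumption that follows
from computing the precision as log2(hll_len).

```python
def hll_precision_from_length(hll_len: int) -> int:
    """Return log2(hll_len) for a power-of-two intermediate length."""
    # A positive power of two has exactly one bit set.
    if hll_len <= 0 or (hll_len & (hll_len - 1)) != 0:
        raise ValueError(f"intermediate length {hll_len} is not a power of two")
    return hll_len.bit_length() - 1


def check_length_matches(hll_len: int, expected_precision: int) -> None:
    """Raise if the intermediate length disagrees with the expected precision."""
    actual = hll_precision_from_length(hll_len)
    if actual != expected_precision:
        raise ValueError(
            f"length {hll_len} implies precision {actual}, "
            f"expected {expected_precision}")
```

With the current default length of 1024, the computed precision is 10,
which matches the value that was previously hardcoded.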

Testing:
1. Add a regression test (test_ndv) in the TestAggregationQueries
   section that computes ndv() for every supported Impala data type.
2. Run unit tests against other tables such as tpcds.store_sales
   and tpch.customer in both serial and parallel plan settings.

select ndv(c_name, 1) as "one", ndv(c_name, 2) as two, ndv(c_name, 3) as three,
ndv(c_name, 4) as four, ndv(c_name, 5) as five, ndv(c_name, 6) as six,
ndv(c_name, 7) as seven, ndv(c_name, 8) as eight, ndv(c_name, 9) as nine,
ndv(c_name, 10) as ten
from tpch.customer;

select ndv(ss_sold_time_sk, 1) as "one", ndv(ss_sold_time_sk, 2) as two,
ndv(ss_sold_time_sk, 3) as three, ndv(ss_sold_time_sk, 4) as four,
ndv(ss_sold_time_sk, 5) as five, ndv(ss_sold_time_sk, 6) as six,
ndv(ss_sold_time_sk, 7) as seven, ndv(ss_sold_time_sk, 8) as eight,
ndv(ss_sold_time_sk, 9) as nine, ndv(ss_sold_time_sk, 10) as ten
from tpcds.store_sales;

Perf: TBD

Change-Id: I48a4517bd0959f7021143073d37505a46c551a58
---
M be/src/common/logging.h
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M tests/query_test/test_aggregation.py
6 files changed, 302 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/15997/13
--
To view, visit http://gerrit.cloudera.org:8080/15997
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58
Gerrit-Change-Number: 15997
Gerrit-PatchSet: 13
Gerrit-Owner: Qifan Chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
