Alex Behm has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8569 )
Change subject: IMPALA-5310: Part 2: Add SAMPLED_NDV() function.
......................................................................

IMPALA-5310: Part 2: Add SAMPLED_NDV() function.

Adds a new SAMPLED_NDV() aggregate function that is intended to be used
in COMPUTE STATS TABLESAMPLE. This patch only adds the function itself.
Integration with COMPUTE STATS will come in a separate patch.

SAMPLED_NDV() estimates the number of distinct values (NDV) based on a
sample of data and the corresponding sampling rate. The main idea is to
collect several x/y data points where x is the number of rows and y is
the corresponding NDV estimate. These data points are used to fit an
objective function to the data such that the true NDV can be
extrapolated.
The aggregate function maintains a fixed number of HyperLogLog
intermediates to compute the x/y points.
Several objective functions are fit and the best-fit one is used for
extrapolation.

Adds the MPFIT C library to perform curve fitting:
https://www.physics.wisc.edu/~craigm/idl/cmpfit.html
The library is a C port from Fortran. Scipy uses the Fortran version
of the library for curve fitting.
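For readers unfamiliar with the approach, the fit-and-extrapolate idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this patch: an exact set stands in for the HyperLogLog intermediates, the three candidate objective functions are hypothetical, and a closed-form linear least-squares fit replaces the MPFIT nonlinear fit. All function names here are made up for the sketch.

```python
import math

# Hypothetical candidate objective functions of the form y = a + b * g(x).
# They are linear in (a, b) so they can be fit in closed form; the patch
# itself fits its own objective functions with MPFIT.
CANDIDATES = {
    "linear": lambda x: x,
    "log": lambda x: math.log(x),
    "sqrt": lambda x: math.sqrt(x),
}

def fit_least_squares(xs, ys, g):
    """Fit y = a + b * g(x) by ordinary least squares; return (a, b, sse).
    Assumes at least two points with distinct x values."""
    gs = [g(x) for x in xs]
    n = len(xs)
    mean_g = sum(gs) / n
    mean_y = sum(ys) / n
    var_g = sum((v - mean_g) ** 2 for v in gs)
    cov = sum((v - mean_g) * (y - mean_y) for v, y in zip(gs, ys))
    b = cov / var_g
    a = mean_y - b * mean_g
    sse = sum((a + b * v - y) ** 2 for v, y in zip(gs, ys))
    return a, b, sse

def sample_points(values, num_points):
    """Return (rows_seen, distinct_so_far) at evenly spaced prefixes.
    An exact set stands in for the HyperLogLog intermediates."""
    seen = set()
    points = []
    step = max(1, len(values) // num_points)
    for i, v in enumerate(values, start=1):
        seen.add(v)
        if i % step == 0 and len(points) < num_points:
            points.append((i, len(seen)))
    return points

def sampled_ndv(values, sampling_rate, num_points=8):
    """Fit every candidate to the x/y points, keep the one with the lowest
    squared error, and extrapolate it to the estimated total row count."""
    points = sample_points(values, num_points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    fits = [(name, g) + fit_least_squares(xs, ys, g)
            for name, g in CANDIDATES.items()]
    name, g, a, b, _ = min(fits, key=lambda t: t[4])
    total_rows = len(values) / sampling_rate
    estimate = a + b * g(total_rows)
    # The extrapolated NDV should never fall below what the sample showed.
    return name, max(estimate, ys[-1])
```

With a sample of 1,000 all-distinct rows and a sampling rate of 0.1, the linear candidate fits exactly and the sketch extrapolates to about 10,000 distinct values. For a column with only 10 distinct values the fitted slope is zero and the estimate stays at 10.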
Testing:
- added functional tests
- core/hdfs run passed

Change-Id: Ia51d56ee67ec6073e92f90bebb4005484138b820
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
A be/src/thirdparty/mpfit/DISCLAIMER
A be/src/thirdparty/mpfit/README
A be/src/thirdparty/mpfit/mpfit.c
A be/src/thirdparty/mpfit/mpfit.h
M be/src/util/CMakeLists.txt
A be/src/util/mpfit-util.cc
A be/src/util/mpfit-util.h
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M tests/query_test/test_aggregation.py
15 files changed, 3,885 insertions(+), 19 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/8569/1
--
To view, visit http://gerrit.cloudera.org:8080/8569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia51d56ee67ec6073e92f90bebb4005484138b820
Gerrit-Change-Number: 8569
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm <alex.b...@cloudera.com>