Tim Armstrong has submitted this change and it was merged.

Change subject: IMPALA-3931: arbitrary fixed-size uda intermediate types
......................................................................


IMPALA-3931: arbitrary fixed-size uda intermediate types

Make many builtin aggregate functions use fixed-length intermediate
types:
* avg()
* ndv()
* stddev(), variance(), etc
* distinctpc(), distinctpcsa()

sample(), appx_median(), histogram() and group_concat() actually
allocate var-len data so aren't changed.

This has some major benefits:
* Spill-to-disk works properly with these aggregations.
* Aggregations are more efficient because there is one less pointer
  indirection.
* Aggregations use less memory, because we don't need an extra 12-byte
  StringValue for the indirection.

Adds a special-purpose internal type FIXED_UDA_INTERMEDIATE. The type
is represented in the same way as CHAR - a fixed-size array of bytes,
stored inline in tuples. However, it is not user-visible and does
not support CHAR semantics, i.e. users can't declare tables, functions,
etc with the type. The pointer and length is passed into aggregate functions
wrapped in a StringVal.

Updates some internal codegen functions to work better with the new
type. E.g. store values directly into the result tuple instead of
via an intermediate stack allocation.

Testing:
This change only affects builtin aggregate functions, for which we
have test coverage already. If we were to allow wider use of this type,
it would need further testing.

Added an analyzer test to ensure we can't use the type for UDAs.

Added a regression test for spilling avg().

Added a regression test for UDA with CHAR intermediate hitting DCHECK.

Perf:
Ran TPC-H locally. TPC-H Q17, which has a high-cardinality AVG(),
improved dramatically.

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(60) | parquet / none / none | 18.44   | -17.54%    | 11.92      | -5.34% 
        |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | 
Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| TPCH(60) | TPCH-Q12 | parquet / none / none | 18.40  | 17.64       | +4.32%   
  |   0.77%   |   1.09%        | 1           | 5     |
| TPCH(60) | TPCH-Q22 | parquet / none / none | 7.07   | 6.90        | +2.36%   
  |   0.28%   |   0.30%        | 1           | 5     |
| TPCH(60) | TPCH-Q3  | parquet / none / none | 12.37  | 12.11       | +2.10%   
  |   0.18%   |   0.15%        | 1           | 5     |
| TPCH(60) | TPCH-Q7  | parquet / none / none | 42.48  | 42.09       | +0.93%   
  |   2.45%   |   0.80%        | 1           | 5     |
| TPCH(60) | TPCH-Q6  | parquet / none / none | 3.18   | 3.15        | +0.89%   
  |   0.67%   |   0.76%        | 1           | 5     |
| TPCH(60) | TPCH-Q19 | parquet / none / none | 7.24   | 7.20        | +0.50%   
  |   0.95%   |   0.67%        | 1           | 5     |
| TPCH(60) | TPCH-Q10 | parquet / none / none | 13.37  | 13.30       | +0.50%   
  |   0.48%   |   1.39%        | 1           | 5     |
| TPCH(60) | TPCH-Q5  | parquet / none / none | 7.47   | 7.44        | +0.36%   
  |   0.58%   |   0.54%        | 1           | 5     |
| TPCH(60) | TPCH-Q11 | parquet / none / none | 2.03   | 2.02        | +0.06%   
  |   0.26%   |   1.95%        | 1           | 5     |
| TPCH(60) | TPCH-Q4  | parquet / none / none | 5.48   | 5.50        | -0.27%   
  |   0.62%   |   1.12%        | 1           | 5     |
| TPCH(60) | TPCH-Q13 | parquet / none / none | 22.11  | 22.18       | -0.31%   
  |   0.18%   |   0.55%        | 1           | 5     |
| TPCH(60) | TPCH-Q15 | parquet / none / none | 8.45   | 8.48        | -0.32%   
  |   0.40%   |   0.47%        | 1           | 5     |
| TPCH(60) | TPCH-Q9  | parquet / none / none | 33.39  | 33.66       | -0.81%   
  |   0.75%   |   0.59%        | 1           | 5     |
| TPCH(60) | TPCH-Q21 | parquet / none / none | 71.34  | 72.07       | -1.01%   
  |   1.84%   |   1.79%        | 1           | 5     |
| TPCH(60) | TPCH-Q14 | parquet / none / none | 5.93   | 6.00        | -1.07%   
  |   0.15%   |   0.69%        | 1           | 5     |
| TPCH(60) | TPCH-Q20 | parquet / none / none | 5.72   | 5.79        | -1.09%   
  |   0.59%   |   0.51%        | 1           | 5     |
| TPCH(60) | TPCH-Q18 | parquet / none / none | 45.42  | 45.93       | -1.10%   
  |   1.42%   |   0.50%        | 1           | 5     |
| TPCH(60) | TPCH-Q2  | parquet / none / none | 4.81   | 4.89        | -1.52%   
  |   1.68%   |   1.01%        | 1           | 5     |
| TPCH(60) | TPCH-Q16 | parquet / none / none | 5.41   | 5.52        | -1.98%   
  |   0.66%   |   0.73%        | 1           | 5     |
| TPCH(60) | TPCH-Q1  | parquet / none / none | 27.58  | 29.13       | -5.34%   
  |   0.24%   |   1.51%        | 1           | 5     |
| TPCH(60) | TPCH-Q8  | parquet / none / none | 12.61  | 14.30       | -11.78%  
  |   6.20%   | * 15.28% *     | 1           | 5     |
| TPCH(60) | TPCH-Q17 | parquet / none / none | 43.74  | 126.58      | I 
-65.44%  |   1.34%   |   9.60%        | 1           | 5     |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+

Change-Id: Ife90cf27989f98ffb5ef5c39f1e09ce92e8cb87c
Reviewed-on: http://gerrit.cloudera.org:8080/7526
Tested-by: Impala Public Jenkins
Reviewed-by: Tim Armstrong <tarmstr...@cloudera.com>
---
M be/src/codegen/codegen-anyval.cc
M be/src/codegen/codegen-anyval.h
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/hash-table.cc
M be/src/exec/incr-stats-util.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exprs/agg-fn-evaluator.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/anyval-util.cc
M be/src/exprs/anyval-util.h
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/slot-ref.cc
M be/src/exprs/utility-functions-ir.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/types.cc
M be/src/runtime/types.h
M be/src/testutil/test-udas.cc
M be/src/udf/udf.h
M common/thrift/Types.thrift
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M fe/src/main/java/org/apache/impala/catalog/Function.java
M fe/src/main/java/org/apache/impala/catalog/PrimitiveType.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/service/MetadataOp.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
M testdata/workloads/functional-query/queries/QueryTest/functions-ddl.test
A testdata/workloads/functional-query/queries/QueryTest/spilling-aggs.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/functional-query/queries/QueryTest/uda.test
M tests/query_test/test_spilling.py
M tests/query_test/test_udfs.py
36 files changed, 654 insertions(+), 454 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7526
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ife90cf27989f98ffb5ef5c39f1e09ce92e8cb87c
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>

Reply via email to