[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Internal Jenkins has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. Patch Set 15: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions This change enables codegen for all builtin aggregate functions, e.g. timestamp functions and group_concat. There are several parts to the change: * Adding support for generic UDAs. Previous the codegen code did not handle multiple input arguments or NULL return values. * Defaulting to using the UDA interface when there is not a special codegen path (we have implementations of all builtin aggregate functions for the interpreted path). * Remove all the logic to disable codegen for the special cases that now are supported. Also fix the generation of code to get/set NULL bits since I needed to add functionality there anyway. Testing: Add tests that check that codegen was enabled for builtin aggregate functions. Also fix some gaps in the preexisting tests. Also add tests for UDAs that check input/output nulls are handled correctly, in anticipation of enabling codegen for arbitrary UDAs. The tests are run with both codegen enabled and disabled. To avoid flaky tests, we switch the UDF tests to use "unique_database". Perf: Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1, since my original approach regressed it ~5%. In the end the problem was to do with the ordering of loads/stores to the slot and null bit in the generated code: the previous version of the code exploited some properties of the particular aggregate function. I ended up replicating this behaviour to avoid regressing perf. Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Reviewed-on: http://gerrit.cloudera.org:8080/4655 Reviewed-by: Tim ArmstrongTested-by: Internal Jenkins --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/aggregation-node.cc M be/src/exec/exec-node.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/old-hash-table.cc M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/text-converter.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/case-expr.cc M be/src/exprs/compound-predicates.cc M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr.cc M be/src/exprs/literal.cc M be/src/exprs/null-literal.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/tuple.cc M be/src/runtime/types.h M be/src/testutil/test-udas.cc M be/src/testutil/test-udfs.cc M be/src/util/tuple-row-compare.cc M testdata/workloads/functional-query/queries/QueryTest/java-udf.test M testdata/workloads/functional-query/queries/QueryTest/libs_with_same_filenames.test M testdata/workloads/functional-query/queries/QueryTest/load-java-udfs.test M testdata/workloads/functional-query/queries/QueryTest/uda-mem-limit.test M testdata/workloads/functional-query/queries/QueryTest/uda.test A testdata/workloads/functional-query/queries/QueryTest/udf-codegen-required.test M testdata/workloads/functional-query/queries/QueryTest/udf-errors.test M testdata/workloads/functional-query/queries/QueryTest/udf-mem-limit.test M testdata/workloads/functional-query/queries/QueryTest/udf.test M tests/common/test_result_verifier.py M tests/query_test/test_aggregation.py M tests/query_test/test_udfs.py 47 files changed, 1,039 insertions(+), 802 deletions(-) Approvals: Internal Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Hello Internal Jenkins, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4655 to look at the new patch set (#15). Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions This change enables codegen for all builtin aggregate functions, e.g. timestamp functions and group_concat. There are several parts to the change: * Adding support for generic UDAs. Previous the codegen code did not handle multiple input arguments or NULL return values. * Defaulting to using the UDA interface when there is not a special codegen path (we have implementations of all builtin aggregate functions for the interpreted path). * Remove all the logic to disable codegen for the special cases that now are supported. Also fix the generation of code to get/set NULL bits since I needed to add functionality there anyway. Testing: Add tests that check that codegen was enabled for builtin aggregate functions. Also fix some gaps in the preexisting tests. Also add tests for UDAs that check input/output nulls are handled correctly, in anticipation of enabling codegen for arbitrary UDAs. The tests are run with both codegen enabled and disabled. To avoid flaky tests, we switch the UDF tests to use "unique_database". Perf: Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1, since my original approach regressed it ~5%. In the end the problem was to do with the ordering of loads/stores to the slot and null bit in the generated code: the previous version of the code exploited some properties of the particular aggregate function. I ended up replicating this behaviour to avoid regressing perf. Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/aggregation-node.cc M be/src/exec/exec-node.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/old-hash-table.cc M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/text-converter.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/case-expr.cc M be/src/exprs/compound-predicates.cc M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr.cc M be/src/exprs/literal.cc M be/src/exprs/null-literal.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/tuple.cc M be/src/runtime/types.h M be/src/testutil/test-udas.cc M be/src/testutil/test-udfs.cc M be/src/util/tuple-row-compare.cc M testdata/workloads/functional-query/queries/QueryTest/java-udf.test M testdata/workloads/functional-query/queries/QueryTest/libs_with_same_filenames.test M testdata/workloads/functional-query/queries/QueryTest/load-java-udfs.test M testdata/workloads/functional-query/queries/QueryTest/uda-mem-limit.test M testdata/workloads/functional-query/queries/QueryTest/uda.test A testdata/workloads/functional-query/queries/QueryTest/udf-codegen-required.test M testdata/workloads/functional-query/queries/QueryTest/udf-errors.test M testdata/workloads/functional-query/queries/QueryTest/udf-mem-limit.test M testdata/workloads/functional-query/queries/QueryTest/udf.test M tests/common/test_result_verifier.py M tests/query_test/test_aggregation.py M tests/query_test/test_udfs.py 47 files changed, 1,039 insertions(+), 802 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/4655/15 -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. Patch Set 15: Code-Review+2 Bad rebase - needed to fix a test -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. Patch Set 14: Code-Review+2 Rebase, carry +2 -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has uploaded a new patch set (#13). Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions This change enables codegen for all builtin aggregate functions, e.g. timestamp functions and group_concat. There are several parts to the change: * Adding support for generic UDAs. Previous the codegen code did not handle multiple input arguments or NULL return values. * Defaulting to using the UDA interface when there is not a special codegen path (we have implementations of all builtin aggregate functions for the interpreted path). * Remove all the logic to disable codegen for the special cases that now are supported. Also fix the generation of code to get/set NULL bits since I needed to add functionality there anyway. Testing: Add tests that check that codegen was enabled for builtin aggregate functions. Also fix some gaps in the preexisting tests. Also add tests for UDAs that check input/output nulls are handled correctly, in anticipation of enabling codegen for arbitrary UDAs. The tests are run with both codegen enabled and disabled. To avoid flaky tests, we switch the UDF tests to use "unique_database". Perf: Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1, since my original approach regressed it ~5%. In the end the problem was to do with the ordering of loads/stores to the slot and null bit in the generated code: the previous version of the code exploited some properties of the particular aggregate function. I ended up replicating this behaviour to avoid regressing perf. Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/aggregation-node.cc M be/src/exec/exec-node.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/old-hash-table.cc M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/text-converter.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/case-expr.cc M be/src/exprs/compound-predicates.cc M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr.cc M be/src/exprs/literal.cc M be/src/exprs/null-literal.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/tuple.cc M be/src/runtime/types.h M be/src/testutil/test-udas.cc M be/src/testutil/test-udfs.cc M be/src/util/tuple-row-compare.cc M testdata/workloads/functional-query/queries/QueryTest/java-udf.test M testdata/workloads/functional-query/queries/QueryTest/libs_with_same_filenames.test M testdata/workloads/functional-query/queries/QueryTest/load-java-udfs.test M testdata/workloads/functional-query/queries/QueryTest/uda-mem-limit.test M testdata/workloads/functional-query/queries/QueryTest/uda.test A testdata/workloads/functional-query/queries/QueryTest/udf-codegen-required.test M testdata/workloads/functional-query/queries/QueryTest/udf-errors.test M testdata/workloads/functional-query/queries/QueryTest/udf-mem-limit.test M testdata/workloads/functional-query/queries/QueryTest/udf.test M tests/common/test_result_verifier.py M tests/query_test/test_aggregation.py M tests/query_test/test_udfs.py 47 files changed, 1,003 insertions(+), 762 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/4655/13 -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. Patch Set 12: (11 comments) http://gerrit.cloudera.org:8080/#/c/4655/12/be/src/codegen/codegen-anyval.cc File be/src/codegen/codegen-anyval.cc: Line 103: DCHECK(false) << "NYI:" << type.DebugString(); > why do we not need these still with the removal of the bail outs for CHAR? The char bailout was still there, but more implicit. Line 462: void CodegenAnyVal::SetFromRawPtr(Value* raw_ptr) { > this looks unused. remove? Done http://gerrit.cloudera.org:8080/#/c/4655/12/be/src/codegen/codegen-anyval.h File be/src/codegen/codegen-anyval.h: Line 53: /// TYPE_TIMESTAMP/TimestampVal: { i64, i64 } > decimal? Done http://gerrit.cloudera.org:8080/#/c/4655/12/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: Line 1071: // eliminate a branch per value. > what code is this paragraph referring to? should it precede the Init() call It probably makes sense to move it up, since it's implemented by both Init() and the below code. Line 1079: && evaluator->intermediate_type().type != TYPE_TIMESTAMP) { > IsTimestampType() for consistency now that most types have that? Done PS12, Line 1454: : > maybe explain that we hand-generate a code sequence in this case. Done PS12, Line 1670: .type != TYPE_TIMESTAMP > IsTimestampType()? Done PS12, Line 1742: We must use the unlowered type > would be good to briefly explain why rather than just the what. Done Line 1811: "intermediate tuple desc"); > when would this happen? If there's a char field in the tuple. I think this is a step back in terms of error messages, so I added back an explicit check for CHAR fields in the intermediate tuple. I actually noticed there are a couple of places in the codebase that don't check for a null result from this function, so I fixed those. http://gerrit.cloudera.org:8080/#/c/4655/12/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: PS12, Line 630: ; > remove Done http://gerrit.cloudera.org:8080/#/c/4655/12/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: PS12, Line 143: the > "a boolean value represented ..." ? like above comment. Done -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. Patch Set 12: Rebased. Had to make some fixes to the UDF tests because of logical conflicts with one of Michael's codegen changes - I started running the tests with and without codegen, but some of the tests can now not be run with disable_codegen=1. -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has uploaded a new patch set (#12). Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions This change enables codegen for all builtin aggregate functions, e.g. timestamp functions and group_concat. There are several parts to the change: * Adding support for generic UDAs. Previous the codegen code did not handle multiple input arguments or NULL return values. * Defaulting to using the UDA interface when there is not a special codegen path (we have implementations of all builtin aggregate functions for the interpreted path). * Remove all the logic to disable codegen for the special cases that now are supported. Also fix the generation of code to get/set NULL bits since I needed to add functionality there anyway. Testing: Add tests that check that codegen was enabled for builtin aggregate functions. Also fix some gaps in the preexisting tests. Also add tests for UDAs that check input/output nulls are handled correctly, in anticipation of enabling codegen for arbitrary UDAs. The tests are run with both codegen enabled and disabled. To avoid flaky tests, we switch the UDF tests to use "unique_database". Perf: Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1, since my original approach regressed it ~5%. In the end the problem was to do with the ordering of loads/stores to the slot and null bit in the generated code: the previous version of the code exploited some properties of the particular aggregate function. I ended up replicating this behaviour to avoid regressing perf. Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/aggregation-node.cc M be/src/exec/exec-node.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/old-hash-table.cc M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/text-converter.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/case-expr.cc M be/src/exprs/compound-predicates.cc M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr.cc M be/src/exprs/literal.cc M be/src/exprs/null-literal.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/tuple.cc M be/src/runtime/types.h M be/src/testutil/test-udas.cc M be/src/testutil/test-udfs.cc M be/src/util/tuple-row-compare.cc M testdata/workloads/functional-query/queries/QueryTest/java-udf.test M testdata/workloads/functional-query/queries/QueryTest/libs_with_same_filenames.test M testdata/workloads/functional-query/queries/QueryTest/load-java-udfs.test M testdata/workloads/functional-query/queries/QueryTest/uda-mem-limit.test M testdata/workloads/functional-query/queries/QueryTest/uda.test A testdata/workloads/functional-query/queries/QueryTest/udf-codegen-required.test M testdata/workloads/functional-query/queries/QueryTest/udf-errors.test M testdata/workloads/functional-query/queries/QueryTest/udf-mem-limit.test M testdata/workloads/functional-query/queries/QueryTest/udf.test M tests/common/test_result_verifier.py M tests/query_test/test_aggregation.py M tests/query_test/test_udfs.py 47 files changed, 973 insertions(+), 747 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/4655/12 -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. Patch Set 9: (8 comments) Rebased to pick up IMPALA-3884 http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/codegen/codegen-anyval.cc File be/src/codegen/codegen-anyval.cc: PS7, Line 459: > that Done http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/codegen/codegen-anyval.h File be/src/codegen/codegen-anyval.h: Line 184: /// lowered type. This *Val should be non-null. The output variable is called 'name'. > nit: space Done http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: PS7, Line 347: : /// TODO: Return Status inst > Avoid cloning if possible to reduce compilation time. Done http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: PS7, Line 1660: ond, the : // value of the slot was initialized in the right way in InitAggSlots() (e.g. 0 for : // SUM) that we get the right result if UpdateSlot() pretends that the NULL bit of : // 'dst' is unse > Please also see comments above. I tried to clean this up a bit. My intention was to document the details of the initialisation in InitAggSlot() and provide a pointer here. PS7, Line 1675: dst.SetIsNull(slot_desc->CodegenIsNull(codegen, , agg_tuple_arg)); > Why is this not in the old code or was it done somewhere else ? It wasn't in the old code - the old code didn't handle arbitrary UDAs, just a subset of the builtins. E.g. the old code would have given wrong results if we changed SUM() so that it returned NULL on overflow. http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/exec/partitioned-aggregation-node.h File be/src/exec/partitioned-aggregation-node.h: PS7, Line 649: 'dst_val' should contain the previous value of the aggregate : /// function, and 'updated_dst_val' is set to the new value after the Update or Merge : /// operation is applied. > Just curious why we separate 'dst_val' and 'result_val'. Doesn't 'result_va We need two different CodegenAnyVal, since it really represents the value (which changes) rather than the memory location. We could use an in-out argument instead of separate input and output arguments but this seemed simpler to me. I renamed 'result_val' to 'updated_dst_val' to hopefully make the connection clearer (the naming was confusing). http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/exprs/scalar-fn-call.cc File be/src/exprs/scalar-fn-call.cc: PS7, Line 496: false > Please feel free to ignore but you can consider setting this second argumen It seems clearer to make it explicit, given it's not obvious what the "default" behaviour should be. http://gerrit.cloudera.org:8080/#/c/4655/7/tests/query_test/test_aggregation.py File tests/query_test/test_aggregation.py: Line 141: # Verify codegen was enabled for all stages of the aggregation. > Can remove this restriction now that IMPALA-3884 is in master Done -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has uploaded a new patch set (#9). Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions This change enables codegen for all builtin aggregate functions, e.g. timestamp functions and group_concat. There are several parts to the change: * Adding support for generic UDAs. Previous the codegen code did not handle multiple input arguments or NULL return values. * Defaulting to using the UDA interface when there is not a special codegen path (we have implementations of all builtin aggregate functions for the interpreted path). * Remove all the logic to disable codegen for the special cases that now are supported. Also fix the generation of code to get/set NULL bits since I needed to add functionality there anyway. Testing: Add tests that check that codegen was enabled for builtin aggregate functions. Also fix some gaps in the preexisting tests. Also add tests for UDAs that check input/output nulls are handled correctly, in anticipation of enabling codegen for arbitrary UDAs. Perf: Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1, since my original approach regressed it ~5%. In the end the problem was to do with the ordering of loads/stores to the slot and null bit in the generated code: the previous version of the code exploited some properties of the particular aggregate function. I ended up replicating this behaviour to avoid regressing perf. Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/aggregation-node.cc M be/src/exec/exec-node.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/old-hash-table.cc M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/text-converter.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/case-expr.cc M be/src/exprs/compound-predicates.cc M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr.cc M be/src/exprs/literal.cc M be/src/exprs/null-literal.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/tuple.cc M be/src/runtime/types.h M be/src/testutil/test-udas.cc M be/src/testutil/test-udfs.cc M be/src/util/tuple-row-compare.cc M testdata/workloads/functional-query/queries/QueryTest/uda.test M tests/common/test_result_verifier.py M tests/query_test/test_aggregation.py M tests/query_test/test_udfs.py 39 files changed, 842 insertions(+), 562 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/4655/9 -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has uploaded a new patch set (#8). Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions This change enables codegen for all builtin aggregate functions, e.g. timestamp functions and group_concat. There are several parts to the change: * Adding support for generic UDAs. Previous the codegen code did not handle multiple input arguments or NULL return values. * Defaulting to using the UDA interface when there is not a special codegen path (we have implementations of all builtin aggregate functions for the interpreted path). * Remove all the logic to disable codegen for the special cases that now are supported. Also fix the generation of code to get/set NULL bits since I needed to add functionality there anyway. Testing: Add tests that check that codegen was enabled for builtin aggregate functions. Also fix some gaps in the preexisting tests. Also add tests for UDAs that check input/output nulls are handled correctly, in anticipation of enabling codegen for arbitrary UDAs. Perf: Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1, since my original approach regressed it ~5%. In the end the problem was to do with the ordering of loads/stores to the slot and null bit in the generated code: the previous version of the code exploited some properties of the particular aggregate function. I ended up replicating this behaviour to avoid regressing perf. Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/aggregation-node.cc M be/src/exec/exec-node.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/old-hash-table.cc M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/text-converter.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/case-expr.cc M be/src/exprs/compound-predicates.cc M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr.cc M be/src/exprs/literal.cc M be/src/exprs/null-literal.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/tuple.cc M be/src/runtime/types.h M be/src/testutil/test-udas.cc M be/src/testutil/test-udfs.cc M be/src/util/tuple-row-compare.cc M testdata/workloads/functional-query/queries/QueryTest/uda.test M tests/common/test_result_verifier.py M tests/query_test/test_aggregation.py M tests/query_test/test_udfs.py 39 files changed, 846 insertions(+), 564 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/4655/8 -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/4655/7/tests/query_test/test_aggregation.py File tests/query_test/test_aggregation.py: Line 141: # (IMPALA-3884). Can remove this restriction now that IMPALA-3884 is in master -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Michael Ho has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. Patch Set 7: (8 comments) http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/codegen/codegen-anyval.cc File be/src/codegen/codegen-anyval.cc: PS7, Line 459: hat that http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/codegen/codegen-anyval.h File be/src/codegen/codegen-anyval.h: Line 184: /// lowered type. This *Val should be non-null.The output variable is called 'name'. nit: space http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: PS7, Line 347: but we don't clone if we can avoid it to : /// reduce compilation time. Avoid cloning if possible to reduce compilation time. http://gerrit.cloudera.org:8080/#/c/4655/6/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: PS6, Line 1063: if we initialize the destination value to 0 (with the NULL bit set) > We could do that but it doesn't seem worth it (the interpreted path has eno Sure. I agree that the interpreted path probably won't benefit much. My concern has more to do with the explanation of the optimization and the assumption made here: 1. Who is initializing the slot to the proper value ? 2. How are these initialized values different for different aggregate functions ? 3. Will the NULL bits be changed after initialization ? If so, by whom ? 4. And why is it okay to not check the NULL bits after calling the UDA function for certain aggregate functions ? Is it because they cannot change or is it already handled somewhere else ? http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: PS7, Line 1660: Second, : // the NULL bit of the dst slot can be ignored if the slot is initialised in the : // right way (see InitAggSlots()). Empirically this optimisation makes TPC-H Q1 : // 5-10% faster. Please also see comments above. It's not very clear what it means by "the NULL bit of the dst slot" can be ignored ? Is it more precise to say their state cannot change after initialization or they will be updated appropriately by the update function ? It seems different aggregate function (e.g. min vs avg) has different default values with different conditions for the NULL bit to be set. May help to document a bit more details. PS7, Line 1675: dst.SetIsNull(slot_desc->CodegenIsNull(codegen, , agg_tuple_arg)); Why is this not in the old code or was it done somewhere else ? http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/exec/partitioned-aggregation-node.h File be/src/exec/partitioned-aggregation-node.h: PS7, Line 649: 'dst_val' should contain the previous value of the aggregate : /// function, and 'result_val' is set to the new value after the Update or Merge : /// operation is applied. Just curious why we separate 'dst_val' and 'result_val'. Doesn't 'result_val' subsume 'dst_val' ? http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/exprs/scalar-fn-call.cc File be/src/exprs/scalar-fn-call.cc: PS7, Line 496: false Please feel free to ignore but you can consider setting this second argument to default value of false. However, keeping it as is may be clearer (and you have gone through the trouble of updating all call sites - thanks !). -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has uploaded a new patch set (#7). Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions This change enables codegen for all builtin aggregate functions, e.g. timestamp functions and group_concat. There are several parts to the change: * Adding support for generic UDAs. Previous the codegen code did not handle multiple input arguments or NULL return values. * Defaulting to using the UDA interface when there is not a special codegen path (we have implementations of all builtin aggregate functions for the interpreted path). * Remove all the logic to disable codegen for the special cases that now are supported. Also fix the generation of code to get/set NULL bits since I needed to add functionality there anyway. Testing: Add tests that check that codegen was enabled for builtin aggregate functions. Also fix some gaps in the preexisting tests. Also add tests for UDAs that check input/output nulls are handled correctly, in anticipation of enabling codegen for arbitrary UDAs. Perf: Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1, since my original approach regressed it ~5%. In the end the problem was to do with the ordering of loads/stores to the slot and null bit in the generated code: the previous version of the code exploited some properties of the particular aggregate function. I ended up replicating this behaviour to avoid regressing perf. Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/aggregation-node.cc M be/src/exec/exec-node.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/old-hash-table.cc M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/text-converter.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/case-expr.cc M be/src/exprs/compound-predicates.cc M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr.cc M be/src/exprs/literal.cc M be/src/exprs/null-literal.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/tuple.cc M be/src/runtime/types.h M be/src/testutil/test-udas.cc M be/src/testutil/test-udfs.cc M be/src/util/tuple-row-compare.cc M testdata/workloads/functional-query/queries/QueryTest/uda.test M tests/common/test_result_verifier.py M tests/query_test/test_aggregation.py M tests/query_test/test_udfs.py 39 files changed, 845 insertions(+), 561 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/4655/7 -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has uploaded a new patch set (#7). Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions This change enables codegen for all builtin aggregate functions, e.g. timestamp functions and group_concat. There are several parts to the change: * Adding support for generic UDAs. Previous the codegen code did not handle multiple input arguments or NULL return values. * Defaulting to using the UDA interface when there is not a special codegen path (we have implementations of all builtin aggregate functions for the interpreted path). * Remove all the logic to disable codegen for the special cases that now are supported. Also fix the generation of code to get/set NULL bits since I needed to add functionality there anyway. Testing: Add tests that check that codegen was enabled for builtin aggregate functions. Also fix some gaps in the preexisting tests. Also add tests for UDAs that check input/output nulls are handled correctly, in anticipation of enabling codegen for arbitrary UDAs. Perf: Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1, since my original approach regressed it ~5%. In the end the problem was to do with the ordering of loads/stores to the slot and null bit in the generated code: the previous version of the code exploited some properties of the particular aggregate function. I ended up replicating this behaviour to avoid regressing perf. Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/aggregation-node.cc M be/src/exec/exec-node.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/old-hash-table.cc M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/text-converter.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/case-expr.cc M be/src/exprs/compound-predicates.cc M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr.cc M be/src/exprs/literal.cc M be/src/exprs/null-literal.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/tuple.cc M be/src/runtime/types.h M be/src/testutil/test-udas.cc M be/src/testutil/test-udfs.cc M be/src/util/tuple-row-compare.cc M testdata/workloads/functional-query/queries/QueryTest/uda.test M tests/common/test_result_verifier.py M tests/query_test/test_aggregation.py M tests/query_test/test_udfs.py 39 files changed, 845 insertions(+), 561 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/4655/7 -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. Patch Set 6: (7 comments) http://gerrit.cloudera.org:8080/#/c/4655/6/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: PS6, Line 1063: if we initialize the destination value to 0 (with the NULL bit set) > If I understand it correctly, this initialization is done at the initFn() o We could do that but it doesn't seem worth it (the interpreted path has enough additional overhead that I doubt it makes a measurable difference). PS6, Line 1617: Value* dst_ptr > May help to explain what dst_ptr actually is ? It seems to be pointer to th Done PS6, Line 1631: (agg_op == AggFnEvaluator::MIN || agg_op == AggFnEvaluator::MAX)) > nit: I find it easier to read to check the agg_op first: Done PS6, Line 1639: slot_desc->CodegenSetNullIndicator > Can this be skipped if (!slot_desc->is_nullable()) ? Same for below. Min/Max should always be nullable (if there are no input values, the output is NULL). Added a DCHECK to assert this. PS6, Line 1663: agg_op == AggFnEvaluator::MIN > agg_op != OTHER ? or you wanna keep this list for clarity ? Yeah I prefer this way, also since it won't break if someone adds a new aggregate op PS6, Line 1718: false); > Why not just pass true and skip calling CloneFunction() below ? I somehow missed that we called Clonefunction() below. PS6, Line 1742: Value* dst_lowered_ptr = dst.GetLoweredPtr("dst_lowered_ptr"); : Type* dst_unlowered_ptr_type = CodegenAnyVal::GetUnloweredPtrType(codegen, dst_type); : Value* dst_unlowered_ptr = builder->CreateBitCast( : dst_lowered_ptr, dst_unlowered_ptr_type, "dst_unlowered_ptr"); > Why not dst.GetUnloweredPtr() ? We use both dst_unlowered_ptr and dst_lowered_ptr so we need both variables. We can't just call both GetLoweredPtr() and GetUnloweredPtr() here because dst_lowered_ptr and dst_unlowered_ptr need to point to the same alloca(). -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Michael Ho has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. Patch Set 6: (7 comments) Looking good. Some more comments for now. Still doing another pass. http://gerrit.cloudera.org:8080/#/c/4655/6/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: PS6, Line 1063: if we initialize the destination value to 0 (with the NULL bit set) If I understand it correctly, this initialization is done at the initFn() of various aggregate operators (e.g. initNull() and initZero()) so this is even true for the interpreted path, right ? Not sure why we cannot do that even for the interpreted path too ? PS6, Line 1617: Value* dst_ptr May help to explain what dst_ptr actually is ? It seems to be pointer to the slot for this agg_fn_evaluator in the intermediate tuple PS6, Line 1631: (agg_op == AggFnEvaluator::MIN || agg_op == AggFnEvaluator::MAX)) nit: I find it easier to read to check the agg_op first: if ((agg_op == ..) && dst_is_numeric_or_bool)) { ... } PS6, Line 1639: slot_desc->CodegenSetNullIndicator Can this be skipped if (!slot_desc->is_nullable()) ? Same for below. PS6, Line 1663: agg_op == AggFnEvaluator::MIN agg_op != OTHER ? or you wanna keep this list for clarity ? PS6, Line 1718: false); Why not just pass true and skip calling CloneFunction() below ? PS6, Line 1742: Value* dst_lowered_ptr = dst.GetLoweredPtr("dst_lowered_ptr"); : Type* dst_unlowered_ptr_type = CodegenAnyVal::GetUnloweredPtrType(codegen, dst_type); : Value* dst_unlowered_ptr = builder->CreateBitCast( : dst_lowered_ptr, dst_unlowered_ptr_type, "dst_unlowered_ptr"); Why not dst.GetUnloweredPtr() ? -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. Patch Set 6: Rebased onto the codegen interface changes. -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has uploaded a new patch set (#6). Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions This change enables codegen for all builtin aggregate functions, e.g. timestamp functions and group_concat. There are several parts to the change: * Adding support for generic UDAs. Previous the codegen code did not handle multiple input arguments or NULL return values. * Defaulting to using the UDA interface when there is not a special codegen path (we have implementations of all builtin aggregate functions for the interpreted path). * Remove all the logic to disable codegen for the special cases that now are supported. Also fix the generation of code to get/set NULL bits since I needed to add functionality there anyway. Testing: Add tests that check that codegen was enabled for builtin aggregate functions. Also fix some gaps in the preexisting tests. Also add tests for UDAs that check input/output nulls are handled correctly, in anticipation of enabling codegen for arbitrary UDAs. Perf: Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1, since my original approach regressed it ~5%. In the end the problem was to do with the ordering of loads/stores to the slot and null bit in the generated code: the previous version of the code exploited some properties of the particular aggregate function. I ended up replicating this behaviour to avoid regressing perf. Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/aggregation-node.cc M be/src/exec/exec-node.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/old-hash-table.cc M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/text-converter.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/case-expr.cc M be/src/exprs/compound-predicates.cc M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr.cc M be/src/exprs/literal.cc M be/src/exprs/null-literal.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/tuple.cc M be/src/runtime/types.h M be/src/testutil/test-udas.cc M be/src/testutil/test-udfs.cc M be/src/util/tuple-row-compare.cc M testdata/workloads/functional-query/queries/QueryTest/uda.test M tests/common/test_result_verifier.py M tests/query_test/test_aggregation.py M tests/query_test/test_udfs.py 39 files changed, 842 insertions(+), 559 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/4655/6 -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. Patch Set 4: (13 comments) http://gerrit.cloudera.org:8080/#/c/4655/4/be/src/codegen/codegen-anyval.h File be/src/codegen/codegen-anyval.h: PS4, Line 242: if (val.is_null) goto null_block; > Can you please elaborate this comment with code snippet like the following Done http://gerrit.cloudera.org:8080/#/c/4655/4/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: PS4, Line 1065: can ignore the NULL bit of its destination value is : // ignored > ? Done PS4, Line 1067: set > did you mean with NULL bit cleared ? No it's set - the value should start off as NULL, since sum() of an empty set returns NULL. Line 1599: RETURN_IF_ERROR(agg_expr->GetCodegendComputeFn(state_, _expr_fn)); > DCHECK(agg_expr_fn != NULL); Done PS4, Line 1620: src.CodegenBranchIfNull(, ret_block); > It seems that we emit this check for all paths except for the UDA path. Wh This is deliberate and a correctness fix that is needed to match the interpreted path. In general the UDA may do some processing on NULL values so we need to pass them in. E.g. see the test UDA I added that counts the # of NULLs in anticipation of using this codegen for arbitrary UDAs. Filtering out src NULLs before calling the UDA is a special case optimisation that we can do for aggregate functions where we know the semantics.. PS4, Line 1628: dst_type.type != TYPE_TIMESTAMP : && !dst_type.IsStringType() > This seems to be reused in multiple places. May be it's worth factoring thi I factored this out and made it a positive check so that most of the conditions are one line now and it's clearer what cases are actually handled. PS4, Line 1637: dst_type.type != TYPE_TIMESTAMP : && !dst_type.IsStringType() > It may be easier to read if we do the check once up front: I think I prefer this way because the implementation is directly adjacent to the conditions for when it should be. E.g. with the alternative factoring, If I'm looking at case agg_op == SUM, I have to look in two places and decode a more complex expression to figure out what type/agg_op combinations that code is handling. Also since the UDA interface is the "generic" approach and the others are special-case optimisations, so it makes more sense to me to check for the special cases then false back to the generic one. I did some cleanup and added a comment up the top to make it clearer how it should be read. PS4, Line 1676: src.CodegenBranchIfNull(, ret_block); > Actually, referring to the comment above again, it seems that we should jus I find it easier to follow this way since the logic for each case is grouped together. E.g. all the logic for UpdateSlot() on a UDA fits on a screen, rather than having some of NULL handling down here and some 100 lines up. PS4, Line 1685: resulting in > with results stored in Done PS4, Line 1725: codegen->GetFunction(symbol); > Should we consider adding "bool clone" as the second argument similar to th Done. PS4, Line 1745: Value* input_lowered_ptr = : codegen->CreateEntryBlockAlloca(builder->GetInsertBlock()->getParent(), : LlvmCodeGen::NamedVariable("input_lowered_ptr", : input_vals[i].value()->getType())); : builder->CreateStore(input_vals[i].value(), input_lowered_ptr); : Type* input_unlowered_ptr_type = CodegenAnyVal::GetUnloweredPtrType( : codegen, evaluator->input_expr_ctxs()[i]->root()->type()); : Value* input_unlowered_ptr = builder->CreateBitCast( : input_lowered_ptr, input_unlowered_ptr_type, "input_unlowered_ptr"); > Do you think it's worth factoring this logic out to a function given it's r There was actually already a helper in CodegenAnyVal that did most of the work. Improved/cleaned that up a bit then used it. http://gerrit.cloudera.org:8080/#/c/4655/4/be/src/exprs/compound-predicates.cc File be/src/exprs/compound-predicates.cc: PS4, Line 68: > Please consider removing these extra blank spaces too. Done http://gerrit.cloudera.org:8080/#/c/4655/4/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: PS4, Line 603: Value* tuple_bytes = builder->CreateBitCast(tuple, codegen->ptr_type()); : Constant* null_byte_offset = : ConstantInt::get(codegen->int_type(), null_indicator_offset_.byte_offset); : Value* null_byte_ptr = : builder->CreateInBoundsGEP(tuple_bytes, null_byte_offset, "null_byte_ptr"); > What do you think about factoring this code snippet as a utility function f Done. Also now sharing the code with SlotRef.
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Michael Ho has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. Patch Set 4: (13 comments) http://gerrit.cloudera.org:8080/#/c/4655/4/be/src/codegen/codegen-anyval.h File be/src/codegen/codegen-anyval.h: PS4, Line 242: if (val.is_null) goto null_block; Can you please elaborate this comment with code snippet like the following ? if (val.is_null) goto null_block; non_null_block: //default entry point after calling this function null_block: http://gerrit.cloudera.org:8080/#/c/4655/4/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: PS4, Line 1065: can ignore the NULL bit of its destination value is : // ignored ? PS4, Line 1067: set did you mean with NULL bit cleared ? Line 1599: RETURN_IF_ERROR(agg_expr->GetCodegendComputeFn(state_, _expr_fn)); DCHECK(agg_expr_fn != NULL); PS4, Line 1620: src.CodegenBranchIfNull(, ret_block); It seems that we emit this check for all paths except for the UDA path. Why is it safe to make the assumption that 'src' is not NULL for the UDA path ? Also, it appears that we may consider doing some refactoring by computing 'bool use_uda_interface' up front and only emit this cmp-and-branch if that's false. Please see comments below about 'use_uda_interface'. PS4, Line 1628: dst_type.type != TYPE_TIMESTAMP : && !dst_type.IsStringType() This seems to be reused in multiple places. May be it's worth factoring this expression out. PS4, Line 1637: dst_type.type != TYPE_TIMESTAMP : && !dst_type.IsStringType() It may be easier to read if we do the check once up front: agg_op = evaluator->agg_op(); bool use_uda_interface = agg_op != COUNT && (dst_type.type == TYPE_TIMESTAMP || dst_type.IsStringType || (dst_type.type == TYPE_DECIMAL && agg_op() == AggFnEvaluator::SUM); if (use_uda_interface) { } else { switch (agg_op) { } } PS4, Line 1676: src.CodegenBranchIfNull(, ret_block); Actually, referring to the comment above again, it seems that we should just do the check before the if-stmt sequence and emit the cmp-and-branch instructions up front for these cases. IMHO, I find the new code harder to follow because the basic blocks in the emitted code are not as explicitly laid out as the old code. PS4, Line 1685: resulting in with results stored in PS4, Line 1725: codegen->GetFunction(symbol); Should we consider adding "bool clone" as the second argument similar to the other variant of GetFunction() ? PS4, Line 1745: Value* input_lowered_ptr = : codegen->CreateEntryBlockAlloca(builder->GetInsertBlock()->getParent(), : LlvmCodeGen::NamedVariable("input_lowered_ptr", : input_vals[i].value()->getType())); : builder->CreateStore(input_vals[i].value(), input_lowered_ptr); : Type* input_unlowered_ptr_type = CodegenAnyVal::GetUnloweredPtrType( : codegen, evaluator->input_expr_ctxs()[i]->root()->type()); : Value* input_unlowered_ptr = builder->CreateBitCast( : input_lowered_ptr, input_unlowered_ptr_type, "input_unlowered_ptr"); Do you think it's worth factoring this logic out to a function given it's repeated again from line 1757-1764 ? http://gerrit.cloudera.org:8080/#/c/4655/4/be/src/exprs/compound-predicates.cc File be/src/exprs/compound-predicates.cc: PS4, Line 68: Please consider removing these extra blank spaces too. http://gerrit.cloudera.org:8080/#/c/4655/4/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: PS4, Line 603: Value* tuple_bytes = builder->CreateBitCast(tuple, codegen->ptr_type()); : Constant* null_byte_offset = : ConstantInt::get(codegen->int_type(), null_indicator_offset_.byte_offset); : Value* null_byte_ptr = : builder->CreateInBoundsGEP(tuple_bytes, null_byte_offset, "null_byte_ptr"); What do you think about factoring this code snippet as a utility function for generating code to return null_byte_ptr ? This seems to be useful for CodegenIsNull() too. -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. Patch Set 4: Rebase onto Alex's change. -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has uploaded a new patch set (#4). Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions This change enables codegen for all builtin aggregate functions, e.g. timestamp functions and group_concat. There are several parts to the change: * Adding support for generic UDAs. Previous the codegen code did not handle multiple input arguments or NULL return values. * Defaulting to using the UDA interface when there is not a special codegen path (we have implementations of all builtin aggregate functions for the interpreted path). * Remove all the logic to disable codegen for the special cases that now are supported. Also fix the generation of code to get/set NULL bits since I needed to add functionality there anyway. Testing: Add tests that check that codegen was enabled for builtin aggregate functions. Also fix some gaps in the preexisting tests. Also add tests for UDAs that check input/output nulls are handled correctly, in anticipation of enabling codegen for arbitrary UDAs. Perf: Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1, since my original approach regressed it ~5%. In the end the problem was to do with the ordering of loads/stores to the slot and null bit in the generated code: the previous version of the code exploited some properties of the particular aggregate function. I ended up replicating this behaviour to avoid regressing perf. Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/aggregation-node.cc M be/src/exec/exec-node.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/old-hash-table.cc M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/text-converter.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/case-expr.cc M be/src/exprs/compound-predicates.cc M be/src/exprs/expr.cc M be/src/exprs/literal.cc M be/src/exprs/null-literal.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/tuple.cc M be/src/testutil/test-udas.cc M be/src/testutil/test-udfs.cc M be/src/util/tuple-row-compare.cc M testdata/workloads/functional-query/queries/QueryTest/uda.test M tests/common/test_result_verifier.py M tests/query_test/test_aggregation.py M tests/query_test/test_udfs.py 37 files changed, 744 insertions(+), 484 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/4655/4 -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has uploaded a new patch set (#3). Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions This change enables codegen for all builtin aggregate functions, e.g. timestamp functions and group_concat. There are several parts to the change: * Adding support for generic UDAs. Previous the codegen code did not handle multiple input arguments or NULL return values. * Defaulting to using the UDA interface when there is not a special codegen path (we have implementations of all builtin aggregate functions for the interpreted path). * Remove all the logic to disable codegen for the special cases that now are supported. Also fix the generation of code to get/set NULL bits since I needed to add functionality there anyway. Testing: Add tests that check that codegen was enabled for builtin aggregate functions. Also fix some gaps in the preexisting tests. Also add tests for UDAs that check input/output nulls are handled correctly, in anticipation of enabling codegen for arbitrary UDAs. Perf: Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1, since my original approach regressed it ~5%. In the end the problem was to do with the ordering of loads/stores to the slot and null bit in the generated code: the previous version of the code exploited some properties of the particular aggregate function. I ended up replicating this behaviour to avoid regressing perf. Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/aggregation-node.cc M be/src/exec/exec-node.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/old-hash-table.cc M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/text-converter.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/case-expr.cc M be/src/exprs/compound-predicates.cc M be/src/exprs/expr.cc M be/src/exprs/literal.cc M be/src/exprs/null-literal.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/tuple.cc M be/src/testutil/test-udas.cc M be/src/testutil/test-udfs.cc M be/src/util/tuple-row-compare.cc M fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java M testdata/workloads/functional-query/queries/QueryTest/uda.test M tests/common/test_result_verifier.py M tests/query_test/test_aggregation.py M tests/query_test/test_udfs.py 38 files changed, 748 insertions(+), 486 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/4655/3 -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
Tim Armstrong has uploaded a new patch set (#2). Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions .. IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions This change enables codegen for all builtin aggregate functions, e.g. timestamp functions and group_concat. There are several parts to the change: * Adding support for generic UDAs. Previous the codegen code did not handle multiple input arguments or NULL return values. * Defaulting to using the UDA interface when there is not a special codegen path (we have implementations of all builtin aggregate functions for the interpreted path). * Remove all the logic to disable codegen for the special cases that now are supported. Also fix the generation of code to get/set NULL bits since I needed to add functionality there anyway. Testing: Add tests that check that codegen was enabled for builtin aggregate functions. Also fix some gaps in the preexisting tests. Also add tests for UDAs that check input/output nulls are handled correctly, in anticipation of enabling codegen for arbitrary UDAs. Perf: Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1, since my original approach regressed it ~5%. In the end the problem was to do with the ordering of loads/stores to the slot and null bit in the generated code: the previous version of the code exploited some properties of the particular aggregate function. I ended up replicating this behaviour to avoid regressing perf. Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/codegen-anyval.cc M be/src/codegen/codegen-anyval.h M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.h M be/src/exec/aggregation-node.cc M be/src/exec/exec-node.cc M be/src/exec/hash-join-node.cc M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/old-hash-table.cc M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/text-converter.cc M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/case-expr.cc M be/src/exprs/compound-predicates.cc M be/src/exprs/expr.cc M be/src/exprs/literal.cc M be/src/exprs/null-literal.cc M be/src/exprs/scalar-fn-call.cc M be/src/exprs/slot-ref.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/tuple.cc M be/src/testutil/test-udas.cc M be/src/testutil/test-udfs.cc M be/src/util/tuple-row-compare.cc M fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java M testdata/workloads/functional-query/queries/QueryTest/uda.test M tests/common/test_result_verifier.py M tests/query_test/test_aggregation.py M tests/query_test/test_udfs.py 37 files changed, 691 insertions(+), 450 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/4655/2 -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong