Alex Behm has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions ......................................................................
Patch Set 2: (13 comments) Need to spend more time with the meat of this patch. http://gerrit.cloudera.org:8080/#/c/4655/2/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: Line 1064: // To minimize branching on the UpdateTuple path, initialize the result value The details of how this saves branches was somewhat nebulous to me. Seems worth mentioning that we do this trick here because later in codegen we want to ignore the dst slot's null indicator to get rid of a branch, although the dst slot is nullable. When reading this I was confused about where these UpdateTuple() and Update() functions come from. Do you mean the UDA Update() function? Line 1597: Value* expr_ctx_ptr = builder.CreateGEP( inbounds? Line 1658: // properties. First, for these builtins, null input values can be skipped. Second, NULL http://gerrit.cloudera.org:8080/#/c/4655/2/be/src/exec/partitioned-aggregation-node.h File be/src/exec/partitioned-aggregation-node.h: Line 654: void CodegenBranchIfNull(LlvmCodeGen* codegen, LlvmBuilder* builder, Does it make sense to put this into LlvmCodegen? Seems generally useful. http://gerrit.cloudera.org:8080/#/c/4655/2/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: Line 630: static inline StringVal ALWAYS_INLINE DefaultStringConcatDelim() { typo: not http://gerrit.cloudera.org:8080/#/c/4655/2/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: Line 580: builder->CreateGEP(tuple_bytes, null_byte_offset, "null_byte_ptr"); inbounds? (and elsewhere) Line 619: result = builder->CreateOr(null_byte, mask, "null_bit_cleared"); null_bit_set Line 624: } else { Why would we not require is_null to be a ConstantInt? Do we exercise the code in the else block anywhere? http://gerrit.cloudera.org:8080/#/c/4655/2/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: Line 139: /// represented as a 1-bit integer indicating whether this slot is null in 'tuple'. 1-byte integer? Line 144: /// NULL bit in the given 'tuple' to the 'value' in null. 'tuple' is a pointer typo: value 'is_null' Line 145: /// to the tuple, and 'null' is an boolean value represented as a 1-bit integer. some typos in this sentence, should be 'is_null' 1-byte integer? http://gerrit.cloudera.org:8080/#/c/4655/2/tests/query_test/test_aggregation.py File tests/query_test/test_aggregation.py: Line 31: agg_functions = ['sum', 'count', 'min', 'max', 'avg'] add ndv if easy enough? alternatively file a JIRA for adding more coverage (and the extended coverage in my comments below) Line 33: data_types = ['int', 'bool', 'double', 'bigint', 'tinyint', maybe you can squeeze string into this patch if it's easy enough we should leave decimal for later -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-HasComments: Yes
