Michael Ho has posted comments on this change. Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions ......................................................................
Patch Set 7: (8 comments) http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/codegen/codegen-anyval.cc File be/src/codegen/codegen-anyval.cc: PS7, Line 459: hat that http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/codegen/codegen-anyval.h File be/src/codegen/codegen-anyval.h: Line 184: /// lowered type. This *Val should be non-null.The output variable is called 'name'. nit: space http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: PS7, Line 347: but we don't clone if we can avoid it to : /// reduce compilation time. Avoid cloning if possible to reduce compilation time. http://gerrit.cloudera.org:8080/#/c/4655/6/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: PS6, Line 1063: if we initialize the destination value to 0 (with the NULL bit set) > We could do that but it doesn't seem worth it (the interpreted path has eno Sure. I agree that the interpreted path probably won't benefit much. My concern has more to do with the explanation of the optimization and the assumption made here: 1. Who is initializing the slot to the proper value ? 2. How are these initialized values different for different aggregate functions ? 3. Will the NULL bits be changed after initialization ? If so, by whom ? 4. And why is it okay to not check the NULL bits after calling the UDA function for certain aggregate functions ? Is it because they cannot change or is it already handled somewhere else ? http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: PS7, Line 1660: Second, : // the NULL bit of the dst slot can be ignored if the slot is initialised in the : // right way (see InitAggSlots()). Empirically this optimisation makes TPC-H Q1 : // 5-10% faster. Please also see comments above. It's not very clear what it means by "the NULL bit of the dst slot" can be ignored ? Is it more precise to say their state cannot change after initialization or they will be updated appropriately by the update function ? It seems different aggregate function (e.g. min vs avg) has different default values with different conditions for the NULL bit to be set. May help to document a bit more details. PS7, Line 1675: dst.SetIsNull(slot_desc->CodegenIsNull(codegen, &builder, agg_tuple_arg)); Why is this not in the old code or was it done somewhere else ? http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/exec/partitioned-aggregation-node.h File be/src/exec/partitioned-aggregation-node.h: PS7, Line 649: 'dst_val' should contain the previous value of the aggregate : /// function, and 'result_val' is set to the new value after the Update or Merge : /// operation is applied. Just curious why we separate 'dst_val' and 'result_val'. Doesn't 'result_val' subsume 'dst_val' ? http://gerrit.cloudera.org:8080/#/c/4655/7/be/src/exprs/scalar-fn-call.cc File be/src/exprs/scalar-fn-call.cc: PS7, Line 496: false Please feel free to ignore but you can consider setting this second argument to default value of false. However, keeping it as is may be clearer (and you have gone through the trouble of updating all call sites - thanks !). -- To view, visit http://gerrit.cloudera.org:8080/4655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
