viirya commented on a change in pull request #20965: [SPARK-21870][SQL] Split
aggregation code into small functions
URL: https://github.com/apache/spark/pull/20965#discussion_r319696722
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
##########
@@ -824,59 +936,158 @@ case class HashAggregateExec(
// generating input columns, we use `currentVars`.
ctx.currentVars = new Array[ExprCode](aggregateBufferAttributes.length) ++ input
+ val aggNames = aggregateExpressions.map(_.aggregateFunction.prettyName)
+ // Computes start offsets for each aggregation function code
+ // in the underlying buffer row.
+ val bufferStartOffsets = {
+ val offsets = mutable.ArrayBuffer[Int]()
+ var curOffset = 0
+ updateExprs.foreach { exprsForOneFunc =>
+ offsets += curOffset
+ curOffset += exprsForOneFunc.length
+ }
+ offsets.toArray
+ }
+
val updateRowInRegularHashMap: String = {
ctx.INPUT_ROW = unsafeRowBuffer
- val boundUpdateExpr = bindReferences(updateExpr, inputAttr)
- val subExprs = ctx.subexpressionEliminationForWholeStageCodegen(boundUpdateExpr)
+ val boundUpdateExprs = updateExprs.map { updateExprsForOneFunc =>
+ bindReferences(updateExprsForOneFunc, inputAttr)
+ }
+ val subExprs = ctx.subexpressionEliminationForWholeStageCodegen(boundUpdateExprs.flatten)
val effectiveCodes = subExprs.codes.mkString("\n")
- val unsafeRowBufferEvals = ctx.withSubExprEliminationExprs(subExprs.states) {
- boundUpdateExpr.map(_.genCode(ctx))
+ val unsafeRowBufferEvals = boundUpdateExprs.map { boundUpdateExprsForOneFunc =>
+ ctx.withSubExprEliminationExprs(subExprs.states) {
Review comment:
If I remember correctly, common subexpressions in the update expressions are
eliminated. Do those common subexpressions need to be passed to the split
functions as arguments? For example, could the following happen?
```
int sub1 = some_common_sub_expr(..);
call_split_func_for_sum(current_row, arg1, arg2);

int call_split_func_for_sum(current_row, arg1, arg2) {
  // sub1 is a local variable of the caller, so it is not in scope here
  int eval_sum = arg1 + arg2 + sub1;
}
```
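To make the concern concrete, here is a minimal Java sketch of what I would expect the generated code to look like if the common subexpression is passed explicitly. All names (`SplitFunctionSketch`, `someCommonSubExpr`, `splitFuncForSum`, `processRow`) are hypothetical and only for illustration; this is not the code that Spark actually generates.

```java
// Hypothetical illustration only; not Spark's generated code.
final class SplitFunctionSketch {

    // Stand-in for a common subexpression shared by several update expressions.
    static int someCommonSubExpr(int a, int b) {
        return a * b;
    }

    // Split function for one aggregate. The pre-computed subexpression value
    // arrives as an explicit parameter, because locals of the caller are not
    // visible inside a separate Java method.
    static int splitFuncForSum(int arg1, int arg2, int sub1) {
        return arg1 + arg2 + sub1;
    }

    // Caller: evaluate the common subexpression once, then pass it along.
    static int processRow(int arg1, int arg2) {
        int sub1 = someCommonSubExpr(arg1, arg2);
        return splitFuncForSum(arg1, arg2, sub1);
    }
}
```

If `sub1` were left out of the parameter list, the split method would not compile, since Java methods cannot see local variables of their caller.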