[GitHub] [spark] maropu commented on a change in pull request #26420: [SPARK-27986][SQL] Support ANSI SQL filter predicate for aggregate expression.

GitBox Sat, 23 Nov 2019 01:09:18 -0800

maropu commented on a change in pull request #26420: [SPARK-27986][SQL] Support 
ANSI SQL filter predicate for aggregate expression.
URL: https://github.com/apache/spark/pull/26420#discussion_r349865117


 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala
 ##########
 @@ -157,38 +180,89 @@ abstract class AggregationIterator(
       inputAttributes: Seq[Attribute]): (InternalRow, InternalRow) => Unit = {
     val joinedRow = new JoinedRow
     if (expressions.nonEmpty) {
-      val mergeExpressions = functions.zip(expressions).flatMap {
-        case (ae: DeclarativeAggregate, expression) =>
-          expression.mode match {
+      val filterExpressions = expressions.map(_.filter)
+      var isFinalOrMerge = false
+      val mergeExpressions = functions.zipWithIndex.collect {
+        case (ae: DeclarativeAggregate, i) =>
+          expressions(i).mode match {
             case Partial | Complete => ae.updateExpressions
-            case PartialMerge | Final => ae.mergeExpressions
+            case PartialMerge | Final =>
+              isFinalOrMerge = true
+              ae.mergeExpressions
           }
         case (agg: AggregateFunction, _) => Seq.fill(agg.aggBufferAttributes.length)(NoOp)
       }
       val updateFunctions = functions.zipWithIndex.collect {
         case (ae: ImperativeAggregate, i) =>
           expressions(i).mode match {
-            case Partial | Complete =>
+            case Partial | Complete if filterExpressions(i).isDefined =>
+              (buffer: InternalRow, row: InternalRow) =>
+                if (predicates(i).eval(row)) { ae.update(buffer, row) }
+            case Partial | Complete if filterExpressions(i).isEmpty =>
               (buffer: InternalRow, row: InternalRow) => ae.update(buffer, row)
             case PartialMerge | Final =>
               (buffer: InternalRow, row: InternalRow) => ae.merge(buffer, row)
           }
       }.toArray
       // This projection is used to merge buffer values for all expression-based aggregates.
       val aggregationBufferSchema = functions.flatMap(_.aggBufferAttributes)
-      val updateProjection =
-        newMutableProjection(mergeExpressions, aggregationBufferSchema ++ inputAttributes)
+      val updateProjection = newMutableProjection(
+        mergeExpressions.flatMap(_.seq), aggregationBufferSchema ++ inputAttributes)
 
-      (currentBuffer: InternalRow, row: InternalRow) => {
-        // Process all expression-based aggregate functions.
-        updateProjection.target(currentBuffer)(joinedRow(currentBuffer, row))
+      val processImperative = (currentBuffer: InternalRow, row: InternalRow) => {
         // Process all imperative aggregate functions.
         var i = 0
         while (i < updateFunctions.length) {
           updateFunctions(i)(currentBuffer, row)
           i += 1
         }
       }
+
+      // The following two situations will adopt a common implementation:
+      // First, no filter predicate is specified for any aggregate expression.
+      // Second, aggregate expressions are in merge or final mode.
+      if (predicates.isEmpty || isFinalOrMerge) {
+        (currentBuffer: InternalRow, row: InternalRow) => {
+          updateProjection.target(currentBuffer)(joinedRow(currentBuffer, row))
+          processImperative(currentBuffer, row)
 
 Review comment:
   I'm a bit worried that this closure can cause some performance overhead when processing regular non-filter aggregate functions. cc: @cloud-fan
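
   To make the concern concrete, here is a minimal sketch of the two shapes being compared. It is not Spark code: `Row`, `Buffer`, `Process`, and the two stand-in functions are hypothetical simplifications of `InternalRow`, `updateProjection`, and the imperative loop. Both shapes produce the same result; the only difference is that the first invokes a second function value per row. Whether the JIT removes that indirection in practice is something a benchmark would have to show.

```scala
// Hypothetical stand-ins for illustration only; these are not Spark APIs.
object ClosureOverheadSketch {
  type Row = Array[Long]
  type Buffer = Array[Long]
  type Process = (Buffer, Row) => Unit

  // Stand-in for updateProjection: adds the first input column into slot 0.
  val updateProjection: Process = (buffer, row) => buffer(0) += row(0)

  // Stand-in for the imperative loop over updateFunctions: counts rows in slot 1.
  val processImperative: Process = (buffer, row) => buffer(1) += 1

  // Shape in the diff: the imperative loop is a separate function value that
  // the returned function calls for every row.
  def viaNestedClosure: Process =
    (buffer, row) => {
      updateProjection(buffer, row)
      processImperative(buffer, row)
    }

  // Shape before the change: the imperative work sits directly in the
  // returned function body, with no second function value to call.
  def inlined: Process =
    (buffer, row) => {
      updateProjection(buffer, row)
      buffer(1) += 1
    }

  // Applies a per-row function to every row, starting from an empty buffer.
  def run(f: Process, rows: Seq[Row]): Buffer = {
    val buffer = Array(0L, 0L)
    rows.foreach(f(buffer, _))
    buffer
  }
}
```

   If the extra call does show up in a benchmark, one option would be to keep the non-filter path in the inlined shape and use the separate function only on the filtered path.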
