[GitHub] [spark] Ngone51 commented on a change in pull request #28869: [SPARK-32031][SQL] Fix the wrong references of the PartialMerge/Final AggregateExpression

GitBox Sun, 21 Jun 2020 20:36:49 -0700


Ngone51 commented on a change in pull request #28869:
URL: https://github.com/apache/spark/pull/28869#discussion_r443302172




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/BaseAggregateExec.scala
##########
@@ -53,15 +53,31 @@ trait BaseAggregateExec extends UnaryExecNode {
       // can't bind the `mergeExpressions` with the output of the partial aggregate, as they use
       // the `inputAggBufferAttributes` of the original `DeclarativeAggregate` before copy. Instead,
       // we shall use `inputAggBufferAttributes` after copy to match the new `mergeExpressions`.
-      val aggAttrs = aggregateExpressions
-        // there're exactly four cases needs `inputAggBufferAttributes` from child according to the
-        // agg planning in `AggUtils`: Partial -> Final, PartialMerge -> Final,
-        // Partial -> PartialMerge, PartialMerge -> PartialMerge.
-        .filter(a => a.mode == Final || a.mode == PartialMerge).map(_.aggregateFunction)
-        .flatMap(_.inputAggBufferAttributes)
+      val aggAttrs = inputAggBufferAttributes
       child.output.dropRight(aggAttrs.length) ++ aggAttrs
     } else {
       child.output
     }
   }
+
+  private val inputAggBufferAttributes: Seq[Attribute] = {
+    aggregateExpressions
+      // there're exactly four cases needs `inputAggBufferAttributes` from child according to the
+      // agg planning in `AggUtils`: Partial -> Final, PartialMerge -> Final,
+      // Partial -> PartialMerge, PartialMerge -> PartialMerge.
+      .filter(a => a.mode == Final || a.mode == PartialMerge)
+      .flatMap(_.aggregateFunction.inputAggBufferAttributes)
+  }
+
+  protected val aggregateBufferAttributes: Seq[AttributeReference] = {
+    aggregateExpressions.flatMap(_.aggregateFunction.aggBufferAttributes)
+  }
+
+  override def producedAttributes: AttributeSet =
+    AttributeSet(aggregateAttributes) ++
+    AttributeSet(resultExpressions.diff(groupingExpressions).map(_.toAttribute)) ++
+    AttributeSet(aggregateBufferAttributes) ++
+    // it's not empty when the inputAggBufferAttributes is from the child Aggregate, which contains
+    // subquery in AggregateFunction. See SPARK-31620 for more details.
+    AttributeSet(inputAggBufferAttributes.filterNot(child.output.contains))

Review comment:
   Oh, the comment should actually be:

   `it's not empty when the child Aggregate contains the subquery in AggregateFunction.`

   After SPARK-31620, the `inputAggBufferAttributes` come from the node itself rather than from the child whenever there are Final/PartialMerge aggregate expressions; see `inputAttributes()` above. (Note, however, that the agg attributes are still the same between the parent agg node and the child agg node when there is no subquery in the agg expression.)

   Therefore, in the SPARK-31620 case, we actually use the attributes produced by the node itself. In all other cases, we still use the agg buffer attributes from the child: `inputAggBufferAttributes` equals the child's agg buffer attributes, so `inputAggBufferAttributes.filterNot(child.output.contains)` is empty.
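   To make the reasoning above concrete, here is a minimal, Spark-free Scala sketch. It is an illustration under stated assumptions, not Spark code: `Attr`, `ProducedAttrsSketch`, `selfProducedBufferAttrs` and the `hasFinalOrPartialMerge` flag are hypothetical stand-ins. `Attr` simplifies Catalyst's `AttributeReference` to a name plus an expression id, so an aggregate function that was copied with freshly generated ids yields buffer attributes that the child does not output. Only the `dropRight(...) ++ ...` and `filterNot(childOutput.contains)` expressions mirror the diff.

```scala
// Hypothetical model: Attr stands in for Catalyst's AttributeReference.
// Equality here is name + exprId only (a simplification of the real class).
final case class Attr(name: String, exprId: Long)

object ProducedAttrsSketch {

  // Mirrors the shape of `inputAttributes` in the diff. The real condition is
  // not shown in the hunk, so it is modeled by a hypothetical flag: when it
  // holds, the trailing agg-buffer columns of the child output are replaced by
  // the node's own `inputAggBufferAttributes`.
  def inputAttributes(
      childOutput: Seq[Attr],
      inputAggBufferAttributes: Seq[Attr],
      hasFinalOrPartialMerge: Boolean): Seq[Attr] =
    if (hasFinalOrPartialMerge) {
      childOutput.dropRight(inputAggBufferAttributes.length) ++ inputAggBufferAttributes
    } else {
      childOutput
    }

  // Mirrors the last term of `producedAttributes`: buffer attributes that the
  // child does not already output must be produced by this node itself.
  def selfProducedBufferAttrs(
      inputAggBufferAttributes: Seq[Attr],
      childOutput: Seq[Attr]): Seq[Attr] =
    inputAggBufferAttributes.filterNot(childOutput.contains)

  def main(args: Array[String]): Unit = {
    // Child partial aggregate: a grouping key plus its buffer (sum, count).
    val childOutput = Seq(Attr("k", 1L), Attr("sum", 10L), Attr("count", 11L))

    // Usual case: the parent's inputAggBufferAttributes are exactly what the
    // child outputs, so nothing is self-produced.
    val shared = Seq(Attr("sum", 10L), Attr("count", 11L))
    println(selfProducedBufferAttrs(shared, childOutput)) // empty

    // SPARK-31620-like case: the function was copied, so the ids are new and
    // both buffer attributes count as produced by this node.
    val copied = Seq(Attr("sum", 20L), Attr("count", 21L))
    println(selfProducedBufferAttrs(copied, childOutput))
    println(inputAttributes(childOutput, copied, hasFinalOrPartialMerge = true))
  }
}
```

   In this model the "usual" case leaves the `filterNot` term empty, while the copied-function case makes it return the node's own buffer attributes, which is the distinction the corrected comment is meant to capture.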




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


