cloud-fan commented on a change in pull request #34904:
URL: https://github.com/apache/spark/pull/34904#discussion_r773131123
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -189,6 +204,13 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
}
}
+  private def newAggOutput(aggAttribute: AttributeReference, agg: AggregateExpression) =
+    if (aggAttribute.dataType == agg.resultAttribute.dataType) {
+      aggAttribute
+    } else {
+      Cast(aggAttribute, agg.resultAttribute.dataType)
Review comment:
I think complete and partial pushdown are different here. For complete pushdown, we should cast to the data type of the aggregate function. For partial pushdown, Spark will run the aggregate again, so we should cast to the data type of the input of the aggregate function, so that the final data type stays the same as before.
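To illustrate the point above, here is a minimal sketch (not the actual Spark implementation; `DataType`, `Agg`, and `castTarget` are simplified stand-ins) showing why the cast target differs between the two pushdown modes:

```scala
// Simplified model of the review's point. In real Spark these would be
// Catalyst DataType instances and AggregateExpression; here they are stand-ins.
sealed trait DataType
case object IntType extends DataType
case object LongType extends DataType

// inputType: type of the aggregate's child; resultType: type the function returns.
case class Agg(inputType: DataType, resultType: DataType)

// Complete pushdown: the source computes the final value, so the scan output
// must be cast to the aggregate function's result type.
// Partial pushdown: Spark re-aggregates the pushed-down partial results, so the
// scan output must be cast back to the aggregate's input type, keeping the
// final plan's output type unchanged.
def castTarget(agg: Agg, completePushdown: Boolean): DataType =
  if (completePushdown) agg.resultType else agg.inputType

// Example: SUM over an Int column widens its result to Long.
val sum = Agg(inputType = IntType, resultType = LongType)
println(castTarget(sum, completePushdown = true))   // LongType
println(castTarget(sum, completePushdown = false))  // IntType
```

The key invariant is the last case: because Spark's final aggregate runs on the scan output under partial pushdown, feeding it the input type reproduces the same widening the un-pushed plan would have done.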
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]