huaxingao commented on a change in pull request #34498:
URL: https://github.com/apache/spark/pull/34498#discussion_r744063753
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
##########
@@ -151,9 +151,10 @@ case class RowDataSourceScanExec(
Map(
"ReadSchema" -> requiredSchema.catalogString,
- "PushedFilters" -> seqToString(markedFilters.toSeq),
- "PushedAggregates" -> aggString,
- "PushedGroupby" -> groupByString) ++
+ "PushedFilters" -> seqToString(markedFilters.toSeq)) ++
+ pushedDownOperators.aggregation.fold(Map[String, String]()) { v =>
Review comment:
The change looks good. I plan to fix this but haven't had a chance to do it yet.
One problem, though: in the tests I check whether an aggregate was pushed down by
looking for keywords in the explain output. If the explain output contains
`PushedAggregation: []`, the aggregate was not pushed down. Since we no longer
print the `[]`, some of those tests will fail. We should probably change all of
those tests to use a different method (collecting aggregates in the optimized
plan) to check whether the aggregate was pushed down, probably in a separate PR.
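
A rough sketch of that alternative check (the helper name `checkAggregateRemoved` is just for illustration, assuming a ScalaTest-style suite): if the aggregate is fully pushed down to the data source, the `Aggregate` node should no longer appear in the optimized logical plan, so the test can collect `Aggregate` nodes instead of parsing the explain string.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.catalyst.plans.logical.Aggregate

// Hypothetical test helper: when the aggregate is fully pushed down,
// the Aggregate node is removed from the optimized logical plan,
// so collecting Aggregate nodes should yield an empty result.
private def checkAggregateRemoved(df: DataFrame): Unit = {
  val aggregates = df.queryExecution.optimizedPlan.collect {
    case agg: Aggregate => agg
  }
  assert(aggregates.isEmpty,
    s"Expected the aggregate to be pushed down, but found: $aggregates")
}
```

This way the tests no longer depend on the exact explain text, so dropping the empty `PushedAggregation: []` entry would not break them.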