huaxingao commented on pull request #33584: URL: https://github.com/apache/spark/pull/33584#issuecomment-893088473
@gengliangwang Thanks a lot for taking a look at this problem. I need the postScanFilters at this line https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala#L78 to be empty in order to push down the aggregate.

In JDBC, postScanFilters is un-pushed filters + un-translated filters. In file-based data sources, it is currently data filters + partition filters + un-translated filters. My goal is to make the file-based data sources' postScanFilters be data filters + un-translated filters, so that when it is empty I can push down the aggregate, e.g. `SELECT count(*) FROM t WHERE part_col = 1` can be pushed down.

I could add duplicated code to resolve the partition filters, but then I would need to remove the partition filters from postScanFilters. Since these partition filters are removed here, by the time `PruneFileSourcePartitions` is called we don't have partition filters any more, and nothing needs to be done there.

If we don't want to touch `PruneFileSourcePartitions`, then I guess instead of checking whether postScanFilters is empty with `if filters.isEmpty`, we could do something like this:
```
if (JDBC)
  if filters.isEmpty
    push down aggregate
else // file based
  if filters are only partition filters
    push down aggregate
```
This looks hacky, though. Please let me know if anybody has a better idea.
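For concreteness, here is a minimal Scala sketch of the check described above. This is not actual Spark code: the method name `canPushDownAggregate`, the `isJdbc` flag, and the `partitionColumns` parameter are made up for illustration; only `Expression`, `AttributeSet`, and `references`/`subsetOf` are existing Catalyst APIs.
```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeSet, Expression}

// Hypothetical helper illustrating the branching above; the names and the
// isJdbc flag are assumptions, not part of V2ScanRelationPushDown.
def canPushDownAggregate(
    postScanFilters: Seq[Expression],
    partitionColumns: AttributeSet,
    isJdbc: Boolean): Boolean = {
  if (isJdbc) {
    // JDBC: only push down the aggregate when no filters remain on the Spark side.
    postScanFilters.isEmpty
  } else {
    // File-based: partition filters are handled by partition pruning, so the
    // aggregate can still be pushed down if every remaining filter references
    // only partition columns.
    postScanFilters.forall(_.references.subsetOf(partitionColumns))
  }
}
```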
