huaxingao commented on pull request #33584: URL: https://github.com/apache/spark/pull/33584#issuecomment-893088473
@gengliangwang Thanks a lot for taking a look at this problem. I need the postScanFilters at this line https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala#L78 to be empty in order to push down the aggregate.

In JDBC, postScanFilters is un-pushed filters + un-translated filters. In file-based data sources, it is currently data filters + partition filters + un-translated filters. My goal is to make the file-based data sources' postScanFilters be data filters + un-translated filters, so that when it is empty I can push down the aggregate, e.g. `SELECT count(*) FROM t WHERE part_col = 1` can be pushed down.

I could add duplicated code to resolve the partition filters, but then I would need to remove the partition filters from postScanFilters. Since these partition filters are removed here, by the time `PruneFileSourcePartitions` is called we don't have partition filters any more, and nothing needs to be done there.

If we don't want to touch `PruneFileSourcePartitions`, then I guess instead of checking whether postScanFilters is empty with `if filters.isEmpty`, we could do something like this:
```
if (JDBC)
  if filters.isEmpty
    push down aggregate
else // file based
  if filters are only partition filters
    push down aggregate
```
This looks hacky, though. Please let me know if anybody has a better idea.
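For concreteness, here is a minimal Scala sketch of the check described above. This is not actual Spark code: the method name `canPushDownAggregate`, the `isJdbc` flag, and the `partitionColumns` parameter are made up for illustration; only `Expression`, `AttributeSet`, and `references`/`subsetOf` are existing Catalyst APIs.
```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeSet, Expression}

// Hypothetical helper illustrating the branching above; the names and the
// isJdbc flag are assumptions, not part of V2ScanRelationPushDown.
def canPushDownAggregate(
    postScanFilters: Seq[Expression],
    partitionColumns: AttributeSet,
    isJdbc: Boolean): Boolean = {
  if (isJdbc) {
    // JDBC: only push down the aggregate when no filters remain on the Spark side.
    postScanFilters.isEmpty
  } else {
    // File-based: partition filters are handled by partition pruning, so the
    // aggregate can still be pushed down if every remaining filter references
    // only partition columns.
    postScanFilters.forall(_.references.subsetOf(partitionColumns))
  }
}
```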
