huaxingao commented on pull request #33650:
URL: https://github.com/apache/spark/pull/33650#issuecomment-897732830
@cloud-fan
> I think the key problem we should fix is: file source v2 should not return
> partition filters as the "post scan" filters, in the implementation of
> SupportsPushDownFilters.pushFilters
Agree.
Seems to me there are two ways to fix this:
1. In SupportsPushDownFilters.pushFilters, separate the partition filters from
the data filters. Something like this:
```
def pushFilters = {
  // separate the partition filters from the data filters
  // set both kinds of filters on the ScanBuilder so they can be passed to the
  // V2 Scan at construction time
  // return the data filters as the post scan filters
}
```
The problem with this approach is that not all data sources implement
SupportsPushDownFilters. For data sources that don't have `pushFilters`,
we would need a different path to separate the partition filters from the data
filters.
2. Separate the partition filters from the data filters before calling
SupportsPushDownFilters.pushFilters, and set both types of filters on the
ScanBuilder. The filters passed to SupportsPushDownFilters.pushFilters then
contain only data filters.
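
As a rough illustration of the second approach, here is a minimal, self-contained sketch. The `Filter` classes, `splitFilters`, and `SketchScanBuilder` are hypothetical stand-ins made up for this example; they are not Spark's classes and not the code in this PR. The sketch also assumes that a filter referencing both partition and data columns is treated as a data filter, and that every pushed data filter is returned as a post scan filter.

```scala
object FilterSplitSketch {
  // Hypothetical, simplified filter model (NOT Spark's filter classes).
  sealed trait Filter { def references: Set[String] }
  final case class EqualTo(attribute: String, value: Any) extends Filter {
    def references: Set[String] = Set(attribute)
  }
  final case class GreaterThan(attribute: String, value: Any) extends Filter {
    def references: Set[String] = Set(attribute)
  }
  final case class And(left: Filter, right: Filter) extends Filter {
    def references: Set[String] = left.references ++ right.references
  }

  // Returns (partitionFilters, dataFilters). A filter is a partition filter
  // only if every column it references is a partition column.
  def splitFilters(
      filters: Seq[Filter],
      partitionColumns: Set[String]): (Seq[Filter], Seq[Filter]) =
    filters.partition(_.references.subsetOf(partitionColumns))

  // Hypothetical builder that holds both kinds of filters for the scan.
  final class SketchScanBuilder {
    private var partitionFilters: Seq[Filter] = Seq.empty
    private var dataFilters: Seq[Filter] = Seq.empty

    // Called by the planner after it has separated the filters.
    def setFilters(partition: Seq[Filter], data: Seq[Filter]): Unit = {
      partitionFilters = partition
      dataFilters = data
    }

    // Receives data filters only. Assumption: all of them are re-evaluated
    // after the scan, so all of them come back as post scan filters.
    def pushFilters(filters: Seq[Filter]): Seq[Filter] = filters

    def pushedPartitionFilters: Seq[Filter] = partitionFilters
    def pushedDataFilters: Seq[Filter] = dataFilters
  }

  def main(args: Array[String]): Unit = {
    val filters = Seq(
      EqualTo("dt", "2021-08-12"),
      GreaterThan("amount", 10),
      And(EqualTo("dt", "2021-08-12"), GreaterThan("amount", 10)))

    // Step 1: separate the filters before pushFilters is ever called.
    val (partitionFilters, dataFilters) = splitFilters(filters, Set("dt"))

    // Step 2: set both kinds on the builder; push only the data filters.
    val builder = new SketchScanBuilder
    builder.setFilters(partitionFilters, dataFilters)
    val postScanFilters = builder.pushFilters(dataFilters)

    println(s"partition filters: ${builder.pushedPartitionFilters}")
    println(s"post scan filters: $postScanFilters")
  }
}
```

Because the split happens outside `pushFilters`, the same path also works for a source that does not implement filter pushdown at all: the planner can still call `setFilters` and simply skip the `pushFilters` step.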
I tried the first approach, and then changed to the second one.
BTW, I had a PR to push down only data filters to `ORCScan`:
https://github.com/apache/spark/pull/33680
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.