huaxingao commented on pull request #33584:
URL: https://github.com/apache/spark/pull/33584#issuecomment-892151806
@gengliangwang
Thank you very much for taking a look!
I asked @cloud-fan offline, and he suggested separating the partition
filters and the data filters in the data source implementation. I think what
we can do is separate the partition filters from the data filters at filter
pushdown time, and return only the data filters as post-scan filters:
```
override def pushFilters(filters: Array[Filter]): Array[Filter] = {
  // Separate partition filters from data filters, and keep only the data
  // filters: they are what gets returned as post-scan filters.
  val (partitionFilters, dataFilters) = filters.partition(isPartitionFilter)
  this.filters = dataFilters
  this.filters
}
```
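The separation itself could be done by checking the referenced columns, e.g. (a hypothetical helper; `partitionColumnNames` is assumed to be a field on the builder):
```
// Hypothetical helper: treat a filter as a partition filter iff every column
// it references is a partition column of the table.
private def isPartitionFilter(filter: Filter): Boolean =
  filter.references.forall(partitionColumnNames.contains)
```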
There is a problem, though. In the current implementation,
`partitionFilters` and `dataFilters` in `FileScan` are `Expression`s, and the
partition pruning code takes `partitionFilters` as `Expression`s. If I
separate the filters in `pushFilters`, both the partition filters and the
data filters are `sources.Filter`s. The data filters can be reconstructed
back to `Expression`s in `PushDownUtils`, but there is no
`translatedFilterToExpr` map inside the `ScanBuilder` to reconstruct the
partition filters to `Expression`s, so it seems I can't reuse the current
partition pruning code.
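For reference, a minimal sketch of why that reconstruction only works on the
`PushDownUtils` side: the `Filter` -> `Expression` map is built there while
translating, so only the filters that come back out of `pushFilters` can be
mapped back (simplified; `filterExpressions` and `scanBuilder` are
placeholders, the real code also handles filters that fail to translate, and
the exact `translateFilter` signature may differ across Spark versions):
```
import scala.collection.mutable

import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.execution.datasources.DataSourceStrategy
import org.apache.spark.sql.sources.Filter

// Keep a Filter -> Expression map while translating catalyst predicates,
// so post-scan filters (sources.Filter) can be mapped back to Expressions.
val translatedFilterToExpr = mutable.HashMap.empty[Filter, Expression]
val translatedFilters = filterExpressions.flatMap { expr =>
  val maybeFilter =
    DataSourceStrategy.translateFilter(expr, supportNestedPredicatePushdown = true)
  maybeFilter.foreach(f => translatedFilterToExpr(f) = expr)
  maybeFilter
}

// pushFilters returns the post-scan filters; only these can be mapped back.
// Partition filters kept inside the ScanBuilder never reach this map.
val postScanExprs =
  scanBuilder.pushFilters(translatedFilters.toArray).map(translatedFilterToExpr)
```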