huaxingao commented on pull request #33650:
URL: https://github.com/apache/spark/pull/33650#issuecomment-903856178


   I agree with adding `SupportsPushDownCatalystFilters` to push down Catalyst 
`Expression` filters.
   It seems to me that `pushFilters` is better suited to SQL-based data sources:
   1. `pushFilters` returns the filters that must be evaluated after the scan. 
Only a SQL-based data source needs to return such a subset; a file source has 
to re-evaluate all of the filters after the scan.
   2. `pushFilters` pushes `sources.Filter`. A SQL-based data source needs only 
`sources.Filter`, and it currently keeps a single copy of the filters in that 
format. File sources, however, currently keep two copies: one as 
`sources.Filter`, pushed down in `pushFilters`, and another as `Expression`, 
pushed down in `PruneFileSourcePartitions`. It seems more reasonable to me to 
push down once and maintain one copy. That copy has to be in `Expression` 
form because the `Expression` is used for partition pruning.
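
   To make point 1 concrete, here is a minimal, self-contained Java sketch of 
the two behaviors. It does not use Spark's classes: `SimpleFilter`, 
`FilterPushDown`, `SqlLikeBuilder`, and `FileLikeBuilder` are hypothetical 
stand-ins, `FilterPushDown` only mirrors the shape of `pushFilters` / 
`pushedFilters`, and the rule that only `=` and `>` can be translated is an 
assumption made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class PushFiltersSketch {

    /** Hypothetical stand-in for a `sources.Filter`; this is not Spark's class. */
    static final class SimpleFilter {
        final String column;
        final String op;

        SimpleFilter(String column, String op) {
            this.column = column;
            this.op = op;
        }
    }

    /** Same shape as the `pushFilters` contract: the return value is the post-scan filter set. */
    interface FilterPushDown {
        SimpleFilter[] pushFilters(SimpleFilter[] filters);

        SimpleFilter[] pushedFilters();
    }

    /** SQL-like source: a filter compiled into the remote query needs no post-scan evaluation. */
    static final class SqlLikeBuilder implements FilterPushDown {
        private final List<SimpleFilter> pushed = new ArrayList<>();

        // Assumption for illustration only: "=" and ">" are the translatable operators.
        private static boolean canTranslate(SimpleFilter f) {
            return f.op.equals("=") || f.op.equals(">");
        }

        @Override
        public SimpleFilter[] pushFilters(SimpleFilter[] filters) {
            pushed.clear();
            List<SimpleFilter> postScan = new ArrayList<>();
            for (SimpleFilter f : filters) {
                if (canTranslate(f)) {
                    pushed.add(f);      // handled by the source itself
                } else {
                    postScan.add(f);    // must be evaluated after the scan
                }
            }
            return postScan.toArray(new SimpleFilter[0]);
        }

        @Override
        public SimpleFilter[] pushedFilters() {
            return pushed.toArray(new SimpleFilter[0]);
        }
    }

    /** File-like source: pushed filters are only a skipping hint, so all of them come back. */
    static final class FileLikeBuilder implements FilterPushDown {
        private SimpleFilter[] pushed = new SimpleFilter[0];

        @Override
        public SimpleFilter[] pushFilters(SimpleFilter[] filters) {
            pushed = filters.clone();   // remember what was pushed as a hint
            return filters;             // everything is re-evaluated after the scan
        }

        @Override
        public SimpleFilter[] pushedFilters() {
            return pushed.clone();
        }
    }
}
```

   Given the same input, the SQL-like builder hands back only the filters it 
could not translate, while the file-like builder always hands back every 
filter, so the return value of `pushFilters` carries little information for a 
file source.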
   
   We are not changing `pushFilters`, though. Users who implement 
`pushFilters` in their file source can keep using it as they do today. I 
guess this will not break any existing applications?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


