huaxingao commented on pull request #33584:
URL: https://github.com/apache/spark/pull/33584#issuecomment-892151806


   @gengliangwang 
   Thank you very much for taking a look!
   I asked @cloud-fan offline, and he suggested separating the partition filters from the data filters in the data source implementation. I think what we can do is split the partition filters from the data filters at filter push-down time, and return only the data filters as the post-scan filters:
   
   ```
     override def pushFilters(filters: Array[Filter]): Array[Filter] = {
       // Split the pushed-down filters into partition filters and data filters.
       // (`referencesOnlyPartitionColumns` is a hypothetical helper, sketched below.)
       val (partitionFilters, dataFilters) =
         filters.partition(referencesOnlyPartitionColumns)
       // Keep only the data filters and return them as the post-scan filters.
       this.filters = dataFilters
       this.filters
     }
   ```
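   For illustration, a hypothetical `referencesOnlyPartitionColumns` could compare a filter's referenced columns against the table's partition schema. This helper is an assumption, not existing API, and it assumes the builder has access to the table's `fileIndex`:
   
   ```
     // Hypothetical helper: treat a filter as a partition filter iff every
     // column it references is a partition column of the table.
     private def referencesOnlyPartitionColumns(filter: sources.Filter): Boolean = {
       val partitionColumns = fileIndex.partitionSchema.fieldNames.toSet
       filter.references.nonEmpty && filter.references.forall(partitionColumns.contains)
     }
   ```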
   
   There is a problem, though. In the current implementation, `partitionFilters` and `dataFilters` in `FileScan` are `Expression`s, and the partition pruning code expects `partitionFilters` in `Expression` form. If I separate the filters in `pushFilters`, both the partition filters and the data filters are in `sources.Filter` form. The data filters can be reconstructed back into `Expression`s in `PushDownUtils`, but there is no `translatedFilterToExpr` map inside the `ScanBuilder` to reconstruct the partition filters into `Expression`s, so it seems I can't reuse the current partition pruning code.
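   
   To make the asymmetry concrete, here is a rough, simplified sketch of the existing `PushDownUtils` pattern (`filters` and `scanBuilder` are assumed to be in scope, as in `PushDownUtils.pushFilters`): the `translatedFilterToExpr` map lives on the caller's side, so only the filters handed back by `pushFilters` can be mapped back to `Expression`s:
   
   ```
     import scala.collection.mutable
     import org.apache.spark.sql.catalyst.expressions.Expression
     import org.apache.spark.sql.execution.datasources.DataSourceStrategy
     import org.apache.spark.sql.sources
   
     // PushDownUtils remembers which Expression each translated Filter came from.
     val translatedFilterToExpr = mutable.HashMap.empty[sources.Filter, Expression]
     for (filterExpr <- filters) {
       DataSourceStrategy.translateFilter(filterExpr, supportNestedPredicatePushdown = true)
         .foreach(f => translatedFilterToExpr(f) = filterExpr)
     }
   
     // Filters returned by pushFilters (the data filters) can be mapped back.
     val postScanFilters =
       scanBuilder.pushFilters(translatedFilterToExpr.keys.toArray)
         .map(translatedFilterToExpr)
   
     // But the partition filters, once split off inside the ScanBuilder, exist
     // only as sources.Filter there; the ScanBuilder has no such map to turn
     // them back into Expressions for the partition pruning code.
   ```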
   