cloud-fan opened a new pull request #26341: [SPARK-29277][SQL] Add early DSv2 
filter and projection pushdown
URL: https://github.com/apache/spark/pull/26341
 
 
   Bring back https://github.com/apache/spark/pull/25955
   
   ### What changes were proposed in this pull request?
   
   This adds a new rule, `V2ScanRelationPushDown`, to push filters and 
projections into a new `DataSourceV2ScanRelation` in the optimizer. That scan 
is then used when converting to a physical scan node. The new relation 
correctly reports stats based on the scan.
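
   As a rough illustration of the idea (not the Spark implementation), the toy model below collapses a chain of filter and projection nodes over a relation into a single scan node. The classes `Relation`, `Filter`, `Project`, `ScanRelation` and the function `push_down` are hypothetical stand-ins, and the sketch assumes the source can fully evaluate every pushed filter, so the scan only needs to output the projected columns.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Relation:
    """Toy logical relation: a table name and its full column list."""
    name: str
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Filter:
    condition: str  # conditions are opaque strings in this toy model
    child: "Plan"


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]
    child: "Plan"


@dataclass(frozen=True)
class ScanRelation:
    """Toy counterpart of a scan relation that already carries the pushdown."""
    relation: Relation
    pushed_filters: Tuple[str, ...]
    output: Tuple[str, ...]


Plan = Union[Relation, Filter, Project, ScanRelation]


def push_down(plan: Plan) -> Plan:
    """Collapse Project/Filter nodes sitting directly on a Relation.

    Assumption: every filter is fully handled by the source, so nothing
    has to be re-evaluated after the scan.
    """
    filters = []
    columns = None
    node = plan
    while isinstance(node, (Project, Filter)):
        if isinstance(node, Project):
            if columns is None:
                columns = node.columns  # the outermost projection decides the output
        else:
            filters.append(node.condition)
        node = node.child
    if not isinstance(node, Relation):
        return plan  # nothing to push into; leave the plan unchanged
    output = columns if columns is not None else node.columns
    return ScanRelation(node, tuple(filters), output)


people = Relation("people", ("id", "name", "age"))
plan = Project(("id",), Filter("age > 30", people))
scan = push_down(plan)
```

   Here `scan` carries both the filter and the pruned column list, so anything that later asks the scan for statistics can account for them.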
   
   To run scan pushdown before rules where stats are used, this adds a new 
optimizer override, `earlyScanPushDownRules`, and a batch for early pushdown in 
the optimizer, before cost-based join reordering. The other early pushdown 
rule, `PruneFileSourcePartitions`, is moved into the early pushdown rule set.
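
   The ordering constraint can be sketched with another toy model (again, not Spark code). All names here (`V2Relation`, `V2ScanRelation`, `early_push_down`, `reorder_joins`, `run_batches`) and the 1% selectivity figure are assumptions made for illustration. The unpushed relation refuses to report stats, while the scan reports stats that reflect the pushed filter, so a stats-based rule only works if it runs after the early pushdown batch.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class V2Relation:
    """Toy relation before pushdown: asking it for stats is an error."""
    name: str
    total_rows: int

    def stats(self) -> int:
        raise RuntimeError(f"stats for {self.name} requested before early pushdown")


@dataclass(frozen=True)
class V2ScanRelation:
    """Toy relation after pushdown: stats reflect the pushed filters."""
    name: str
    rows_after_pushdown: int

    def stats(self) -> int:
        return self.rows_after_pushdown


def early_push_down(tables: List) -> List:
    # Hypothetical estimate: a filter on "events" keeps 1% of its rows;
    # every other table is scanned in full.
    kept_percent = {"events": 1}
    return [
        V2ScanRelation(t.name, t.total_rows * kept_percent.get(t.name, 100) // 100)
        for t in tables
    ]


def reorder_joins(tables: List) -> List:
    # Stand-in for a cost-based rule: put the smallest input first.
    return sorted(tables, key=lambda t: t.stats())


def run_batches(tables: List, batches: List[Callable[[List], List]]) -> List:
    # Apply each batch in order, feeding its result to the next one.
    for batch in batches:
        tables = batch(tables)
    return tables


tables = [V2Relation("users", 50_000), V2Relation("events", 1_000_000)]
ordered = run_batches(tables, [early_push_down, reorder_joins])
join_order = [t.name for t in ordered]
```

   With pushdown first, the filtered `events` scan (10,000 estimated rows) is ordered ahead of `users` (50,000 rows). Swapping the two batches makes `reorder_joins` raise, which mirrors how tests fail if stats are read before early pushdown.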
   
   This also moves pushdown helper methods from `DataSourceV2Strategy` into a 
util class.
   
   ### Why are the changes needed?
   
   This is needed for DSv2 sources to supply stats for cost-based rules in the 
optimizer.
   
   ### Does this PR introduce any user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   This updates the implementation of stats from `DataSourceV2Relation` so 
tests will fail if stats are accessed before early pushdown for v2 relations.
