maryannxue opened a new pull request #25600: [SPARK-28888][SQL] Dynamic Partition Pruning
URL: https://github.com/apache/spark/pull/25600
 
 
   ### What changes were proposed in this pull request?
   This patch implements dynamic partition pruning (DPP). When a partitioned table is joined with a dimension table that has a filter, it adds a dynamic-partition-pruning filter to the scan of the partitioned table. That filter is then planned using a heuristic approach:
   1. As a broadcast relation, if the join is a broadcast hash join. The `ReuseExchange` rule then turns this broadcast relation into a reused broadcast exchange; or
   2. As a duplicated subquery, if the estimated benefit of the saved partitioned-table scan is greater than the estimated cost of the extra scan of the duplicated subquery; otherwise
   3. As a bypassed condition (`true`), which performs no pruning.
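The three-step heuristic and the pruning effect itself can be modeled with the Python sketch below. It is not Spark's implementation (which is written in Scala and works on query plans). `PruningStrategy`, `JoinInfo`, `choose_strategy`, `prune_partitions`, and the benefit/cost fields are hypothetical names, and the benefit and cost are assumed to be precomputed estimates.

```python
from dataclasses import dataclass
from enum import Enum


class PruningStrategy(Enum):
    REUSE_BROADCAST = "reuse broadcast exchange"
    DUPLICATE_SUBQUERY = "duplicate dimension subquery"
    BYPASS = "bypass (filter is literal true)"


@dataclass
class JoinInfo:
    # Hypothetical stand-ins for the planner's inputs.
    is_broadcast_hash_join: bool
    pruning_benefit: float  # estimated partitioned-table scan saved
    subquery_cost: float    # estimated cost of the extra dimension-side scan


def choose_strategy(join: JoinInfo) -> PruningStrategy:
    """Apply the three planning options in order."""
    if join.is_broadcast_hash_join:
        # The broadcast already holds the filtered dimension keys,
        # so the pruning filter can reuse it instead of rescanning.
        return PruningStrategy.REUSE_BROADCAST
    if join.pruning_benefit > join.subquery_cost:
        # Running the dimension subquery a second time pays off.
        return PruningStrategy.DUPLICATE_SUBQUERY
    # Not worth it: the filter degenerates to `true` (no pruning).
    return PruningStrategy.BYPASS


def prune_partitions(partitions, dim_rows, dim_filter, key):
    """Keep only the partitions whose partition value appears among the
    join keys of the filtered dimension rows."""
    keys = {row[key] for row in dim_rows if dim_filter(row)}
    return [p for p in partitions if p in keys]
```

For example, with partitions `["2019-01", "2019-02"]` and a dimension filter that only matches the row for `"2019-02"`, `prune_partitions` keeps just `["2019-02"]`, so the other partition is never scanned.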
   
   
   ### Why are the changes needed?
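   This is an important performance feature. Without it, a query that joins a large partitioned table with a filtered dimension table scans every partition, including partitions that cannot produce any join output. Pruning those partitions at runtime reduces the amount of data read.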
   
   
   ### Does this PR introduce any user-facing change?
   No.
   
   
   ### How was this patch tested?
   Added unit tests covering:
   - DPP with the reuse-broadcast-results feature and/or the subquery-duplication feature enabled or disabled.
   - DPP with reused broadcast results.
   - The key iterators on the different `HashedRelation` types.
   - Packing and unpacking of the broadcast keys into a single `LongType` value.
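The last test item refers to packing several integral join keys into one 64-bit long and extracting them again. Below is a minimal Python sketch of a shift-and-mask scheme for this. It assumes each key's bit width is known (for example 32 for an int), that the widths sum to at most 64, and that the first key occupies the highest bits. `pack_keys`, `unpack_key`, and `to_signed` are hypothetical helpers. Python integers are masked explicitly to emulate 64-bit two's-complement arithmetic.

```python
MASK64 = (1 << 64) - 1


def to_signed(value: int, bits: int) -> int:
    """Reinterpret the low `bits` bits of `value` as two's complement
    (the effect of a narrowing cast such as long -> int)."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def pack_keys(keys, widths):
    """Pack integral keys into one signed 64-bit value, first key highest."""
    if sum(widths) > 64:
        raise ValueError("keys do not fit in a 64-bit long")
    packed = keys[0] & MASK64  # first key widened to 64 bits
    for key, bits in zip(keys[1:], widths[1:]):
        # Make room for the next key, then OR in its low `bits` bits.
        packed = ((packed << bits) | (key & ((1 << bits) - 1))) & MASK64
    return to_signed(packed, 64)


def unpack_key(packed, widths, index):
    """Recover key `index`: shift past the keys below it, then mask."""
    shift = sum(widths[index + 1:])
    unsigned = packed & MASK64  # emulate an unsigned right shift (>>>)
    return to_signed(unsigned >> shift, widths[index])
```

The narrowing step in `to_signed` is what restores negative keys: packing `[-1, 5]` with widths `[32, 32]` and then unpacking index 0 yields `-1` again.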
   
