beliefer opened a new pull request, #42317: URL: https://github.com/apache/spark/pull/42317
### What changes were proposed in this pull request?
Currently, the Spark runtime filter supports a multi-level shuffle join side as the filter creation side; see https://github.com/apache/spark/pull/39170. Although that feature broadened the applicable scenarios and improved performance, some cases are still not supported. Consider the SQL below:
```sql
SELECT *
FROM (
  SELECT * FROM tab1 JOIN tab2 ON tab1.c1 = tab2.c2 WHERE tab2.a2 = 5
) AS a
JOIN tab3 ON tab3.c3 = a.c1
```
With the current implementation, Spark only injects a runtime filter (a bloom filter based on `tab2.a2 = 5`) into tab1. Because there is no direct join between tab3 and tab2, Spark cannot inject the same bloom filter into tab3. However, the SQL above has the join condition `tab3.c3 = a.c1` (i.e. `tab3.c3 = tab1.c1`) between tab3 and tab1, as well as the join condition `tab1.c1 = tab2.c2`. We can rely on the transitivity of these join conditions to derive the virtual join condition `tab3.c3 = tab2.c2`, and then inject the bloom filter based on `tab2.a2 = 5` into tab3 as well.

### Why are the changes needed?
Enhance the Spark runtime filter and improve performance.

### Does this PR introduce _any_ user-facing change?
No. This only updates the internal implementation.

### How was this patch tested?
New tests. Micro benchmark for q75 in TPC-DS.
**2TB TPC-DS**
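The transitive derivation described above can be sketched as a union-find over equi-join columns: columns linked by equality conditions fall into one equivalence class, so a bloom filter created for one column can be propagated to every column in the class. This is an illustrative sketch only, not Spark's actual optimizer rule; the function and column names below are hypothetical.

```python
# Sketch (assumed model, not Spark's implementation): columns are
# "table.column" strings, equi-join conditions are pairs of columns.
# A union-find groups columns into equivalence classes under equality.

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps the trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

def propagation_targets(join_conditions, filter_column):
    """Return every column that can receive a runtime filter created
    for `filter_column`, via transitivity of the equi-join conditions."""
    uf = UnionFind()
    for left, right in join_conditions:
        uf.union(left, right)
    root = uf.find(filter_column)
    return sorted(c for c in uf.parent
                  if c != filter_column and uf.find(c) == root)

# Join conditions from the example SQL:
conds = [("tab1.c1", "tab2.c2"),   # tab1 JOIN tab2 ON tab1.c1 = tab2.c2
         ("tab3.c3", "tab1.c1")]   # tab3 JOIN a ON tab3.c3 = a.c1 (= tab1.c1)

# A bloom filter built from the tab2 side (filtered by tab2.a2 = 5)
# now reaches tab3.c3 as well, via tab3.c3 = tab1.c1 = tab2.c2.
print(propagation_targets(conds, "tab2.c2"))
# → ['tab1.c1', 'tab3.c3']
```

Without the transitive step, only `tab1.c1` would be a propagation target, which matches the limitation this PR removes.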
