[GitHub] [spark] beliefer opened a new pull request, #39170: [WIP][SPARK-41674][SQL] Runtime filter should supports the any side of child join as filter creation side

GitBox Wed, 21 Dec 2022 20:31:23 -0800


beliefer opened a new pull request, #39170:
URL: https://github.com/apache/spark/pull/39170


   ### What changes were proposed in this pull request?
   Currently, the Spark runtime filter supports two scenarios:
   1. When the join itself is a shuffle join;
   2. When the join itself is a broadcast join and there is a shuffle join 
under one side of the join.
   
   This PR lets the runtime filter support a third scenario: using either side of a child join as the filter creation side.
   To facilitate understanding, the below SQL is taken as an example.
   ```
   SELECT *
   FROM bf1
       JOIN bf2
       JOIN bf3
       ON bf1.c1 = bf2.c2
           AND bf3.c3 = bf2.c2
   WHERE bf2.a2 = 5
   ```
   Currently, Spark only adds the runtime filter (subquery on bf2) to bf1 (the 
first scenario). In fact, the same runtime filter (subquery on bf2) can be applied to bf3 as well.
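
The idea can be illustrated outside Spark with a toy model. The sketch below is not the PR's implementation: the helper names (`build_runtime_filter`, `apply_runtime_filter`, `join3`) and the sample rows are made up for illustration, and a plain Python set stands in for whatever structure Spark actually builds from the bf2 subquery. It only shows that the keys surviving `bf2.a2 = 5` can prune both bf1 and bf3 before the join without changing the join result.

```python
# Toy model of a runtime filter. Tables are lists of dicts; all names and
# data here are hypothetical and only mirror the SQL example above.

def build_runtime_filter(creation_rows, key):
    """Collect the join-key values present on the filter creation side."""
    return {row[key] for row in creation_rows}

def apply_runtime_filter(rows, key, key_filter):
    """Keep only rows whose join key can possibly find a match."""
    return [row for row in rows if row[key] in key_filter]

def join3(t1, t2, t3):
    """Naive inner join on bf1.c1 = bf2.c2 AND bf3.c3 = bf2.c2."""
    return [
        (a, b, c)
        for b in t2
        for a in t1 if a["c1"] == b["c2"]
        for c in t3 if c["c3"] == b["c2"]
    ]

bf1 = [{"c1": k} for k in range(100)]
bf2 = [{"c2": k, "a2": k % 10} for k in range(100)]
bf3 = [{"c3": k} for k in range(100)]

# WHERE bf2.a2 = 5 makes bf2 the selective (filter creation) side.
bf2_filtered = [row for row in bf2 if row["a2"] == 5]
key_filter = build_runtime_filter(bf2_filtered, "c2")

# First scenario: prune bf1. Scenario proposed here: prune bf3 as well.
bf1_pruned = apply_runtime_filter(bf1, "c1", key_filter)
bf3_pruned = apply_runtime_filter(bf3, "c3", key_filter)

baseline = join3(bf1, bf2_filtered, bf3)
pruned = join3(bf1_pruned, bf2_filtered, bf3_pruned)

print(len(bf1), "->", len(bf1_pruned))
print(len(bf3), "->", len(bf3_pruned))
print(baseline == pruned)
```

In this toy data both bf1 and bf3 shrink before the join while the joined rows stay the same, which is the effect the PR aims for on the shuffle input of bf3.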
   
   
   ### Why are the changes needed?
   1. Extend the scenarios that the runtime filter supports.
   2. Reduce the amount of data shuffled and improve performance.
   
   ### Does this PR introduce _any_ user-facing change?
   'No'.
   This PR only updates the internal implementation.
   
   
   ### How was this patch tested?
   New tests.
   Micro benchmark.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

