[GitHub] [spark] beliefer opened a new pull request, #38745: [WIP][SPARK-37099][SQL] Optimize the filter based on rank-like window function by reduce not required rows

GitBox Mon, 21 Nov 2022 06:08:54 -0800


beliefer opened a new pull request, #38745:
URL: https://github.com/apache/spark/pull/38745


   ### What changes were proposed in this pull request?
   Some SQL queries contain a filter whose condition compares a rank-like window 
function with a number. For example,
   ```
   SELECT *
   FROM (SELECT *,
                ROW_NUMBER() OVER(ORDER BY a) AS rn
         FROM Tab1) t
   WHERE rn <= 5
   ```
   We can create a `Limit(5)` and push it down as the child of `Window`.
   ```
   SELECT *,
            ROW_NUMBER() OVER(ORDER BY a) AS rn
   FROM 
       (SELECT *
       FROM Tab1
       ORDER BY  a LIMIT 5) t
   ```
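
The equivalence of the two queries above can be checked on a small in-memory example. This is only an illustrative sketch in plain Python, not the optimizer rule from this PR; the helper names `filter_after_window` and `limit_before_window` and the sample data are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, int]


def filter_after_window(rows: List[Row], k: int) -> List[Row]:
    """Number every row by ORDER BY a, then keep rows with rn <= k."""
    ordered = sorted(rows, key=lambda r: r["a"])
    numbered = [{**r, "rn": i} for i, r in enumerate(ordered, start=1)]
    return [r for r in numbered if r["rn"] <= k]


def limit_before_window(rows: List[Row], k: int) -> List[Row]:
    """Apply ORDER BY a LIMIT k first, then number only the surviving rows."""
    limited = sorted(rows, key=lambda r: r["a"])[:k]
    return [{**r, "rn": i} for i, r in enumerate(limited, start=1)]


# Hypothetical contents of Tab1 (a single column named a).
tab1 = [{"a": v} for v in (7, 3, 9, 1, 4, 8, 2, 6)]

# Both strategies return the same rows with the same row numbers.
assert filter_after_window(tab1, 5) == limit_before_window(tab1, 5)
```

The second function sorts the same input but numbers only `k` rows, which is where the saving would come from.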
   
   In short, it supports the following pattern:
   ```
   SELECT (... (row_number|rank|dense_rank)()
       OVER (
   ORDER BY  ... ) AS rn)
   WHERE rn (==|<|<=) k
           AND other conditions
   ```
   For these three rank functions (row_number|rank|dense_rank), the rank of a 
key computed on the dataset is always <= the total number of rows in the whole 
dataset, so we can safely discard rows with rank > k anywhere.
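
As a rough illustration of why rows with rank > k can be dropped, the sketch below computes the three functions over a sorted list and checks that the rows satisfying `fn <= k` always form a prefix of the sort order. It is plain Python with hypothetical helper names (`rank_like`, `is_prefix`) and does not reflect how the PR implements the rule. Note that with ties, `rank` and `dense_rank` can keep a different number of rows than `row_number`.

```python
from typing import List, Tuple


def rank_like(values: List[int]) -> List[Tuple[int, int, int, int]]:
    """Return (value, row_number, rank, dense_rank) tuples in sort order."""
    out = []
    rank = dense = 0
    prev = None
    for i, v in enumerate(sorted(values), start=1):
        if v != prev:
            rank = i      # rank jumps to the current position on a new value
            dense += 1    # dense_rank advances by one on a new value
            prev = v
        out.append((v, i, rank, dense))
    return out


def is_prefix(flags: List[bool]) -> bool:
    """True if every True entry comes before every False entry."""
    return all(a or not b for a, b in zip(flags, flags[1:]))


rows = rank_like([10, 20, 20, 30, 30, 30, 40])
k = 3
# Columns 1..3 hold row_number, rank and dense_rank respectively.
for col in (1, 2, 3):
    assert is_prefix([row[col] <= k for row in rows])
```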
   
   This PR also takes over some functions from 
https://github.com/apache/spark/pull/34367.
   
   
   ### Why are the changes needed?
   Improve the performance.
   
   **Micro Benchmark**
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'.
   This PR only updates the internal implementation.
   
   
   ### How was this patch tested?
   New tests.
   

