xuzikun2003 opened a new pull request #29725:
URL: https://github.com/apache/spark/pull/29725


   
   ### What changes were proposed in this pull request?
   Spark SQL rank window function needs to sort the data in each window 
partition, and it relies on the execution operator SortExec to do the sort. 
During sorting, the window partition key is also put at the front of the sort 
order and thus it brings unnecessary comparisons on the partition key. Instead, 
we can group the rows by the partition key first, and inside each group we sort 
the rows without comparing the partition key.​
   
   We use a HashMap to store the mapping between a partition key and a sorter. 
All the rows corresponding to a single partition key will be inserted into the 
same sorter. Each sorter will sort its rows. The partition keys stored in the 
HashMap will also be sorted at the end. When the sort operator is ready to 
return the rows to the window operator, it will follow the order of the 
partition key to go over each sorter, and each sorter will return the rows in 
the window order decided by the SQL syntax “ORDER BY”. 
   
   As we cannot store an unlimited number of key-value pairs in the HashMap, we 
set an upper bound for the number of pairs. If the number of distinct keys in 
the HashMap reaches the limit, the new incoming rows will be inserted to the 
main sorter. This main sorter will sort the rows in the order of the partition 
key plus the window order. If the number of distinct keys in the HashMap is 
under the limit, the main sorter will be always empty. 
   
   When there are two sequences of sorted rows in both the HashMap and the main 
sorter, we follow a merge sort to return the rows. We compare the next row 
ready to return from the HashMap and the next row ready to return from the main 
sorter, and always choose the one with a higher rank to return. 
   
   
   ### Why are the changes needed?
   This is the related JIRA
   
https://issues.apache.org/jira/browse/SPARK-32096?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=17147504
   
   This change brings performance improvement for window function. This is the 
change of performance when running q67 of TPCDS-1TB benchmark.
   Query    | Time in seconds (master) | Time in seconds (perf patch)
   67-v2.4  | 450.515                            | 226.124
   This is the change of performance when running q67 of TPCDS-10TB benchmark.
   Query    | Time in seconds (master) | Time in seconds (perf patch)
   67-v2.4  | 2486.404                          | 1168.709
   While this change brings performance improvement to query 67,  it does not 
bring performance regression to other queries of TPCDS-1TB or TPCDS-10TB.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   1. existing unit tests
   2. newly added unit tests


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to