xuzikun2003 opened a new pull request #29725: URL: https://github.com/apache/spark/pull/29725
### What changes were proposed in this pull request? Spark SQL rank window function needs to sort the data in each window partition, and it relies on the execution operator SortExec to do the sort. During sorting, the window partition key is also put at the front of the sort order and thus it brings unnecessary comparisons on the partition key. Instead, we can group the rows by the partition key first, and inside each group we sort the rows without comparing the partition key. We use a HashMap to store the mapping between a partition key and a sorter. All the rows corresponding to a single partition key will be inserted into the same sorter. Each sorter will sort its rows. The partition keys stored in the HashMap will also be sorted at the end. When the sort operator is ready to return the rows to the window operator, it will follow the order of the partition key to go over each sorter, and each sorter will return the rows in the window order decided by the SQL syntax “ORDER BY”. As we cannot store an unlimited number of key-value pairs in the HashMap, we set an upper bound for the number of pairs. If the number of distinct keys in the HashMap reaches the limit, the new incoming rows will be inserted to the main sorter. This main sorter will sort the rows in the order of the partition key plus the window order. If the number of distinct keys in the HashMap is under the limit, the main sorter will be always empty. When there are two sequences of sorted rows in both the HashMap and the main sorter, we follow a merge sort to return the rows. We compare the next row ready to return from the HashMap and the next row ready to return from the main sorter, and always choose the one with a higher rank to return. ### Why are the changes needed? This is the related JIRA https://issues.apache.org/jira/browse/SPARK-32096?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=17147504 This change brings performance improvement for window function. This is the change of performance when running q67 of TPCDS-1TB benchmark. Query | Time in seconds (master) | Time in seconds (perf patch) 67-v2.4 | 450.515 | 226.124 This is the change of performance when running q67 of TPCDS-10TB benchmark. Query | Time in seconds (master) | Time in seconds (perf patch) 67-v2.4 | 2486.404 | 1168.709 While this change brings performance improvement to query 67, it does not bring performance regression to other queries of TPCDS-1TB or TPCDS-10TB. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? 1. existing unit tests 2. newly added unit tests ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
