WangGuangxin opened a new pull request #27558: [SPARK-30806][SQL] Evaluate once per group in UnboundedWindowFunctionFrame
URL: https://github.com/apache/spark/pull/27558

### What changes were proposed in this pull request?
Evaluate the aggregate only once per group in `UnboundedWindowFunctionFrame`.

### Why are the changes needed?
Currently, `UnboundedWindowFunctionFrame.write` re-evaluates the processor for every row in a group, which is unnecessary: the frame covers the whole group, so the result is the same for every row. This hurts performance when the evaluation is expensive (for example, Percentile's `eval` needs to sort its buffer and do some calculation). In our production environment, a SQL query that applies a percentile over a window takes more than 10 hours in Spark SQL but 10 minutes in Hive.

In fact, `UnboundedWindowFunctionFrame` can be treated as a `SlidingWindowFunctionFrame` with `lbound = UnboundedPreceding` and `ubound = UnboundedFollowing`, as its own comment says. In that case, `SlidingWindowFunctionFrame` also evaluates only once per group.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Existing unit tests.
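
For illustration only, the cost difference described under "Why are the changes needed?" can be sketched with a toy model. This is not Spark code: `CountingPercentile`, `write_per_row`, and `write_once_per_group` are hypothetical names, and the percentile logic is a simplified stand-in for an aggregate whose evaluation has to sort its buffer.

```python
from typing import List, Sequence


class CountingPercentile:
    """Toy aggregate (hypothetical): eval() sorts the buffer and counts its own calls."""

    def __init__(self, fraction: float) -> None:
        self.fraction = fraction
        self.buffer: List[float] = []
        self.eval_calls = 0

    def update(self, value: float) -> None:
        self.buffer.append(value)

    def eval(self) -> float:
        # Sorting on every call is what makes repeated evaluation expensive.
        self.eval_calls += 1
        ordered = sorted(self.buffer)
        index = int(self.fraction * (len(ordered) - 1))
        return ordered[index]


def write_per_row(rows: Sequence[float], agg: CountingPercentile) -> List[float]:
    """Models the behaviour before the change: one evaluation per output row."""
    for value in rows:
        agg.update(value)
    return [agg.eval() for _ in rows]


def write_once_per_group(rows: Sequence[float], agg: CountingPercentile) -> List[float]:
    """Models the proposed behaviour: evaluate once, reuse the result for every row."""
    if not rows:
        return []
    for value in rows:
        agg.update(value)
    result = agg.eval()
    return [result for _ in rows]
```

Both functions return the same values for a group. Only the number of `eval()` calls differs: one per row in the first, one per group in the second.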
