WangGuangxin opened a new pull request #27558: [SPARK-30806][SQL]Evaluate once 
per group in UnboundedWindowFunctionFrame
URL: https://github.com/apache/spark/pull/27558
 
 
   ### What changes were proposed in this pull request?
   Evaluate the aggregate only once per group in 
`UnboundedWindowFunctionFrame`.
   
   ### Why are the changes needed?
   Currently, `UnboundedWindowFunctionFrame.write` re-evaluates the 
processor for every row in a group, which is unnecessary, as explained 
below. This hurts performance when the evaluation is expensive (for 
example, `Percentile`'s `eval` has to sort its buffer and do some calculation). 
In our production environment, a SQL query that combines a percentile with a 
window operation takes more than 10 hours in SparkSQL but only 10 minutes in Hive.
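
   To make the cost difference concrete, here is a minimal sketch in Python. It is a simplified model, not Spark's actual Scala code: `CountingPercentile`, `write_per_row`, and `write_once_per_group` are hypothetical names, and the percentile is a toy nearest-rank version. It only illustrates that evaluating per row repeats the sort for every row, while evaluating per group sorts once and gives the same output.

```python
from typing import List, Optional, Sequence


class CountingPercentile:
    """Toy aggregate (hypothetical): evaluate() sorts the whole buffer on every call."""

    def __init__(self, fraction: float) -> None:
        self.fraction = fraction
        self.buffer: List[float] = []
        self.eval_calls = 0  # how many times the expensive step ran

    def update(self, value: float) -> None:
        self.buffer.append(value)

    def evaluate(self) -> float:
        self.eval_calls += 1
        ordered = sorted(self.buffer)  # the expensive part
        index = int(self.fraction * (len(ordered) - 1))
        return ordered[index]


def write_per_row(rows: Sequence[float], agg: CountingPercentile) -> List[float]:
    """Models the behavior described above: one evaluation per row."""
    for value in rows:
        agg.update(value)
    return [agg.evaluate() for _ in rows]


def write_once_per_group(rows: Sequence[float], agg: CountingPercentile) -> List[Optional[float]]:
    """Models the proposed behavior: evaluate once, reuse the result for every row."""
    for value in rows:
        agg.update(value)
    result = agg.evaluate() if rows else None
    return [result for _ in rows]


if __name__ == "__main__":
    group = [5, 1, 4, 2, 3]
    per_row, once = CountingPercentile(0.5), CountingPercentile(0.5)
    print(write_per_row(group, per_row), per_row.eval_calls)
    print(write_once_per_group(group, once), once.eval_calls)
```

   In this model both functions return the same values for a group; only the number of `evaluate` calls differs (one per row versus one per group).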
   
   In fact, `UnboundedWindowFunctionFrame` can be treated as a 
`SlidingWindowFunctionFrame` with `lbound = UnboundedPreceding` and `ubound = 
UnboundedFollowing`, as its code comment notes. In that case, 
`SlidingWindowFunctionFrame` also evaluates only once per group.
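
   The equivalence can be sketched with a small Python model. This is an assumption-laden simplification, not Spark's implementation: `sliding_frame_results` is a hypothetical helper, bounds are plain row offsets (with `None` standing in for `UnboundedPreceding` / `UnboundedFollowing`), and the aggregate is recomputed only when the frame's contents change. With both bounds unbounded, the frame is the whole group for every row, so a single evaluation is enough.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def sliding_frame_results(
    rows: Sequence[float],
    lower: Optional[int],
    upper: Optional[int],
    aggregate: Callable[[Sequence[float]], float],
) -> Tuple[List[float], int]:
    """Return (per-row results, number of aggregate evaluations).

    lower/upper are row offsets relative to the current row (inclusive);
    None means unbounded in that direction.
    """
    n = len(rows)
    results: List[float] = []
    evaluations = 0
    previous_bounds = None
    cached = None
    for i in range(n):
        start = 0 if lower is None else max(0, i + lower)
        end = n if upper is None else max(0, min(n, i + upper + 1))
        bounds = (start, end)
        if bounds != previous_bounds:
            # The frame changed, so the aggregate has to be recomputed.
            cached = aggregate(rows[start:end])
            evaluations += 1
            previous_bounds = bounds
        results.append(cached)
    return results, evaluations


if __name__ == "__main__":
    group = [3, 1, 2]
    # Unbounded on both sides: same frame for every row, one evaluation.
    print(sliding_frame_results(group, None, None, sum))
    # One row preceding through the current row: the frame moves each row.
    print(sliding_frame_results(group, -1, 0, sum))
```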
   
   ### Does this PR introduce any user-facing change?
   No.
   
   ### How was this patch tested?
   Existing unit tests.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
