HeartSaVioR opened a new pull request #33077:
URL: https://github.com/apache/spark/pull/33077


   Introduction: this PR is a part of SPARK-10816 (EventTime based 
sessionization (session window)). Please refer to #31937 for an overall view 
of the code change. (Note that the code diff may have diverged a bit.)
   
   ### What changes were proposed in this pull request?
   
   This PR introduces MergingSortWithSessionWindowStateIterator, which performs 
a "merge sort" between input rows and sessions in state, ordered by group key 
and session start time.
   
   Note that the iterator merge-sorts input rows and sessions within each 
grouping key. It does not return sessions in state whose keys don't exist in 
the input rows. For input rows, the iterator returns all rows regardless of 
whether matching sessions exist in state.
   
   MergingSortWithSessionWindowStateIterator works on the precondition that 
the given iterator is sorted by "group keys + start time of session window", 
and the resulting iterator retains the same sort order.
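
The merge behavior described above can be illustrated with a minimal sketch. This is not the code in this PR: it is a plain Python illustration in which the function name `merge_with_state_sessions`, the tuple layout, and the dict standing in for the state store are all assumptions made for the example. Ties on start time are resolved in favor of input rows here, which is also an assumption of the sketch.

```python
import heapq
from itertools import groupby


def merge_with_state_sessions(input_rows, state_sessions):
    """Merge sorted input rows with stored sessions, one grouping key at a time.

    input_rows: iterable of (key, start, end), already sorted by (key, start).
    state_sessions: hypothetical stand-in for the state store, a dict mapping
        key -> list of (start, end) sorted by start.
    Yields (key, start, end, origin) where origin is "input" or "state".
    """
    for key, rows in groupby(input_rows, key=lambda r: r[0]):
        from_input = ((k, s, e, "input") for (k, s, e) in rows)
        # Only keys seen in the input are looked up, so sessions stored under
        # other keys are never emitted.
        from_state = ((key, s, e, "state") for (s, e) in state_sessions.get(key, ()))
        # Both sides are sorted by start time within the key, so a two-way
        # merge preserves the (key, start) ordering of the output.
        yield from heapq.merge(from_input, from_state, key=lambda r: r[1])


input_rows = [("a", 10, 20), ("a", 40, 50), ("c", 5, 15)]
state_sessions = {"a": [(0, 8), (25, 35)], "b": [(1, 9)]}
merged = list(merge_with_state_sessions(input_rows, state_sessions))
```

In this example the stored session for key "b" is not emitted because no input row has that key, while the input row for key "c" is emitted even though state holds no session for it.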
   
   ### Why are the changes needed?
   
   This is one of the parts required to implement SPARK-10816.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New unit tests added.




