viirya commented on pull request #31570:
URL: https://github.com/apache/spark/pull/31570#issuecomment-803912957


   Thanks for re-evaluating two approaches. It is valuable.
   
   Basically by leveraging the new state store format, two previous efforts are 
now pretty close, except for how they handle session merging.
   
   No worry. The precondition to picking the simpler approach, is that two 
approaches have similar performance. I remember this was claimed in the JIRA. 
Re-evaluation gives us a different number.
   
   I ran the benchmark locally. Due to the difference of machines, I cannot get 
the same numbers but I can see there is significant difference between two 
approaches, i.e., 1) merging then aggregating, 2) merging with aggregating.
    
   I think we have a few options.
   
   1. Replace with merging with aggregating (`MergingSessionsIterator`). I'm 
doing it locally to see if we can get a similar number. It'd be good too 
@HeartSaVioR would like to create PR against this. So it is easier to 
incorporate authored commits from all parties. It is also fine if @HeartSaVioR 
wants to work on it after merging this.
   2. Switch to the other previous effort + new state store format.
   
   Either works for me. Actually two options are basically the same logic to 
me, except for some cosmetic difference.
   
   @xuanyuanking WDYT?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to