HeartSaVioR edited a comment on pull request #33689:
URL: https://github.com/apache/spark/pull/33689#issuecomment-896422196


   I think the semantics are meaningful only when end users can store the 
output correctly. That said, we should evaluate the semantics from the **end 
users' point of view**. They will decide whether they need to treat the key as 
`grouping key` or as `grouping key + session`. `grouping key + session start` 
is something Spark uses internally as the state key, which end users wouldn't 
know about, so it has no meaning from the end users' point of view.
   
   If they leverage their knowledge of streaming aggregation, they will 
consider the key to be `grouping key + session` (since that is what they 
specify in `groupBy`), and I have already demonstrated the problem with that.
   
   If they consider the key to be `grouping key`, there's a chance for end 
users to upsert the session correctly, but only the last updated session will 
be stored, so it won't work with event time processing, where there could be 
multiple active sessions per grouping key.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


