HeartSaVioR edited a comment on pull request #33689: URL: https://github.com/apache/spark/pull/33689#issuecomment-896422196
I think the semantics are meaningful only when end users can store the output correctly. That said, we should evaluate the semantics from the **end users' point of view**. They will decide whether they need to treat the key as `grouping key` or as `grouping key + session`. `grouping key + session start` is something Spark uses internally as the state key, which end users wouldn't know about, so it has no meaning from their point of view. If they apply what they know about streaming aggregation, they will treat the key as `grouping key + session` (since that is what they specify in `groupBy`), and I have already demonstrated the problem with that. If they treat the key as `grouping key`, there is a chance for them to upsert the session correctly, but only the last updated session will be stored, so it won't work with event time processing, where there can be multiple active sessions for the same grouping key.
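To make the two cases concrete, here is a minimal sketch in plain Python rather than Spark code. It assumes a sink that simply upserts by key; the `upsert` helper, the `updates` list, and all values in it are hypothetical and only stand in for the rows a session window aggregation might emit in update mode.

```python
# Hypothetical update-mode output rows: (grouping_key, session_start, session_end, count).
updates = [
    ("a", 10, 15, 1),    # session first observed
    ("a", 10, 19, 2),    # same session, extended by a later event
    ("a", 100, 105, 1),  # a second active session for the same grouping key
]


def upsert(store, key, value):
    """Insert or overwrite the row stored under `key` (stand-in for a sink upsert)."""
    store[key] = value


by_key_and_session = {}  # end user treats `grouping key + session` as the key
by_key_only = {}         # end user treats `grouping key` as the key

for user, start, end, count in updates:
    upsert(by_key_and_session, (user, start, end), count)
    upsert(by_key_only, user, (start, end, count))

# Keyed by grouping key + session: the stale row for [10, 15) is never overwritten.
print(by_key_and_session)
# Keyed by grouping key only: the extended session [10, 19) is lost.
print(by_key_only)
```

With `grouping key + session` as the key, the extended session is written as a new row and the outdated `[10, 15)` row stays in the store. With `grouping key` alone, each upsert replaces the previous row, so only the most recently updated session survives.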
