HeartSaVioR edited a comment on pull request #31700:
URL: https://github.com/apache/spark/pull/31700#issuecomment-791149996


   Actually, that's one of the few advantages of micro-batch over 
record-to-record processing, and we already leverage it in some public APIs 
(e.g. flatMapGroupsWithState - this "sorts" the inputs within a given 
micro-batch so that values from the same group can be served sequentially). 
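
   To illustrate the idea, here is a plain-Python sketch. It is not Spark's 
actual implementation; `serve_groups_sequentially` and the sample records are 
made up for this example. Sorting one micro-batch by key makes each group's 
values contiguous, so they can be handed to a per-group function one group at 
a time:

```python
from itertools import groupby
from operator import itemgetter


def serve_groups_sequentially(micro_batch, func):
    """Sort one micro-batch by key, then call func(key, values) once per group.

    Hypothetical helper for illustration only; not a Spark API.
    """
    # Sorting makes all records that share a key adjacent to each other.
    ordered = sorted(micro_batch, key=itemgetter(0))
    results = []
    # groupby only merges adjacent equal keys, which is why the sort is needed.
    for key, records in groupby(ordered, key=itemgetter(0)):
        values = [value for _, value in records]
        results.append(func(key, values))
    return results


# One bounded micro-batch of (key, value) records, arriving in arbitrary order.
batch = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]
totals = serve_groups_sequentially(batch, lambda key, values: (key, sum(values)))
```

   This only works because the micro-batch is a bounded set of records that is 
fully available before the sort starts.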
   
   That said, I'm supportive of the concept of ordering, but only for 
micro-batch. Dealing with sort in continuous mode is quite tricky - due to the 
nature of record-to-record processing, sort has to buffer inputs in state or 
somewhere in memory until the epoch has finished (the state or buffer can be 
kept sorted as inputs arrive, though), and downstream operations can only 
continue their work after that point, which contradicts the fact that the 
epoch has already finished.
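
   A rough sketch of why this is awkward, again in plain Python. 
`EpochSortBuffer`, `on_record` and `on_epoch_end` are hypothetical names and 
not Spark APIs. The operator can keep its buffer sorted as records arrive, but 
it cannot emit anything downstream until it sees the epoch boundary:

```python
import bisect


class EpochSortBuffer:
    """Hypothetical sort operator for record-to-record processing.

    Illustration only; this does not mirror any Spark class.
    """

    def __init__(self):
        self._buffer = []

    def on_record(self, record):
        # Insert in sorted position so the buffer stays ordered at all times.
        bisect.insort(self._buffer, record)
        # Nothing can be emitted yet: a smaller record may still arrive
        # before the epoch ends.
        return []

    def on_epoch_end(self):
        # Only at the epoch boundary is the sorted output final.
        emitted, self._buffer = self._buffer, []
        return emitted


op = EpochSortBuffer()
emitted_during_epoch = [op.on_record(r) for r in (3, 1, 2)]
emitted_at_epoch_end = op.on_epoch_end()
```

   Downstream operators receive the epoch's records only from `on_epoch_end`, 
i.e. after the epoch is already over from the sort operator's point of view.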
   
   My 2 cents on continuous mode is that we'd better admit the architectural 
differences between batch-oriented and streaming-oriented processing, and try 
to find some safe approach to isolate the two. Naturally integrating the two 
sounds very hard to achieve, and it has even been acting as a roadblock to 
improving functionality in micro-batch mode as well.




