[GitHub] [spark] HeartSaVioR edited a comment on pull request #31700: [SPARK-34183][SS] DataSource V2: Support required distribution and ordering in SS

GitBox Thu, 04 Mar 2021 20:50:55 -0800


HeartSaVioR edited a comment on pull request #31700:
URL: https://github.com/apache/spark/pull/31700#issuecomment-791149996



   Actually, that's one of the few advantages of micro-batch over 
record-to-record processing, and we already leverage it in some public APIs 
(e.g. flatMapGroupsWithState, which "sorts" the inputs within a given 
micro-batch so that values from the same group can be served to the user 
function sequentially, wrapped in an iterator. Imagine how it could be done 
without sorting.)
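
   To illustrate the point about sorting, here is a minimal sketch in plain 
Python, not Spark code. It assumes a micro-batch is a finite list of 
(key, value) pairs; `serve_groups` and `user_func` are hypothetical names 
made up for this example. Because the batch is bounded, it can be sorted by 
key first, and each group's values then arrive contiguously as an iterator.

```python
from itertools import groupby
from operator import itemgetter


def serve_groups(micro_batch, user_func):
    """Sort one bounded micro-batch by key, then pass each group's values
    to user_func as an iterator (hypothetical helper, not a Spark API)."""
    # Sorting is only possible because the micro-batch is finite.
    ordered = sorted(micro_batch, key=itemgetter(0))
    results = []
    for key, rows in groupby(ordered, key=itemgetter(0)):
        # After sorting, all rows of one key are adjacent, so a lazy
        # iterator over the group is enough; nothing is re-buffered.
        values = (value for _, value in rows)
        results.append(user_func(key, values))
    return results


batch = [("a", 1), ("b", 10), ("a", 2), ("b", 20), ("a", 3)]
totals = serve_groups(batch, lambda key, values: (key, sum(values)))
```

   Without the sort, the values for "a" and "b" would be interleaved, and 
the function would need a per-key buffer before it could hand out a complete 
group.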
   
   That said, I'm supportive of the concept of ordering, but only for 
micro-batch. Dealing with sort in continuous mode is quite tricky: due to the 
nature of record-to-record processing, sort requires buffering inputs in 
state or somewhere in memory until the epoch has finished (we could keep the 
state or buffer sorted as records arrive, though), and downstream operations 
can only continue their work after that, which contradicts the fact that the 
epoch is finished.
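
   The buffering problem can be sketched as follows, again in plain Python 
rather than Spark code. `EpochSortBuffer`, `on_record` and `on_epoch_end` 
are hypothetical names, and the model (one callback per record plus an 
explicit end-of-epoch signal) is an assumption made for illustration only.

```python
import bisect


class EpochSortBuffer:
    """Hypothetical sort operator for record-at-a-time processing."""

    def __init__(self, downstream):
        self._downstream = downstream  # callable that receives one record
        self._buffer = []

    def on_record(self, record):
        # The buffer can be kept sorted incrementally, but nothing can be
        # emitted yet: a smaller record may still arrive in this epoch.
        bisect.insort(self._buffer, record)

    def on_epoch_end(self):
        # Only now is the order final, so downstream starts its work for
        # this epoch after the epoch has already ended.
        for record in self._buffer:
            self._downstream(record)
        self._buffer = []


emitted = []
op = EpochSortBuffer(emitted.append)
for r in [3, 1, 2]:
    op.on_record(r)
seen_before_epoch_end = list(emitted)  # still empty at this point
op.on_epoch_end()
```

   In this sketch the downstream consumer sees no records until 
`on_epoch_end` is called, which is the delay described above.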
   
   My 2 cents on continuous mode is that we'd be better off admitting the 
architectural differences between the batch-oriented and streaming-oriented 
modes, and trying to find a safe approach to isolate the two. Integrating 
the two naturally sounds very hard to achieve, and it has even been a 
roadblock for improving functionality in micro-batch mode.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



