HeartSaVioR commented on issue #23576: [SPARK-26655] [SS] Support multiple aggregates in append mode
URL: https://github.com/apache/spark/pull/23576#issuecomment-525217328
 
 
   > Regarding output mode, most of Beam runners (spark for ex) support 
discarding output in which element from different windows are independent and 
previous states are dropped.
   
   I'm not sure I understand it correctly. The point of Append mode is that the
output for a specific key (the key doesn't need to be windowed, but it must
include an "event time" column) is emitted only once in any case, regardless of
allowed lateness; there is no "upsert" case. If Beam doesn't close the window
when the watermark passes it (but allowed lateness hasn't yet passed), and
instead triggers the window and emits its output so far (so output could be
emitted multiple times), that is not compatible with Spark's Append mode.
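   To make the semantics concrete, here is a minimal pure-Python sketch (not the
Spark or Beam API; `WINDOW`, `LATENESS`, and `append_mode` are hypothetical names
for illustration) of what Append mode guarantees: a windowed aggregate is emitted
exactly once, only after the watermark passes the window's end, and rows arriving
later than the watermark are dropped rather than upserted.

```python
from collections import defaultdict

WINDOW = 10      # tumbling window size, in event-time units (illustrative)
LATENESS = 5     # allowed lateness used to derive the watermark (illustrative)

def append_mode(event_times):
    """Count events per tumbling window; emit each window's result exactly once."""
    counts = defaultdict(int)   # open windows: window start -> running count
    out = []                    # finalized (window_start, count) pairs
    max_ts = float("-inf")
    for ts in event_times:
        max_ts = max(max_ts, ts)
        watermark = max_ts - LATENESS
        win = (ts // WINDOW) * WINDOW
        if win + WINDOW > watermark:
            counts[win] += 1    # window still open: accept the event
        # else: event is later than the watermark allows -> dropped, no upsert
        for w in sorted(counts):
            if w + WINDOW <= watermark:
                out.append((w, counts.pop(w)))  # finalize and emit exactly once
    return out
```

For example, with `append_mode([1, 3, 12, 18, 26])` the window `[0, 10)` is
emitted once (count 2) when the watermark reaches 13, and a subsequent late
event at time 5 would simply be dropped, never updating the already-emitted row.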
   
   For stream-stream joins, we still have to decide which "event time" should be
taken even if we change the way event time is stored, since two rows are being
joined. How does Beam decide the "event time" for the new record produced from
two records? With column-based event time (as in current Spark), it would be
hard to choose the "min" or "max" of the event times, because which column to
pick as the event time has to be determined in the query planning phase.
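   A small sketch of the ambiguity (hypothetical names, not any engine's API): a
joined row carries one event-time column per input side, and a downstream
stateful operator needs a single event time to compare against the watermark.
Whether the planner fixes a "min" or "max" policy changes whether the same
joined row is considered late.

```python
def joined_event_time(left_ts, right_ts, policy):
    # In a column-based model the policy must be fixed at query planning time;
    # the engine cannot decide per row which column is "the" event time.
    return min(left_ts, right_ts) if policy == "min" else max(left_ts, right_ts)

row = (10, 25)   # left side's event time is 10, right side's is 25
watermark = 15
# "min" policy: joined row's event time is 10 < 15 -> treated as late, dropped
# "max" policy: joined row's event time is 25 >= 15 -> still on time, kept
```

The same physical row is late under one policy and on time under the other,
which is why the choice can't be deferred past the planning phase.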
