echauchot commented on issue #23576: [SPARK-26655] [SS] Support multiple aggregates in append mode URL: https://github.com/apache/spark/pull/23576#issuecomment-525638792 > > Regarding output mode, most of Beam runners (spark for ex) support discarding output in which element from different windows are independent and previous states are dropped. > > I'm not sure I understand it correctly. The point for Append mode is, output for specific key (key shouldn't be necessary to be windowed, but should include "event time" column) will be provided only once in any case (orthogonal to fault tolerance, and doesn't mean "exactly-once" here), regardless of allowed lateness, no case of "upsert". If Beam doesn't close the window when watermark passes by (but still doesn't pass by allowed lateness) but triggers window and emits the output of window so far (so output could be emitted multiple times), it's not compatible with Spark's Append mode. > Beam does not trigger output unless the watermark pass the end of window + allowed lateness. There is no triggering between end of window and allowed lateness. Close and output is at the same time. > stream-stream join should decide which "event time" should be taken even we change the way of storing event time, as there're two rows being joined. How Beam decides "event time" for new record from two records? In column based event time (current Spark), it should be hard to choose "min" or "max" of event time, as which column to pick as event time should be decided by query plan phase. Ah I thought we were talking about watermark. For choosing the event timestamp, Beam uses a TimestampCombiner which default policy is to set the resulting timestamp to the end of the window for new record.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
