HeartSaVioR commented on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark
URL: https://github.com/apache/spark/pull/24936#issuecomment-505334052

I'm correcting myself: we shouldn't filter out late rows in the new physical node, since there are cases where input rows are later than the watermark but are still counted in the aggregation, such as non-window streaming aggregation. The node should only count the number of late events. Being late doesn't always mean a row will be discarded, only that it has a chance of being discarded.

So the metric is going to be less intuitive than I originally intended, but it will still be helpful for identifying the issue in #24890, since we mostly don't want intermediate outputs to be later than the watermark and therefore at risk of being discarded.
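To illustrate the count-but-don't-filter idea, here is a minimal sketch in plain Python. It is not Spark's actual physical node: `Row`, `LateRowCounter`, and `num_late_rows` are hypothetical names, and the strict `<` comparison against the watermark is an assumption made for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class Row(NamedTuple):
    key: str
    event_time_ms: int


class LateRowCounter:
    """Pass-through operator that counts rows older than the watermark.

    Hypothetical helper for illustration only; not Spark's implementation.
    """

    def __init__(self, watermark_ms: int) -> None:
        self.watermark_ms = watermark_ms
        self.num_late_rows = 0  # metric only; never used to drop rows

    def process(self, rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            # Assumption: a row is "late" when its event time is strictly
            # below the current watermark.
            if row.event_time_ms < self.watermark_ms:
                self.num_late_rows += 1
            # Every row is forwarded, late or not, because downstream
            # operators (e.g. a non-window aggregation) may still use it.
            yield row


batch = [Row("a", 1_000), Row("b", 5_000), Row("a", 9_000)]
counter = LateRowCounter(watermark_ms=6_000)
forwarded = list(counter.process(batch))
```

Here `forwarded` contains all three input rows, while `counter.num_late_rows` records that two of them were behind the watermark. The count signals rows that could be discarded downstream, not rows that were.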
