HeartSaVioR commented on issue #24936: [SPARK-24634][SS] Add a new metric 
regarding number of rows later than watermark
URL: https://github.com/apache/spark/pull/24936#issuecomment-505334052
 
 
   I'm correcting myself: we shouldn't filter out late rows in the new 
physical node, because there are cases where input rows are later than the 
watermark but are still counted in the aggregation, such as non-windowed 
streaming aggregation.
   
   The node should only count the number of late rows. Being late doesn't 
always mean a row will be discarded, only that it may be discarded.
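
   A minimal sketch of the "count, don't filter" idea, in plain Python rather 
than Spark code. The names `Row`, `LateRowCounter`, and `num_late_rows` are 
hypothetical and are not taken from the PR; the strict `<` comparison against 
the watermark is also an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Row:
    key: str
    event_time_ms: int


class LateRowCounter:
    """Hypothetical pass-through operator: counts late rows, drops none."""

    def __init__(self, watermark_ms: int) -> None:
        self.watermark_ms = watermark_ms
        self.num_late_rows = 0  # metric only; has no effect on the output

    def process(self, rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            # Assumed lateness rule: event time strictly before the watermark.
            if row.event_time_ms < self.watermark_ms:
                self.num_late_rows += 1
            # Always emit the row; downstream operators decide whether a
            # late row is actually discarded.
            yield row


rows = [Row("a", 1_000), Row("b", 5_000), Row("a", 12_000)]
counter = LateRowCounter(watermark_ms=10_000)
passed = list(counter.process(rows))
print(len(passed), counter.num_late_rows)
```

   All three rows reach the downstream operator, while the counter records 
the two that fall behind the watermark.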
   
   So the metric will be less intuitive than I first intended, but it will 
still help identify the issue in #24890, since we mostly don't want 
intermediate outputs to be later than the watermark and thus at risk of being 
discarded.
