HeartSaVioR commented on PR #48297:
URL: https://github.com/apache/spark/pull/48297#issuecomment-2408305067

   The overall direction for the watermark is to advance it as fast as we 
consider safe, without breaking the simplicity of the current watermark model 
(there may be trade-offs).
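   For readers less familiar with that model, here is a rough sketch of it. 
This is a simplified, hypothetical Python model: `WatermarkTracker` and 
`on_batch_end` are illustrative names, not Spark's API. The watermark is the 
maximum event time seen so far minus a fixed delay threshold. It never moves 
backward, and in this sketch it starts at 0. In Spark, the value computed at 
the end of a microbatch is applied when planning the next one.

```python
from typing import Iterable, Optional


class WatermarkTracker:
    """Hypothetical simplified watermark model (not Spark's actual class)."""

    def __init__(self, delay_ms: int) -> None:
        self.delay_ms = delay_ms
        self.max_event_time_ms: Optional[int] = None
        # Assumption for this sketch: the watermark starts at 0.
        self.current_watermark_ms = 0

    def on_batch_end(self, event_times_ms: Iterable[int]) -> int:
        """Fold in one microbatch's event times and return the watermark
        that would be used when planning the next microbatch."""
        for t in event_times_ms:
            if self.max_event_time_ms is None or t > self.max_event_time_ms:
                self.max_event_time_ms = t
        if self.max_event_time_ms is not None:
            candidate = self.max_event_time_ms - self.delay_ms
            # The watermark only moves forward, never backward.
            self.current_watermark_ms = max(self.current_watermark_ms, candidate)
        return self.current_watermark_ms


tracker = WatermarkTracker(delay_ms=10_000)
print(tracker.on_batch_end([5_000, 20_000]))  # 10000
print(tracker.on_batch_end([12_000]))         # 10000 (max event time unchanged)
print(tracker.on_batch_end([35_000]))         # 25000
```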
   
   I may not have put this design discussion into the JIRA ticket, but I got 
internal input when I designed support for multiple stateful operators: why 
not just advance the watermark based on the state watermark, e.g. based on 
completed windows for window aggregation? This technically delays the advance 
of the watermark by one batch "per operator", because of how we calculate and 
propagate the watermark (at planning time rather than within the microbatch). 
So we rejected it and tolerate some tricky situations like this one.
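   The "one batch per operator" cost can be illustrated with a toy simulation. 
This is a hypothetical, simplified model and not Spark's code: all function 
names are illustrative. With `propagate_within_batch=True`, every operator's 
watermark for a batch is settled when that batch is planned. For simplicity, 
the sketch gives every operator the same value. With 
`propagate_within_batch=False`, the rejected alternative, operator i only 
learns about windows that operator i-1 completed by the end of the previous 
batch. Before batch 0 completes, the sketch assumes that watermark is 0.

```python
from typing import List, Optional


def first_emit_batch(watermarks: List[int], window_end: int) -> Optional[int]:
    """Index of the first batch whose watermark has reached window_end, or None."""
    for batch, wm in enumerate(watermarks):
        if wm >= window_end:
            return batch
    return None


def operator_watermarks(input_watermarks: List[int], num_operators: int,
                        propagate_within_batch: bool) -> List[List[int]]:
    """Per-operator, per-batch watermarks under the hypothetical model."""
    result = [list(input_watermarks)]
    for _ in range(1, num_operators):
        upstream = result[-1]
        if propagate_within_batch:
            # Decided at planning time: no extra batch of lag.
            result.append(list(upstream))
        else:
            # Based on upstream state after the previous batch: shift by one.
            result.append([0] + upstream[:-1] if upstream else [])
    return result


def emit_batches(input_watermarks: List[int], window_end: int,
                 num_operators: int, propagate_within_batch: bool) -> List[Optional[int]]:
    """First batch in which each operator in the chain emits the window."""
    return [first_emit_batch(wms, window_end)
            for wms in operator_watermarks(input_watermarks, num_operators,
                                           propagate_within_batch)]


wms = [0, 10, 20, 30, 40]
print(emit_batches(wms, 20, 3, True))   # [2, 2, 2]
print(emit_batches(wms, 20, 3, False))  # [2, 3, 4]
```

In this toy model, a chain of three operators emits the final result two 
batches later under the rejected approach.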
   
   That said, the current behavior is by design. In the feedback from 
@andrzejzera, who reported the correctness issue, he even said the behavior is 
hard to follow intuitively because we produce output later than is 
theoretically possible.
   https://lists.apache.org/thread/ysxmtqc1kycthnk0wjmts9sztkt1ofp2
   
   So delaying output even further does not sound like an option to me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
