HeartSaVioR opened a new pull request, #48570: URL: https://github.com/apache/spark/pull/48570
### What changes were proposed in this pull request?

This PR proposes to use a stable order of EventTimeWatermark nodes (instead of traversal order) to calculate the watermark.

### Why are the changes needed?

WatermarkTracker only looks at the physical plan when calculating the new watermark value. It identifies each watermark node by index, so various issues arise when a watermark node is lost during the optimization phase:

1) The watermark advances even when one of the watermark nodes has been dropped. (A dropped node should be treated as producing no data, hence the watermark should not advance.)
2) The watermark tracker incorrectly updates its in-memory map of the previous value of each watermark node. (The index is not a stable key, yet it is used to update the map.)

The new UT describes the expected behavior and how it was broken before this PR.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New UT.

### Was this patch authored or co-authored using generative AI tooling?

No.
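To illustrate the two issues listed under "Why are the changes needed?", here is a minimal Python sketch of index-keyed versus stable-key tracking. It is a toy model, not Spark's WatermarkTracker: the function names, the `(stable_id, event_time_ms)` node representation, and the "take the minimum across nodes" rule are all assumptions made for illustration, and delay thresholds are omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical model of a watermark node that survived optimization:
# (stable_id, max event time observed in this batch, in ms).
Node = Tuple[str, int]


def update_by_index(prev: Dict[int, int], physical_nodes: List[Node]) -> Dict[int, int]:
    """Key each node by its position among the nodes left in the physical plan."""
    updated = dict(prev)
    for index, (_, event_time) in enumerate(physical_nodes):
        updated[index] = max(updated.get(index, 0), event_time)
    return updated


def update_by_stable_id(
    prev: Dict[str, int], logical_ids: List[str], physical_nodes: List[Node]
) -> Dict[str, int]:
    """Key each node by a stable id taken from the full (pre-optimization) node list."""
    # Every known node keeps an entry; a dropped node simply contributes no data.
    updated = {node_id: prev.get(node_id, 0) for node_id in logical_ids}
    for node_id, event_time in physical_nodes:
        updated[node_id] = max(updated[node_id], event_time)
    return updated


def global_watermark(per_node: Dict) -> int:
    """Assumed policy for this sketch: the minimum across all tracked nodes."""
    return min(per_node.values()) if per_node else 0


logical_ids = ["left", "right"]
# The optimizer dropped "left"; only "right" remains in the physical plan.
surviving = [("right", 200)]

# Issue 1: with index keys, the dropped node is invisible and the watermark advances.
by_index_fresh = update_by_index({}, surviving)
by_id_fresh = update_by_stable_id({}, logical_ids, surviving)

# Issue 2: with index keys, "right" overwrites the slot that used to belong to "left".
by_index_stale = update_by_index({0: 100, 1: 50}, surviving)
by_id_stale = update_by_stable_id({"left": 100, "right": 50}, logical_ids, surviving)

print(global_watermark(by_index_fresh), global_watermark(by_id_fresh))
print(by_index_stale, by_id_stale)
```

In this sketch the stable-key variant keeps an entry for the dropped node, so the node's absence holds the combined value back instead of silently shifting the remaining nodes into the wrong slots.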
