HeartSaVioR opened a new pull request, #43971:
URL: https://github.com/apache/spark/pull/43971

   ### What changes were proposed in this pull request?
   
   This PR proposes to move EliminateEventTimeWatermark into the analyzer 
(as one of the analysis rules), and to eliminate the EventTimeWatermark 
node only when its child is resolved.
   
   ### Why are the changes needed?
   
   Currently, we apply EliminateEventTimeWatermark immediately when 
withWatermark is called, which means the rule runs against the child 
right away, regardless of whether the child is resolved.
   
   This is not an issue for DataFrame API usage initiated by read / 
readStream, because streaming sources have the isStreaming flag set to true 
even before they are resolved. But mixing SQL and the DataFrame API exposes 
the issue: we may not know the exact value of the isStreaming flag on an 
unresolved node, and it is subject to change upon resolution.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New unit tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

