xuanyuanking opened a new pull request #28326:
URL: https://github.com/apache/spark/pull/28326


   Credit to @LiangchangZ: this PR reuses the unit test and the integration test from #24457. Thanks for the solid work.
   
   ### What changes were proposed in this pull request?
   Add logic to the `CleanupAliases` rule that keeps the event-time watermark metadata on the top-level alias.
   
   ### Why are the changes needed?
   In Structured Streaming, the `window` function wraps the `TimeWindow` expression in an `Alias` by default:
https://github.com/apache/spark/blob/590b9a0132b68d9523e663997def957b2e46dfb1/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3272-L3273

   In some cases, such as a stream-stream join with a watermark and a window, users add their own alias for convenience (we also added one in `StreamingJoinSuite`). The current metadata handling in `as` drops the watermark metadata:
https://github.com/apache/spark/blob/590b9a0132b68d9523e663997def957b2e46dfb1/sql/core/src/main/scala/org/apache/spark/sql/Column.scala#L1049-L1054

   This eventually causes the following `AnalysisException`:
   ```
   Stream-stream outer join between two streaming DataFrame/Datasets is not supported without a watermark in the join keys, or a watermark on the nullable side and an appropriate range condition
   ```
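   The scenario above can be sketched as a small Scala program. This is a minimal illustration, not code from this PR: it assumes Spark 3.x on the classpath and uses the built-in `rate` source as a stand-in for real inputs. The object name `AliasedWindowJoin`, the helper `side`, and the column names `leftWindow`, `rightWindow`, `leftKey` and `rightKey` are all hypothetical. Each side sets a watermark, then aliases its `window(...)` column before a left outer join on the key and the aliased windows. According to this PR, before the fix the alias drops the watermark metadata, so `start()` fails with the `AnalysisException` quoted above. With the fix, the aliased window column keeps the watermark and the query starts.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, window}

// Hypothetical reproduction of the aliased-window outer join described in this PR.
object AliasedWindowJoin {
  // Builds one watermarked streaming side whose window column carries a user-added alias.
  // `prefix` is a hypothetical parameter that only keeps column names distinct per side.
  def side(spark: SparkSession, prefix: String): DataFrame =
    spark.readStream
      .format("rate")                // emits (timestamp, value) rows
      .option("rowsPerSecond", "5")
      .load()
      .withWatermark("timestamp", "10 seconds")
      .select(
        // window() already aliases TimeWindow as "window"; this adds a second, user-level alias.
        window(col("timestamp"), "5 seconds").as(s"${prefix}Window"),
        col("value").as(s"${prefix}Key"))

  // Left outer join on the key and on the aliased window columns.
  def buildJoin(spark: SparkSession): DataFrame = {
    val left = side(spark, "left")
    val right = side(spark, "right")
    left.join(
      right,
      col("leftKey") === col("rightKey") && col("leftWindow") === col("rightWindow"),
      "left_outer")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("aliased-window-join")
      .getOrCreate()
    // Starting the query runs the streaming unsupported-operation checks,
    // which is where the outer-join watermark requirement is enforced.
    val query = buildJoin(spark).writeStream
      .format("console")
      .outputMode("append")          // stream-stream joins support append mode
      .start()
    query.awaitTermination(10000L)   // let it run for up to 10 seconds
    query.stop()
    spark.stop()
  }
}
```

   The window equality is what puts the watermarked column into the join keys; if the alias loses the watermark metadata, the outer join has neither a watermarked join key nor a range condition to fall back on.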
   
   
   ### Does this PR introduce any user-facing change?
   Yes. It fixes a bug where aliasing a time window column on a watermarked stream dropped the watermark metadata.
   
   ### How was this patch tested?
   Added new unit tests: one for the functionality itself and one demonstrating the common scenario.

