dongjoon-hyun opened a new pull request #28377:
URL: https://github.com/apache/spark/pull/28377


   Credit to @LiangchangZ and @xuanyuanking: this PR reuses the UT as well as the integration test from #24457. Thanks, Liangchang, for your solid work.
   
   ### What changes were proposed in this pull request?
   Make metadata propagate through nested Aliases.
   
   ### Why are the changes needed?
   In Structured Streaming, the `window` function wraps the TimeWindow expression in an Alias by default.
   
https://github.com/apache/spark/blob/590b9a0132b68d9523e663997def957b2e46dfb1/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3272-L3273
   In some cases, such as a stream-stream join with a watermark and a window, users need to add an alias for convenience (we also added one in StreamingJoinSuite). The current metadata handling logic for `as` will lose the watermark metadata
   
https://github.com/apache/spark/blob/590b9a0132b68d9523e663997def957b2e46dfb1/sql/core/src/main/scala/org/apache/spark/sql/Column.scala#L1049-L1054
    and ultimately cause the following AnalysisException: 
   ```
   Stream-stream outer join between two streaming DataFrame/Datasets is not supported without a watermark in the join keys, or a watermark on the nullable side and an appropriate range condition
   ```
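   To make the failure mode concrete, here is a toy, self-contained Scala model. It is not Spark's Catalyst code: `Expr`, `Attr`, `UnresolvedWindow`, `aliasEager`, `aliasPropagating`, `resolve` and the key `watermarkDelayMs` are all hypothetical. It assumes, as a simplification, that the old `as` logic snapshots the child's metadata when the alias is built, before analysis has attached the watermark to the window column, while a propagating alias reads its child's metadata lazily.

   ```scala
   // Toy model only: these are NOT Spark's Catalyst classes.
   object AliasMetadataSketch {
     // Simplified stand-in for Spark's Metadata: a plain string map (assumption).
     type Metadata = Map[String, String]

     // Illustrative key; Spark's real watermark metadata key may differ.
     val WatermarkKey = "watermarkDelayMs"

     sealed trait Expr { def metadata: Metadata }

     // A resolved column, e.g. the window column after analysis.
     final case class Attr(name: String, metadata: Metadata) extends Expr

     // The window expression before analysis: it carries no metadata yet.
     case object UnresolvedWindow extends Expr { val metadata: Metadata = Map.empty }

     // Some(m) freezes the metadata; None means "inherit from the child".
     final case class Alias(child: Expr, name: String, explicitMetadata: Option[Metadata]) extends Expr {
       def metadata: Metadata = explicitMetadata.getOrElse(child.metadata)
     }

     // Toy "analysis": swap the unresolved window for a resolved attribute.
     def resolve(e: Expr, resolved: Expr): Expr = e match {
       case UnresolvedWindow => resolved
       case a: Alias         => a.copy(child = resolve(a.child, resolved))
       case other            => other
     }

     // Eager alias: snapshot the child's metadata at construction time.
     def aliasEager(e: Expr, name: String): Alias = Alias(e, name, Some(e.metadata))

     // Propagating alias: defer to the child's metadata, read after analysis.
     def aliasPropagating(e: Expr, name: String): Alias = Alias(e, name, None)

     val windowWithWatermark: Expr = Attr("window", Map(WatermarkKey -> "10000"))

     // window(...) is already aliased as "window"; the user then re-aliases it.
     val userWindow: Expr = aliasPropagating(UnresolvedWindow, "window")

     val eagerAfterAnalysis: Metadata =
       resolve(aliasEager(userWindow, "leftWindow"), windowWithWatermark).metadata
     val propagatingAfterAnalysis: Metadata =
       resolve(aliasPropagating(userWindow, "leftWindow"), windowWithWatermark).metadata

     def main(args: Array[String]): Unit = {
       println(s"eager:       $eagerAfterAnalysis")       // watermark entry lost
       println(s"propagating: $propagatingAfterAnalysis") // watermark entry kept
     }
   }
   ```

   In this model the eager alias freezes an empty map because the window has not been resolved yet, so the outer alias never sees the watermark. Reading `child.metadata` lazily is just one way to express "propagation"; the real change may differ in detail.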
   
   
   ### Does this PR introduce any user-facing change?
   Yes, a bug fix: aliasing a time window column that carries a watermark no longer drops the watermark metadata.
   
   ### How was this patch tested?
   New UTs were added: one for the functionality and one illustrating the common scenario.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
