HeartSaVioR commented on code in PR #39931:
URL: https://github.com/apache/spark/pull/39931#discussion_r1116477889
##########
sql/core/src/test/scala/org/apache/spark/sql/streaming/MultiStatefulOperatorsSuite.scala:
##########
@@ -463,6 +437,442 @@ class MultiStatefulOperatorsSuite
)
}
+ test("stream-stream time interval left outer join -> aggregation, append
mode") {
+ val input1 = MemoryStream[(String, Timestamp)]
+ val input2 = MemoryStream[(String, Timestamp)]
+
+ val s1 = input1.toDF()
+ .selectExpr("_1 AS id1", "_2 AS timestamp1")
+ .withWatermark("timestamp1", "0 seconds")
+ .as("s1")
+
+ val s2 = input2.toDF()
+ .selectExpr("_1 AS id2", "_2 AS timestamp2")
+ .withWatermark("timestamp2", "0 seconds")
+ .as("s2")
+
+ val s3 = s1.join(s2, expr("s1.id1 = s2.id2 AND (s1.timestamp1 BETWEEN " +
+   "s2.timestamp2 - INTERVAL 1 hour AND s2.timestamp2 + INTERVAL 1 hour)"),
+   "leftOuter")
+
+ val agg = s3.groupBy(window($"timestamp1", "10 minutes"))
Review Comment:
I see more and more places that simply pick up the first occurrence of an
event time column. (The logic in stream-stream join also assumes there is only
one event time column, which I had missed. I will add an assertion.)
That said, supporting multiple event time columns across all stateful
operators requires non-trivial reasoning. Let's leave this as a follow-up.
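
For illustration, here is a minimal sketch (hypothetical, not from this PR) of
how a query could make the event time column choice explicit: after the time
interval join, both timestamp1 and timestamp2 carry event time metadata, so
projecting only the intended column before aggregating avoids relying on
"first occurrence" selection. It assumes the s1/s2 streams defined in the test
above and spark.implicits._ in scope.

    import org.apache.spark.sql.functions.{expr, window}

    val joined = s1.join(
      s2,
      expr("s1.id1 = s2.id2 AND (s1.timestamp1 BETWEEN " +
        "s2.timestamp2 - INTERVAL 1 hour AND s2.timestamp2 + INTERVAL 1 hour)"),
      "leftOuter")

    // Keep a single event time column so the downstream stateful operator is
    // unambiguous about which watermark to track.
    val single = joined.select($"id1", $"timestamp1")

    val agg = single
      .groupBy(window($"timestamp1", "10 minutes"))
      .count()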
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]