[GitHub] [spark] HeartSaVioR commented on a diff in pull request #39931: [SPARK-42376][SS] Introduce watermark propagation among operators

via GitHub Thu, 23 Feb 2023 01:14:48 -0800


HeartSaVioR commented on code in PR #39931:
URL: https://github.com/apache/spark/pull/39931#discussion_r1115405490



##########
sql/core/src/test/scala/org/apache/spark/sql/streaming/MultiStatefulOperatorsSuite.scala:
##########
@@ -463,6 +437,442 @@ class MultiStatefulOperatorsSuite
     )
   }
 
+  test("stream-stream time interval left outer join -> aggregation, append 
mode") {
+    val input1 = MemoryStream[(String, Timestamp)]
+    val input2 = MemoryStream[(String, Timestamp)]
+
+    val s1 = input1.toDF()
+      .selectExpr("_1 AS id1", "_2 AS timestamp1")
+      .withWatermark("timestamp1", "0 seconds")
+      .as("s1")
+
+    val s2 = input2.toDF()
+      .selectExpr("_1 AS id2", "_2 AS timestamp2")
+      .withWatermark("timestamp2", "0 seconds")
+      .as("s2")
+
+    val s3 = s1.join(s2, expr("s1.id1 = s2.id2 AND (s1.timestamp1 BETWEEN " +
+      "s2.timestamp2 - INTERVAL 1 hour AND s2.timestamp2 + INTERVAL 1 hour)"), 
"leftOuter")
+
+    val agg = s3.groupBy(window($"timestamp1", "10 minutes"))

Review Comment:
   I've been thinking through this again... I might be probably overlooked.
   
   My thought was that if there are two event time columns, the row should wait 
for the watermark to pass by two values (W > et1 && W > et2) so that further 
input won't match with the row. But it seems like the condition is OR, not AND. 
If either W > et1 or W > et2 happens, further input won't match with the row, 
as W is the lower bound of both columns.
   
   I probably need to revisit this tomorrow again (ideal), or postpone to the 
next JIRA ticket. If you can help me finding some test/use case which shows 
expected behavior, that should be awesome.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #39931: [SPARK-42376][SS] Introduce watermark propagation among operators

Reply via email to