WweiL commented on code in PR #38503:
URL: https://github.com/apache/spark/pull/38503#discussion_r1021870148


##########
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingDeduplicationSuite.scala:
##########
@@ -190,20 +190,25 @@ class StreamingDeduplicationSuite extends StateStoreMetricsTest {
       .agg(sum("num"))
       .as[(String, Long)]
 
-    testStream(result, Update)(
-      AddData(inputData, "a" -> 1),
-      CheckLastBatch("a" -> 1L),
-      assertNumStateRows(total = Seq(1L, 1L), updated = Seq(1L, 1L)),
-      AddData(inputData, "a" -> 1), // Dropped
-      CheckLastBatch(),
-      assertNumStateRows(total = Seq(1L, 1L), updated = Seq(0L, 0L)),
-      AddData(inputData, "a" -> 2),
-      CheckLastBatch("a" -> 3L),
-      assertNumStateRows(total = Seq(1L, 2L), updated = Seq(1L, 1L)),
-      AddData(inputData, "b" -> 1),
-      CheckLastBatch("b" -> 1L),
-      assertNumStateRows(total = Seq(2L, 3L), updated = Seq(1L, 1L))
-    )
+    // As of [SPARK-40940], multiple state operator with Complete mode is disabled by default

Review Comment:
   I've made a list; let's discuss it later.
   
   - In Complete and Update modes, aggregations followed by any stateful op are disallowed.
   - Dedup: doesn't count; it has no effect regardless of the stateful op and output mode.
   - stream-stream join:
     - Only allowed in Append mode, as an inner join with equality.
     - Outer joins with equality and time-interval joins are disallowed.
     - [?] Other than that, we don't need to check its compatibility with other stateful ops.
   - flatMapGroupsWithState (and mapGroupsWithState, plus the pandas version):
     - Currently: `MapGroupsWithState` with aggregation is disallowed.
     - Currently: `MapGroupsWithState` is only allowed in Update mode.
     - [?] After this PR: what should the rule for `MapGroupsWithState` be?
     - Currently: `flatMapGroupsWithState`'s output mode must match the query output mode if there are no aggs -> [keep this behavior]
     - Currently: `flatMapGroupsWithState` with an agg (whether before or after it) in Update mode is not allowed -> [keep this behavior]
     - Currently: agg followed by `flatMapGroupsWithState` in Append mode is disallowed -> [change this behavior]
     - After this PR: agg followed by `flatMapGroupsWithState` in Append mode is allowed.
     - After this PR: `flatMapGroupsWithState` followed by any stateful operator is disallowed.
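
To make the discussion concrete, here is a rough sketch of how I read the aggregation, Dedup, and `flatMapGroupsWithState` items in the list above, written as a plain Scala check. Everything in it (`StatefulOpRules`, `check`, the `OutputMode` and `StatefulOp` types) is a hypothetical model for illustration only; it is not Spark's actual checker and it deliberately leaves out the stream-stream join and `MapGroupsWithState` items, which are still open questions.

```scala
// Hypothetical model of the rules listed above; none of these names exist in Spark.
sealed trait OutputMode
case object Append extends OutputMode
case object Update extends OutputMode
case object Complete extends OutputMode

sealed trait StatefulOp
case object Aggregation extends StatefulOp
case object Dedup extends StatefulOp
case object FlatMapGroupsWithState extends StatefulOp

object StatefulOpRules {
  /** Returns Left(reason) if the chain (upstream operator first) breaks a listed rule. */
  def check(ops: Seq[StatefulOp], mode: OutputMode): Either[String, Unit] = {
    // Assumption from the list: Dedup does not count toward any rule.
    val counted = ops.filterNot(_ == Dedup)
    // Operators that have at least one counted stateful operator downstream.
    val followed = counted.dropRight(1)

    if (mode != Append && followed.contains(Aggregation))
      Left("aggregation followed by a stateful op in Complete/Update mode")
    else if (mode == Update &&
        counted.contains(FlatMapGroupsWithState) && counted.contains(Aggregation))
      Left("flatMapGroupsWithState with aggregation in Update mode")
    else if (followed.contains(FlatMapGroupsWithState))
      Left("flatMapGroupsWithState followed by a stateful op")
    else
      Right(())
  }
}
```

Under this reading, `StatefulOpRules.check(Seq(Aggregation, FlatMapGroupsWithState), Append)` passes, while the same chain in Update mode, or `flatMapGroupsWithState` followed by an aggregation in any mode, is rejected.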
   
   [?] Why doesn't Dedup require an event-time column? It has to create some kind of state store to do the deduplication; if there is no watermark, are we holding that state for the lifetime of the query?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

