WweiL commented on code in PR #38503:
URL: https://github.com/apache/spark/pull/38503#discussion_r1017447320
##########
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingDeduplicationSuite.scala:
##########
@@ -190,20 +190,25 @@ class StreamingDeduplicationSuite extends
StateStoreMetricsTest {
.agg(sum("num"))
.as[(String, Long)]
- testStream(result, Update)(
- AddData(inputData, "a" -> 1),
- CheckLastBatch("a" -> 1L),
- assertNumStateRows(total = Seq(1L, 1L), updated = Seq(1L, 1L)),
- AddData(inputData, "a" -> 1), // Dropped
- CheckLastBatch(),
- assertNumStateRows(total = Seq(1L, 1L), updated = Seq(0L, 0L)),
- AddData(inputData, "a" -> 2),
- CheckLastBatch("a" -> 3L),
- assertNumStateRows(total = Seq(1L, 2L), updated = Seq(1L, 1L)),
- AddData(inputData, "b" -> 1),
- CheckLastBatch("b" -> 1L),
- assertNumStateRows(total = Seq(2L, 3L), updated = Seq(1L, 1L))
- )
+ // As of [SPARK-40940], multiple state operator with Complete mode is
disabled by default
Review Comment:
I see. No problem at all!
Just to confirm, in the above suite,
sql/core/src/test/scala/org/apache/spark/sql/streaming/FlatMapGroupsInPandasWithStateSuite.scala,
there is a query which is flatMapGroupsWithState followed by an agg in
complete mode, it also seems to be supported before. But here after the change
it throws because there are two stateful ops in complete mode. Should we also
allow this case? I'm not sure here because I remember you mentioned
flatMapGroupsWithState could change the eventtime so we should disallow... So
is the test wrong..?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]