jingz-db commented on code in PR #48124:
URL: https://github.com/apache/spark/pull/48124#discussion_r1837410252


##########
python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py:
##########
@@ -241,11 +235,15 @@ def check_results(batch_df, _):
 
     # test list state with ttl has the same behavior as list state when state doesn't expire.
     def test_transform_with_state_in_pandas_list_state_large_ttl(self):
-        def check_results(batch_df, _):
-            assert set(batch_df.sort("id").collect()) == {
-                Row(id="0", countAsString="2"),
-                Row(id="1", countAsString="2"),
-            }
+        def check_results(batch_df, batch_id):
+            if batch_id == 0:
+                assert set(batch_df.sort("id").collect()) == {
+                    Row(id="0", countAsString="2"),
+                    Row(id="1", countAsString="2"),
+                }
+            else:

Review Comment:
   Trigger.AvailableNow() will also keep making new batches... I had a 
discussion with Bo previously here: 
https://github.com/apache/spark/pull/48124#discussion_r1790883181
   I'm not sure we can do this in `after()`, since the query will keep making 
new batches: `shouldRunAnotherBatch` always returns True in processingTime 
mode. The Scala side doesn't have this issue because it uses the `testStream` 
/ `CheckAnswer` framework, which shuts the query down explicitly.
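
   As a side note, the shutdown behavior being discussed can be illustrated with a toy model (hypothetical names, not real Spark internals): in processingTime mode the engine's `shouldRunAnotherBatch` check always answers "yes", so batches keep coming until someone calls `stop()` explicitly, which is what Scala's `testStream` / `CheckAnswer` framework does for us.

   ```python
   # Toy sketch only -- ToyQuery and its methods are invented for illustration
   # and do not exist in Spark.

   class ToyQuery:
       def __init__(self):
           self.stopped = False
           self.batch_id = -1

       def should_run_another_batch(self):
           # Mirrors the behavior described above: for a processingTime
           # trigger this is unconditionally True until the query is stopped.
           return not self.stopped

       def run_batch(self):
           self.batch_id += 1
           return self.batch_id

       def stop(self):
           self.stopped = True

   q = ToyQuery()
   seen = []
   while q.should_run_another_batch() and len(seen) < 100:
       seen.append(q.run_batch())
       if len(seen) == 2:
           # Explicit shutdown, the way testStream/CheckAnswer does it;
           # without this call the loop would only end at the safety cap.
           q.stop()
   print(seen)  # → [0, 1]
   ```

   Without the explicit `stop()`, the only thing ending the loop is the safety cap, which is essentially the situation the Python test is in when it relies on the query winding down by itself.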



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

