jingz-db commented on code in PR #48124:
URL: https://github.com/apache/spark/pull/48124#discussion_r1837410252
##########
python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py:
##########
@@ -241,11 +235,15 @@ def check_results(batch_df, _):
# test list state with ttl has the same behavior as list state when state doesn't expire.
def test_transform_with_state_in_pandas_list_state_large_ttl(self):
- def check_results(batch_df, _):
- assert set(batch_df.sort("id").collect()) == {
- Row(id="0", countAsString="2"),
- Row(id="1", countAsString="2"),
- }
+ def check_results(batch_df, batch_id):
+ if batch_id == 0:
+ assert set(batch_df.sort("id").collect()) == {
+ Row(id="0", countAsString="2"),
+ Row(id="1", countAsString="2"),
+ }
+ else:
Review Comment:
Trigger.AvailableNow() will also keep making new batches... I had a discussion with Bo about this previously, here: https://github.com/apache/spark/pull/48124#discussion_r1790883181
Not sure we can do this in the `after()`, as the query will keep making new batches because `shouldRunAnotherBatch` always returns True in processingTime mode. The Scala side doesn't have this issue because it uses the `testStream`/`CheckAnswer` framework, which shuts down the query explicitly.
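A minimal sketch of the batch-id-gating pattern the diff above applies: because processingTime mode keeps producing (possibly empty) micro-batches, the foreachBatch-style callback asserts only on the batch ids it expects and ignores later ones. This is an illustration only; plain tuples stand in for Spark `Row` objects, and no Spark session is involved.

```python
def check_results(batch_rows, batch_id):
    """Assert only on the first micro-batch.

    Later batches are ignored rather than asserted on, because the
    query is not shut down explicitly and `shouldRunAnotherBatch`
    keeps returning True in processingTime mode.
    """
    if batch_id == 0:
        # (id, countAsString) tuples standing in for Row objects.
        assert set(batch_rows) == {
            ("0", "2"),
            ("1", "2"),
        }
    # else: deliberately no assertion on subsequent (often empty) batches.


# Simulated invocations: batch 0 carries data, batch 1 is empty.
check_results([("0", "2"), ("1", "2")], 0)
check_results([], 1)
```

The key design point is that only the assertion for a known batch id can fail the test; an unexpected extra batch is tolerated instead of tripping a stale assertion.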
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]