jingz-db commented on code in PR #49560:
URL: https://github.com/apache/spark/pull/49560#discussion_r1953457708
##########
python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py:
##########
@@ -499,28 +488,14 @@ def check_results(batch_df, batch_id):
Row(id="0", countAsString="-1"),
Row(id="1", countAsString="3"),
}
- self.first_expired_timestamp = batch_df.filter(
- batch_df["countAsString"] == -1
- ).first()["timeValues"]
check_timestamp(batch_df)
- elif batch_id == 2:
+ else:
assert set(batch_df.sort("id").select("id",
"countAsString").collect()) == {
Row(id="0", countAsString="3"),
Row(id="0", countAsString="-1"),
Row(id="1", countAsString="5"),
}
- # The expired timestamp in current batch is larger than expiry timestamp in batch 1
Review Comment:
This check was removed when the Connect suites were added. It would be good to
have the additional check, but it is not necessary: it needs access to
`self.first_expired_timestamp`, which is not doable in the Connect suite. The
check mostly verifies that the expired timer timestamp in batch 1 is smaller
than the expired timer timestamp in batch 2.
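
For reference, a minimal sketch of the pattern the removed check relied on
(reconstructed from the diff above; the batch-2 lookup and the direction of
the comparison are my reading of the intent, not the exact removed code):

```python
# Illustrative sketch only. It assumes check_results runs in-process, so an
# attribute set on `self` in batch 1 is still visible in batch 2 -- which
# holds for the classic suite but not for the Connect suite, where the
# foreachBatch callback is serialized and re-executed.
def check_results(self, batch_df, batch_id):
    if batch_id == 1:
        # Rows with countAsString == -1 come from the expired-timer handler,
        # so this captures when the first timer fired.
        self.first_expired_timestamp = batch_df.filter(
            batch_df["countAsString"] == -1
        ).first()["timeValues"]
    elif batch_id == 2:
        second_expired_timestamp = batch_df.filter(
            batch_df["countAsString"] == -1
        ).first()["timeValues"]
        # The timer expiring in batch 2 must fire later than the one that
        # expired in batch 1.
        assert second_expired_timestamp > self.first_expired_timestamp
```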
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]