jingz-db commented on code in PR #48124:
URL: https://github.com/apache/spark/pull/48124#discussion_r1823433041
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/python/TransformWithStateInPandasExec.scala:
##########
@@ -106,6 +107,37 @@ case class TransformWithStateInPandasExec(
List.empty
}
+  override def shouldRunAnotherBatch(newInputWatermark: Long): Boolean = {
+    if (timeMode == ProcessingTime) {
+      // TODO: check if we can return true only if actual timers are registered, or there is
Review Comment:
Confirmed with Anish - the way we have it today, we'll keep constructing new
batches in ProcessingTime mode because a future timer may still expire. I filed
a Spark JIRA to track this issue:
https://issues.apache.org/jira/browse/SPARK-50180 and will update the comments
in both Scala and Python.
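
For reference, a minimal sketch of how the full override might read once the
TODO references the JIRA. This is an illustration only, not the merged code,
and it assumes the operator exposes `timeMode`, `outputMode`, and
`eventTimeWatermarkForEviction` as in other stateful exec nodes:

```scala
// Sketch only: assumes timeMode, outputMode, and eventTimeWatermarkForEviction
// are fields of the exec node, as in other stateful operators.
override def shouldRunAnotherBatch(newInputWatermark: Long): Boolean = {
  if (timeMode == ProcessingTime) {
    // TODO SPARK-50180: return true only when timers are actually registered
    // (or expired state exists) instead of always scheduling another batch.
    true
  } else if (outputMode == OutputMode.Append || outputMode == OutputMode.Update) {
    // In event-time modes, only run again once the new watermark has advanced
    // past the eviction watermark, so expired timers/state can be processed.
    eventTimeWatermarkForEviction.isDefined &&
      newInputWatermark > eventTimeWatermarkForEviction.get
  } else {
    false
  }
}
```

Returning `true` unconditionally for ProcessingTime is conservative: it trades
extra (possibly empty) batches for the guarantee that any registered timer will
eventually fire.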
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]