jingz-db commented on code in PR #48124:
URL: https://github.com/apache/spark/pull/48124#discussion_r1823433041
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/python/TransformWithStateInPandasExec.scala:
##########
@@ -106,6 +107,37 @@ case class TransformWithStateInPandasExec(
List.empty
}
+  override def shouldRunAnotherBatch(newInputWatermark: Long): Boolean = {
+    if (timeMode == ProcessingTime) {
+      // TODO: check if we can return true only if actual timers are registered, or there is
Review Comment:
Confirmed with Anish - the way we have it today, we'll keep constructing new
batches in ProcessingTime mode because a future timer may still expire. I filed
a Spark JIRA to track this issue:
https://issues.apache.org/jira/browse/SPARK-50180 and will update the comments
in both Scala and Python.
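
For reference, a minimal sketch of how the full override might read once the
TODO references the JIRA. This is an illustration only, not the merged code,
and it assumes the operator exposes `timeMode`, `outputMode`, and
`eventTimeWatermarkForEviction` as in other stateful exec nodes:

```scala
// Sketch only: assumes timeMode, outputMode, and eventTimeWatermarkForEviction
// are fields of the exec node, as in other stateful operators.
override def shouldRunAnotherBatch(newInputWatermark: Long): Boolean = {
  if (timeMode == ProcessingTime) {
    // TODO SPARK-50180: return true only when timers are actually registered
    // (or expired state exists) instead of always scheduling another batch.
    true
  } else if (outputMode == OutputMode.Append || outputMode == OutputMode.Update) {
    // In event-time modes, only run again once the new watermark has advanced
    // past the eviction watermark, so expired timers/state can be processed.
    eventTimeWatermarkForEviction.isDefined &&
      newInputWatermark > eventTimeWatermarkForEviction.get
  } else {
    false
  }
}
```

Returning `true` unconditionally for ProcessingTime is conservative: it trades
extra (possibly empty) batches for the guarantee that any registered timer will
eventually fire.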
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]