bogao007 commented on code in PR #48838:
URL: https://github.com/apache/spark/pull/48838#discussion_r1844724391
##########
python/pyspark/sql/streaming/stateful_processor.py:
##########
@@ -420,10 +411,27 @@ def handleInputRows(
timer_values: TimerValues
    Timer value for the current batch that process the input rows.
    Users can get the processing or event time timestamp from TimerValues.
+ """
+ return iter([])
Review Comment:
Why do we replace the `...` placeholder with `return iter([])` here?
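For reference, a minimal sketch of the two placeholder styles (the class and method names are illustrative, not the PR's code): with `...` the method is a pure stub that subclasses must override, while `return iter([])` gives it a usable no-op default.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterator


class StubProcessor(ABC):
    @abstractmethod
    def handle_rows(self, rows: Iterator[Any]) -> Iterator[Any]:
        """Pure placeholder; every subclass must provide a body."""
        ...


class DefaultingProcessor:
    def handle_rows(self, rows: Iterator[Any]) -> Iterator[Any]:
        """No-op default; subclasses may inherit it unchanged."""
        return iter([])
```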
##########
python/pyspark/sql/streaming/stateful_processor.py:
##########
@@ -420,10 +411,27 @@ def handleInputRows(
timer_values: TimerValues
    Timer value for the current batch that process the input rows.
    Users can get the processing or event time timestamp from TimerValues.
+ """
+ return iter([])
+
+ def handleExpiredTimer(
Review Comment:
Just double-checking: this method is not required for users to implement,
correct?
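If it is indeed optional, a minimal sketch of why a default body makes it so (the class and signatures below are illustrative stand-ins, not the actual PySpark API):

```python
from typing import Any, Iterator


class BaseProcessor:
    """Illustrative stand-in for the stateful processor base class."""

    def handleInputRows(self, key: Any, rows: Iterator[Any]) -> Iterator[Any]:
        # Users are expected to override this one.
        return iter([])

    def handleExpiredTimer(self, key: Any, timer_info: Any) -> Iterator[Any]:
        # Default no-op: processors that never register timers can simply
        # inherit this and skip implementing it.
        return iter([])


class CountProcessor(BaseProcessor):
    # Only the row handler is overridden; handleExpiredTimer falls back
    # to the no-op default above.
    def handleInputRows(self, key: Any, rows: Iterator[Any]) -> Iterator[Any]:
        yield (key, sum(1 for _ in rows))
```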
##########
python/pyspark/sql/pandas/group_ops.py:
##########
@@ -573,7 +568,16 @@ def transformWithStateUDF(
statefulProcessorApiClient.set_handle_state(StatefulProcessorHandleState.CLOSED)
return iter([])
-        result = handle_data_with_timers(statefulProcessorApiClient, key, inputRows)
+        if timeMode != "none":
+            batch_timestamp = statefulProcessorApiClient.get_batch_timestamp()
+            watermark_timestamp = statefulProcessorApiClient.get_watermark_timestamp()
+        else:
+            batch_timestamp = -1
+            watermark_timestamp = -1
Review Comment:
Can we extract this into a separate helper method and share it between both
UDFs to reduce the duplicated code? See the sketch below.
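Something along these lines, for example (the helper name and placement are only a sketch of the suggested refactor, not the final code; `get_batch_timestamp`/`get_watermark_timestamp` are the calls already used in the diff above):

```python
from typing import Any, Tuple


def _get_timestamps(api_client: Any, time_mode: str) -> Tuple[int, int]:
    """Fetch the batch and watermark timestamps once, or return -1
    sentinels when no time mode is configured."""
    if time_mode != "none":
        batch_timestamp = api_client.get_batch_timestamp()
        watermark_timestamp = api_client.get_watermark_timestamp()
    else:
        batch_timestamp = -1
        watermark_timestamp = -1
    return batch_timestamp, watermark_timestamp


# Both UDFs could then share the same call:
# batch_timestamp, watermark_timestamp = _get_timestamps(statefulProcessorApiClient, timeMode)
```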
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]