huanliwang-db commented on code in PR #54085:
URL: https://github.com/apache/spark/pull/54085#discussion_r2757504719
##########
python/pyspark/sql/datasource.py:
##########
@@ -714,9 +715,37 @@ def initialOffset(self) -> dict:
messageParameters={"feature": "initialOffset"},
)
- def latestOffset(self) -> dict:
+ def latestOffset(self, start: dict, limit: ReadLimit) -> dict:
"""
- Returns the most recent offset available.
+ Returns the most recent offset available given a read limit. The start offset can be used
+ to figure out how much new data should be read given the limit.
+
+ The `start` will be provided from the return value of :meth:`initialOffset()` for
+ the very first micro-batch, and the offset continues from the last micro-batch for the
+ following. The source can return the same offset as start offset if there is no data to
Review Comment:
maybe "and for subsequent micro-batches, the start offset is the ending
offset from the previous micro-batch." is better iirc?
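
For context, here is a minimal, self-contained sketch of how a source might satisfy the contract described in the proposed docstring. The class name, the dict-based `ReadLimit` shape (`{"maxRows": ...}`), and the offset layout are illustrative assumptions, not the actual PySpark API:

```python
# Hypothetical sketch of the latestOffset(start, limit) contract discussed
# above. CounterStreamReader and the {"maxRows": n} limit shape are
# assumptions for illustration only.
class CounterStreamReader:
    """Toy source emitting one row per integer up to `available`."""

    def __init__(self, available: int):
        self.available = available  # highest offset currently readable

    def initialOffset(self) -> dict:
        return {"offset": 0}

    def latestOffset(self, start: dict, limit: dict) -> dict:
        # For the first micro-batch, `start` is initialOffset(); for
        # subsequent micro-batches it is the ending offset of the
        # previous micro-batch (per the review comment).
        max_rows = limit.get("maxRows")
        end = self.available
        if max_rows is not None:
            end = min(end, start["offset"] + max_rows)
        # If there is no new data, return the start offset unchanged.
        return {"offset": max(end, start["offset"])}
```

Under this sketch, the engine would read the range `(start, latestOffset(start, limit)]` each micro-batch, and an unchanged return value signals an empty batch.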
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
For additional commands, e-mail: [email protected]