huanliwang-db commented on code in PR #54085:
URL: https://github.com/apache/spark/pull/54085#discussion_r2757524251
##########
python/pyspark/sql/datasource.py:
##########
@@ -714,9 +715,37 @@ def initialOffset(self) -> dict:
messageParameters={"feature": "initialOffset"},
)
- def latestOffset(self) -> dict:
+ def latestOffset(self, start: dict, limit: ReadLimit) -> dict:
"""
- Returns the most recent offset available.
+ Returns the most recent offset available given a read limit. The start offset can be used
+ to figure out how much new data should be read given the limit.
+
+ The `start` will be provided from the return value of :meth:`initialOffset()` for the
+ very first micro-batch, and continues from the offset of the last micro-batch for the
+ following ones. The source can return the same offset as the start offset if there is
+ no data to process.
+
+ :class:`ReadLimit` can be used by the source to limit the amount of data returned in
+ this call. The implementation should implement :meth:`getDefaultReadLimit()` to provide
+ the proper :class:`ReadLimit` if the source can limit the amount of data returned based
+ on the source options.
+
+ The engine can still call :meth:`latestOffset()` with :class:`ReadAllAvailable` even if
+ the source produces a different read limit from :meth:`getDefaultReadLimit()`, to
+ respect the semantics of the trigger. The source must always respect the read limit
+ provided by the engine; e.g. if the read limit is :class:`ReadAllAvailable`, the source
+ must ignore the read limit configured through options.
Review Comment:
nit: maybe it will be more clear if we can provide an example in which the
engine provides a different read limit than the configured one
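A hedged sketch of what such an example might look like. The class names (`ReadLimit`, `ReadAllAvailable`, `ReadMaxRows`) follow the concepts named in this PR's diff, but are re-declared here as minimal stand-ins; the counter-based source and its `maxRowsPerBatch`-style option are invented purely for illustration, not taken from the actual pyspark API:

```python
class ReadLimit:
    """Stand-in base class for read limits (hypothetical)."""


class ReadAllAvailable(ReadLimit):
    """Engine-provided limit meaning: process everything available,
    regardless of any limit configured through source options."""


class ReadMaxRows(ReadLimit):
    """Limit a micro-batch to at most `max_rows` rows (hypothetical)."""

    def __init__(self, max_rows: int) -> None:
        self.max_rows = max_rows


class CounterStreamReader:
    """Toy source whose offsets are plain integers stored in a dict."""

    def __init__(self, available: int, max_rows_per_batch: int) -> None:
        self.available = available            # highest offset that currently exists
        self.max_rows_per_batch = max_rows_per_batch

    def getDefaultReadLimit(self) -> ReadLimit:
        # Derived from a source option, e.g. a `maxRowsPerBatch` setting.
        return ReadMaxRows(self.max_rows_per_batch)

    def initialOffset(self) -> dict:
        return {"offset": 0}

    def latestOffset(self, start: dict, limit: ReadLimit) -> dict:
        if isinstance(limit, ReadAllAvailable):
            # The engine overrides the configured default limit (e.g. for the
            # final batch of an available-now trigger); the source must honor
            # it and ignore its own option-based cap.
            return {"offset": self.available}
        if isinstance(limit, ReadMaxRows):
            # Normal micro-batch: advance by at most `max_rows` past `start`.
            return {"offset": min(self.available, start["offset"] + limit.max_rows)}
        # Unknown limit type: returning the start offset (no new data) is allowed.
        return dict(start)


reader = CounterStreamReader(available=100, max_rows_per_batch=10)
start = reader.initialOffset()
# Normal micro-batch: capped by the configured limit.
print(reader.latestOffset(start, reader.getDefaultReadLimit()))  # {'offset': 10}
# Engine passes ReadAllAvailable despite the configured default limit.
print(reader.latestOffset(start, ReadAllAvailable()))            # {'offset': 100}
```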
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]