brijrajk opened a new pull request, #56473: URL: https://github.com/apache/spark/pull/56473
### What changes were proposed in this pull request?

Three documentation fixes in the PySpark streaming data source API:

**1. Fix docstring bug in `DataSourceStreamReader.latestOffset()` (`datasource.py:759`)**

`limit.maxRows` → `limit.max_rows`

The `ReadMaxRows` dataclass uses the Python snake_case field name `max_rows`. Users copying this example would get `AttributeError: 'ReadMaxRows' object has no attribute 'maxRows'` at runtime.

**2. Update outdated `latestOffset` signature in tutorial (`python_data_source.rst`)**

`def latestOffset(self) -> dict:` → `def latestOffset(self, start: dict, limit: ReadLimit) -> dict:`

The parameterless signature has been deprecated since SPARK-55304. The tutorial should guide new users toward the recommended signature, which supports admission control. The type annotation for `limit` was added per reviewer feedback on the prior PR (#55227).

**3. Add `Trigger.AvailableNow` documentation section (`python_data_source.rst`)**

New section showing how to implement `SupportsTriggerAvailableNow` for finite batch processing: how `prepareForTriggerAvailableNow()` captures the target offset at query start, and how `latestOffset()` should respect it to ensure the query terminates.

### Why are the changes needed?

- Fix 1: Runtime bug. Users copying the docstring will get `AttributeError`.
- Fix 2: The tutorial teaches the deprecated API instead of the recommended approach.
- Fix 3: `Trigger.AvailableNow` support was undiscoverable because no tutorial guidance existed.

These issues were originally identified during review of SPARK-55450 and tracked in SPARK-56367.

### Does this PR introduce _any_ user-facing change?

No. Documentation and docstring fixes only.

### How was this patch tested?

No code change; documentation only.
Verified:

- The `ReadMaxRows` dataclass uses the `max_rows` field name
- `SupportsTriggerAvailableNow` and `prepareForTriggerAvailableNow()` exist in `python/pyspark/sql/streaming/datasource.py`
- `latestOffset(self, start, limit)` is the recommended signature per SPARK-55304

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Anthropic)
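
For reviewers who want to see the behavior that fixes 1 and 3 describe, here is a minimal sketch rather than the tutorial text itself. It deliberately does not import PySpark: `ReadMaxRows` below is a local stand-in that only mirrors the `max_rows` field named above, and `CounterStreamReader`, its `available` and `target` attributes, and the `{"offset": n}` offset shape are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


# Local stand-in for PySpark's ReadMaxRows (assumption: only the max_rows field matters here).
@dataclass(frozen=True)
class ReadMaxRows:
    max_rows: int


class CounterStreamReader:
    """Hypothetical reader over a source whose newest offset is `available`."""

    def __init__(self, available: int) -> None:
        self.available = available
        # Stays None unless the query runs with Trigger.AvailableNow.
        self.target: Optional[int] = None

    def prepareForTriggerAvailableNow(self) -> None:
        # Capture the target offset once, at query start.
        self.target = self.available

    def latestOffset(self, start: dict, limit: object) -> dict:
        end = self.available
        if self.target is not None:
            # Never advance past the captured target, so the query can terminate.
            end = min(end, self.target)
        if isinstance(limit, ReadMaxRows):
            # Snake_case field: limit.maxRows would raise AttributeError.
            end = min(end, start["offset"] + limit.max_rows)
        return {"offset": end}


reader = CounterStreamReader(available=10)
reader.prepareForTriggerAvailableNow()
reader.available = 25  # data that arrives after query start

first = reader.latestOffset({"offset": 0}, ReadMaxRows(max_rows=4))
last = reader.latestOffset({"offset": 8}, ReadMaxRows(max_rows=4))
```

In this sketch `first` is capped by `max_rows` and `last` is capped by the offset captured in `prepareForTriggerAvailableNow()`, even though more data became available afterwards. A real reader would subclass the PySpark classes named in the description instead of using these stand-ins.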
