jerrypeng opened a new pull request, #56314:
URL: https://github.com/apache/spark/pull/56314
### What changes were proposed in this pull request?
This PR adds a new documentation page, `docs/streaming/real-time-mode.md`, for
**Real-time Mode** in Structured Streaming, which was introduced in Spark 4.1.0
(SPARK-53736). The page covers:
- **How Real-time Mode works**: long-running tasks (one per input
partition) that process records continuously, in contrast to the
per-micro-batch task scheduling of the default engine.
- **Batch duration is a checkpoint interval, not a latency target**.
- A **comparison** with the other execution modes.
- **Enabling Real-time Mode**: the `Trigger.RealTime(...)` API
(Scala/Java) and the `realTime` trigger keyword argument (Python), plus the
requirements for starting a query (update output mode, a checkpoint location,
and a minimum batch duration).
- **Supported queries** (stateless only), **fault tolerance** (exactly-once
processing semantics; sinks such as Kafka provide at-least-once delivery),
**configuration**, **examples** (Python/Scala/Java), and **caveats**.
It also registers the new page in the Structured Streaming left navigation
(`docs/_data/menu-streaming.yaml`).
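
For reviewers, the first two bullets above can be illustrated with a small toy model. This is only a sketch in plain Python and does not use or reflect Spark's implementation. The function names, the zero processing cost, and the millisecond timestamps are all assumptions made for illustration. It contrasts per-record latency when work is scheduled only at batch boundaries with latency when a long-running task handles each record on arrival, where the batch duration only determines when checkpoints occur.

```python
def micro_batch_latencies(arrivals_ms, batch_interval_ms):
    """Toy model: a record waits until the next batch boundary.

    Assumes (for illustration only) that processing takes zero time,
    so latency is purely the wait for the next scheduled batch.
    """
    latencies = []
    for t in arrivals_ms:
        # Ceiling division on integers: first boundary at or after t.
        next_boundary = -(-t // batch_interval_ms) * batch_interval_ms
        latencies.append(next_boundary - t)
    return latencies


def real_time_latencies(arrivals_ms, batch_duration_ms):
    """Toy model: a long-running task handles each record on arrival.

    batch_duration_ms is deliberately unused here: in this model it
    does not influence latency, only when checkpoints are taken.
    """
    return [0 for _ in arrivals_ms]


def checkpoint_times(end_ms, batch_duration_ms):
    """Times at which the toy real-time task would checkpoint progress."""
    return list(range(batch_duration_ms, end_ms + 1, batch_duration_ms))


arrivals = [100, 2500, 4900, 7300]  # hypothetical arrival times in ms

print(micro_batch_latencies(arrivals, 5000))  # [4900, 2500, 100, 2700]
print(real_time_latencies(arrivals, 5000))    # [0, 0, 0, 0]
print(checkpoint_times(12000, 5000))          # [5000, 10000]
```

In this model, lengthening the batch duration changes only the checkpoint schedule, not the per-record latency, which is the distinction the new page draws.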
This is the first of two PRs. A follow-up PR will reference Real-time Mode
from the existing Structured Streaming pages (the overview, the triggers
reference on the DataFrame/Dataset APIs page, and the performance tips).
### Why are the changes needed?
Real-time Mode (stateless) was added in Spark 4.1.0 but has no user-facing
documentation in the Structured Streaming programming guide. This PR adds that
page. See SPARK-57234.
### Does this PR introduce _any_ user-facing change?
No. This is a documentation-only change.
### How was this patch tested?
Documentation-only change. The new page was validated for structure (front
matter, code tabs, Liquid `{% highlight %}` tags), internal links and in-page
anchors, navigation-menu anchors, and ASCII-only content. All trigger API
signatures, configuration keys and defaults, the supported-operator and sink
lists, and error-class references were cross-checked against the Spark 4.1.0
source (`Trigger.java`, `Triggers.scala`, `RealTimeModeAllowlist.scala`,
`SQLConf.scala`, `KafkaMicroBatchStream.scala`, and `error-conditions.json`).
Reviewers can verify rendering locally with
`SKIP_API=1 bundle exec jekyll build` from the `docs/` directory.
### Was this patch authored or co-authored using generative AI tooling?
Co-authored with Claude Code (Opus 4.8).