[PR] [SPARK-57234][SS][DOCS] Add Real-time Mode documentation page to the Structured Streaming guide [spark]

via GitHub Wed, 03 Jun 2026 18:16:56 -0700


jerrypeng opened a new pull request, #56314:
URL: https://github.com/apache/spark/pull/56314


   ### What changes were proposed in this pull request?
   
     This PR adds a new documentation page, `docs/streaming/real-time-mode.md`, for **Real-time Mode** in Structured Streaming, which was introduced in Spark 4.1.0 (SPARK-53736). The page covers:
   
     - **How Real-time Mode works**: long-running tasks (one per input partition) that process records continuously, in contrast to the per-micro-batch task scheduling of the default engine.
     - **Batch duration is a checkpoint interval, not a latency target**.
     - A **comparison** with the other execution modes.
     - **Enabling Real-time Mode**: the `Trigger.RealTime(...)` API (Scala/Java) and the `realTime` trigger keyword argument (Python), plus the requirements to start (update output mode, checkpoint location, minimum batch duration).
     - **Supported queries** (stateless only), **fault tolerance** (exactly-once processing semantics; sinks such as Kafka provide at-least-once delivery), **configuration**, **examples** (Python/Scala/Java), and **caveats**.
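
The first two bullets above can be illustrated with a toy latency model. This is a minimal sketch in plain Python, not Spark code: the function names, the fixed trigger interval for the micro-batch case, the constant per-record cost, and the arrival times are all assumptions made up for illustration. It only shows why a batch duration that sets checkpoint boundaries does not have to bound per-record latency.

```python
# Toy latency model in plain Python. This is NOT Spark code; every name
# below is hypothetical and exists only to illustrate the scheduling idea.


def micro_batch_latencies(arrivals_ms, trigger_interval_ms):
    """Model per-micro-batch scheduling (assumed fixed trigger interval):
    a record waits until the next trigger boundary before it is picked up."""
    latencies = []
    for t in arrivals_ms:
        next_trigger = (t // trigger_interval_ms + 1) * trigger_interval_ms
        latencies.append(next_trigger - t)
    return latencies


def real_time_latencies(arrivals_ms, batch_duration_ms, per_record_cost_ms):
    """Model a long-running task: each record is handled on arrival, and
    the batch duration only decides where checkpoint boundaries fall."""
    latencies = [per_record_cost_ms for _ in arrivals_ms]
    # Checkpoint boundaries depend on elapsed time, not on record latency.
    n_batches = max(arrivals_ms) // batch_duration_ms + 1 if arrivals_ms else 0
    checkpoints = [(i + 1) * batch_duration_ms for i in range(n_batches)]
    return latencies, checkpoints


arrivals = [500, 1200, 4900, 7300]  # made-up arrival times in milliseconds
mb = micro_batch_latencies(arrivals, trigger_interval_ms=5000)
rt, checkpoints = real_time_latencies(
    arrivals, batch_duration_ms=5000, per_record_cost_ms=5
)
print("micro-batch latencies (ms):", mb)
print("long-running task latencies (ms):", rt)
print("checkpoint boundaries (ms):", checkpoints)
```

In this model the same 5000 ms setting produces multi-second waits when it gates task scheduling, but only determines the checkpoint boundaries when tasks run continuously. Real latencies depend on the source, sink, and cluster, which the sketch does not attempt to capture.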
   
     It also registers the new page in the Structured Streaming left navigation (`docs/_data/menu-streaming.yaml`).
   
     This is the first of two PRs. A follow-up PR will reference Real-time Mode from the existing Structured Streaming pages (the overview, the triggers reference in the DataFrame/Dataset APIs page, and the performance tips).
   
     ### Why are the changes needed?
   
     Real-time Mode (stateless) was added in Spark 4.1.0 but has no user-facing documentation in the Structured Streaming programming guide. This PR adds that page. See SPARK-57234.
   
     ### Does this PR introduce _any_ user-facing change?
   
     No. This is a documentation-only change.
   
     ### How was this patch tested?
   
     Documentation-only change. The new page was validated for structure (front matter, code tabs, Liquid `{% highlight %}` tags), internal links and in-page anchors, navigation-menu anchors, and ASCII-only content. All trigger API signatures, configuration keys and defaults, the supported-operator and sink lists, and error-class references were cross-checked against the Spark 4.1.0 source (`Trigger.java`, `Triggers.scala`, `RealTimeModeAllowlist.scala`, `SQLConf.scala`, `KafkaMicroBatchStream.scala`, and `error-conditions.json`). Reviewers can verify rendering locally with `SKIP_API=1 bundle exec jekyll build` from the `docs/` directory.
   
     ### Was this patch authored or co-authored using generative AI tooling?
   
     Co-authored with Claude Code (Opus 4.8).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


