Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/13945#discussion_r68835399
--- Diff: docs/structured-streaming-programming-guide.md ---
@@ -0,0 +1,888 @@
+---
+layout: global
+displayTitle: Structured Streaming Programming Guide [Alpha]
+title: Structured Streaming Programming Guide
+---
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+# Overview
+Structured Streaming is a scalable and fault-tolerant stream processing engine
+built on the Spark SQL engine. You can express your streaming computation by
+thinking you are running a batch computation on a static dataset, and the
+Spark SQL engine takes care of running it incrementally and continuously
+updating the final result as streaming data keeps arriving. You can use the
+[Dataset/DataFrame API](sql-programming-guide.html) in Scala, Java or Python to express streaming
+aggregations, event-time windows, stream-to-batch joins, etc. The computation
+is executed on the same optimized Spark SQL engine. Finally, the system
+ensures end-to-end exactly-once fault-tolerance guarantees through
--- End diff --
"End-to-end exactly-once" sounds like over-promising. The guide should probably
define what the ends are, because destructive outputs can't literally be
exactly-once in the face of network failures.
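
To make the concern concrete, here is a minimal sketch in plain Python (not Spark code). `FlakySink`, `deliver`, and `batch_id` are hypothetical names invented for illustration. The sketch assumes a sink whose acknowledgement is lost once after the write has already been applied, which forces the sender to retry:

```python
class FlakySink:
    """Hypothetical sink whose first acknowledgement is lost in transit."""

    def __init__(self):
        self.appended = []   # non-idempotent output: every delivery appends
        self.by_batch = {}   # idempotent output: keyed by batch id
        self._drop_next_ack = True

    def write(self, batch_id, rows):
        # The write is applied at the sink before the ack is sent back.
        self.appended.extend(rows)
        self.by_batch[batch_id] = list(rows)
        if self._drop_next_ack:
            self._drop_next_ack = False
            raise ConnectionError("ack lost after the write was applied")


def deliver(sink, batch_id, rows, max_attempts=3):
    """Retry until acknowledged, i.e. at-least-once delivery."""
    for _ in range(max_attempts):
        try:
            sink.write(batch_id, rows)
            return
        except ConnectionError:
            continue  # the sender cannot tell whether the write landed
    raise RuntimeError("gave up after repeated failures")


sink = FlakySink()
deliver(sink, batch_id=0, rows=["a", "b"])

print(sink.appended)   # the append-style output sees the batch twice
print(sink.by_batch)   # the keyed output holds the batch once
```

The retry duplicates the batch in the append-style output, while the output keyed by batch id ends up with a single copy. Under these assumptions, "exactly-once" holds only for the effect on an idempotent sink, not for the number of deliveries.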