Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3653#discussion_r21652907
--- Diff: docs/streaming-flume-integration.md ---
@@ -66,9 +66,16 @@ configuring Flume agents.
## Approach 2 (Experimental): Pull-based Approach using a Custom Sink
Instead of Flume pushing data directly to Spark Streaming, this approach
runs a custom Flume sink that allows the following.
+
- Flume pushes data into the sink, and the data stays buffered.
-- Spark Streaming uses transactions to pull data from the sink. Transactions succeed only after data is received and replicated by Spark Streaming.
-This ensures that better reliability and fault-tolerance than the previous approach. However, this requires configuring Flume to run a custom sink. Here are the configuration steps.
+- Spark Streaming uses a [reliable Flume receiver](streaming-programming-guide.html#receiver-reliability)
+ and transactions to pull data from the sink. Transactions succeed only after data is received and
+ replicated by Spark Streaming.
+
+This ensures that stronger reliability and
--- End diff --
Can cut 'that' here, so the sentence starts "This ensures stronger reliability and".