Github user tdas commented on a diff in the pull request:
https://github.com/apache/spark/pull/10453#discussion_r49137355
--- Diff: docs/streaming-programming-guide.md ---
@@ -1985,7 +1985,11 @@ To run a Spark Streaming applications, you need to have the following.
     to increase aggregate throughput. Additionally, it is recommended that the replication of the
     received data within Spark be disabled when the write ahead log is enabled as the log is already
     stored in a replicated storage system. This can be done by setting the storage level for the
-    input stream to `StorageLevel.MEMORY_AND_DISK_SER`.
+    input stream to `StorageLevel.MEMORY_AND_DISK_SER`. While using S3 (or any file system that
+    does not support flushing) for Write Ahead Logs, please remember to enable
--- End diff --
nit: "write ahead log" is not capitalized elsewhere in this text, so please keep the capitalization consistent.
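
For context, a minimal sketch (not part of the diff) of what the guide's recommendation amounts to, assuming a receiver-based socket stream; `spark.streaming.receiver.writeAheadLog.enable` and `StorageLevel.MEMORY_AND_DISK_SER` are the documented Spark settings, while the app name, host, and port are placeholder values:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Enable the receiver write ahead log so received data is persisted to a
// fault-tolerant file system before processing.
val conf = new SparkConf()
  .setAppName("WALExample") // placeholder app name
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")

val ssc = new StreamingContext(conf, Seconds(10))

// With the WAL enabled, in-Spark replication of received data is redundant,
// so pass a non-replicated storage level for the input stream.
val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)
```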