GitHub user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22952#discussion_r231704852
--- Diff: docs/structured-streaming-programming-guide.md ---
@@ -530,6 +530,8 @@ Here are the details of all the sources in Spark.
"s3://a/dataset.txt"<br/>
"s3n://a/b/dataset.txt"<br/>
"s3a://a/b/c/dataset.txt"<br/>
+ <br/>
+ <code>renameCompletedFiles</code>: whether to rename completed
files in previous batch (default: false). If the option is enabled, input file
will be renamed with additional postfix "_COMPLETED_". This is useful to clean
up old input files to save space in storage.
--- End diff --
The essential point is that renaming completed files can be slow. Without a
written notice in the docs, users will complain again and again about the
performance regression. Users frequently don't mention that they changed this
kind of setting; instead, they say Spark suddenly shows regressions in their
environment.
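To make the cost concrete, here is a minimal local-filesystem sketch of what the proposed option does to each completed file. It is not Spark's implementation: `rename_completed_files` and `demo` are hypothetical helpers, and appending `_COMPLETED_` to the very end of the file name is an assumption based on the doc text above. A local rename is a cheap metadata operation, while object stores such as S3 typically emulate rename as copy plus delete, so that step scales with file size.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List

# Suffix named in the proposed doc text for renameCompletedFiles.
COMPLETED_SUFFIX = "_COMPLETED_"


def rename_completed_files(files: Iterable[Path]) -> List[Path]:
    """Hypothetical sketch: mark files from the previous batch as completed.

    Assumption: the suffix is appended to the end of the file name.
    """
    renamed = []
    for f in files:
        if f.name.endswith(COMPLETED_SUFFIX):
            continue  # already marked; skipping keeps re-runs idempotent
        target = f.with_name(f.name + COMPLETED_SUFFIX)
        # Local POSIX filesystem: a metadata-only rename, effectively O(1).
        # Object stores such as S3: "rename" is usually copy + delete,
        # so this single line can cost time proportional to the file size.
        f.rename(target)
        renamed.append(target)
    return renamed


def demo() -> List[str]:
    """Rename one fake input file in a throwaway directory; return new names."""
    with tempfile.TemporaryDirectory() as d:
        src = Path(d) / "dataset.txt"
        src.write_text("a\n")
        return [p.name for p in rename_completed_files([src])]


print(demo())  # ['dataset.txt_COMPLETED_']
```

Every completed file pays this per-file cost on each batch, which is why a warning in the docs matters for users who enable the option against S3.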
---