Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22952#discussion_r231704852
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -530,6 +530,8 @@ Here are the details of all the sources in Spark.
             "s3://a/dataset.txt"<br/>
             "s3n://a/b/dataset.txt"<br/>
             "s3a://a/b/c/dataset.txt"<br/>
    +        <br/>
    +        <code>renameCompletedFiles</code>: whether to rename completed 
files in previous batch (default: false). If the option is enabled, input file 
will be renamed with additional postfix "_COMPLETED_". This is useful to clean 
up old input files to save space in storage.
    --- End diff --
    
    The rename operation itself is bound to be slow. Without a written notice, 
users will complain again and again about the performance regression. Users 
often don't mention that they changed this kind of setting. Instead, they say 
Spark suddenly shows regressions in their environment.

