HyukjinKwon commented on a change in pull request #28422:
URL: https://github.com/apache/spark/pull/28422#discussion_r418381154
##########
File path: docs/structured-streaming-programming-guide.md
##########
@@ -542,6 +542,12 @@ Here are the details of all the sources in Spark.
<br/>
<code>maxFileAge</code>: Maximum age of a file that can be found in
this directory, before it is ignored. For the first batch all files will be
considered valid. If <code>latestFirst</code> is set to `true` and
<code>maxFilesPerTrigger</code> is set, then this parameter will be ignored,
because old files that are valid, and should be processed, may be ignored. The
max age is specified with respect to the timestamp of the latest file, and not
the timestamp of the current system.(default: 1 week)
<br/>
+ <code>inputRetention</code>: Maximum age of a file that can be found
in this directory, before it is ignored.<br/>
+ This is the "hard" limit of input data retention - input files older
than the max age will be ignored regardless of source options (while
`maxFileAgeMs` depends on the condition), as well as entries in checkpoint
metadata will be purged based on this.<br/>
+ Unlike `maxFileAgeMs`, the max age is specified with respect to the
timestamp of the current system, to provide consistent behavior regardless of
metadata entries.<br/>
+ NOTE 1: Please be careful to set the value if the query replays from
the old input files.<br/>
+ NOTE 2: Please make sure the timestamp is in sync between nodes which
run the query.<br/>
+ <br/>
Review comment:
Out of curiosity, did we document what kind of value is expected for
these options, e.g., `7d`?
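
To make the question concrete, here is a minimal sketch of how a `7d`-style value could be read and how the two age checks described in the diff differ. It is an illustration only, not Spark's implementation: `parse_duration_ms`, `ignored_by_max_file_age`, and `ignored_by_input_retention` are hypothetical names, and the accepted unit suffixes are an assumption.

```python
import re

# Hypothetical unit table: the suffixes accepted here are an assumption for
# illustration, not a statement of what Spark accepts.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def parse_duration_ms(text):
    """Convert a string such as '7d' or '30s' into milliseconds."""
    match = re.fullmatch(r"\s*(\d+)\s*(ms|s|m|h|d)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_MS[unit]


def ignored_by_max_file_age(file_ts_ms, latest_file_ts_ms, max_age_ms):
    # Age is measured against the timestamp of the latest file,
    # as the existing maxFileAge text in the guide describes.
    return file_ts_ms < latest_file_ts_ms - max_age_ms


def ignored_by_input_retention(file_ts_ms, now_ms, retention_ms):
    # Age is measured against the current system time,
    # as the proposed inputRetention text describes.
    return file_ts_ms < now_ms - retention_ms


if __name__ == "__main__":
    day = parse_duration_ms("1d")
    limit = parse_duration_ms("7d")
    # A replay scenario: the newest file is 50 days old relative to "now".
    now, latest, candidate = 100 * day, 50 * day, 45 * day
    print(ignored_by_max_file_age(candidate, latest, limit))
    print(ignored_by_input_retention(candidate, now, limit))
```

Under these assumptions, a file written five days before the latest file passes the `maxFileAge` check but fails the `inputRetention` check when the whole input set is old, which is the replay case NOTE 1 in the diff warns about.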