HeartSaVioR commented on a change in pull request #28422:
URL: https://github.com/apache/spark/pull/28422#discussion_r418571213
##########
File path: docs/structured-streaming-programming-guide.md
##########
@@ -542,6 +542,12 @@ Here are the details of all the sources in Spark.
         <br/>
         <code>maxFileAge</code>: Maximum age of a file that can be found in this directory, before it is ignored. For the first batch all files will be considered valid. If <code>latestFirst</code> is set to `true` and <code>maxFilesPerTrigger</code> is set, then this parameter will be ignored, because old files that are valid, and should be processed, may be ignored. The max age is specified with respect to the timestamp of the latest file, and not the timestamp of the current system.(default: 1 week)
         <br/>
+        <code>inputRetention</code>: Maximum age of a file that can be found in this directory, before it is ignored.<br/>
+        This is the "hard" limit of input data retention - input files older than the max age will be ignored regardless of source options (while `maxFileAgeMs` depends on the condition), as well as entries in checkpoint metadata will be purged based on this.<br/>
+        Unlike `maxFileAgeMs`, the max age is specified with respect to the timestamp of the current system, to provide consistent behavior regardless of metadata entries.<br/>
+        NOTE 1: Please be careful to set the value if the query replays from the old input files.<br/>
+        NOTE 2: Please make sure the timestamp is in sync between nodes which run the query.<br/>
+        <br/>

Review comment:
       Looks like the kinds of values weren't specified in many options, but implied by default values. This option doesn't have default value - maybe better to explicitly specify kind of value. Good point!
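For readers following the thread, the difference between the two cutoffs described in the hunk can be sketched in plain Python. This is only an illustration of the documented semantics, not Spark's implementation: the `InputFile` type and both function names are hypothetical, and details such as the first-batch exception and the `latestFirst`/`maxFilesPerTrigger` interaction for `maxFileAge` are deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class InputFile:
    """Hypothetical stand-in for a file listed by the source."""
    path: str
    modified_ms: int  # modification time in epoch milliseconds


def filter_by_max_file_age(files: List[InputFile], max_file_age_ms: int) -> List[InputFile]:
    """Soft limit: the cutoff is relative to the newest file's timestamp."""
    if not files:
        return []
    latest_ms = max(f.modified_ms for f in files)
    return [f for f in files if f.modified_ms >= latest_ms - max_file_age_ms]


def filter_by_input_retention(files: List[InputFile], retention_ms: int, now_ms: int) -> List[InputFile]:
    """Hard limit: the cutoff is relative to the current system time."""
    return [f for f in files if f.modified_ms >= now_ms - retention_ms]


DAY_MS = 86_400_000
old_files = [InputFile("a", 10 * DAY_MS), InputFile("b", 12 * DAY_MS)]

# Relative to the newest file, both files fall inside a 7-day window.
kept_soft = filter_by_max_file_age(old_files, 7 * DAY_MS)

# Relative to a "now" of day 100, both files are older than 7 days and are dropped.
kept_hard = filter_by_input_retention(old_files, 7 * DAY_MS, now_ms=100 * DAY_MS)
```

The second call is the situation NOTE 1 warns about: a query replaying old input files can lose all of them once the cutoff is tied to the system clock.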