Kimahriman commented on code in PR #47393:
URL: https://github.com/apache/spark/pull/47393#discussion_r1681854029


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -2129,6 +2129,13 @@ object SQLConf {
       .intConf
       .createWithDefault(100)

+  val MIN_VERSIONS_TO_DELETE = buildConf("spark.sql.streaming.minVersionsToDelete")
+    .internal()
+    .doc("The minimum number of stale versions to delete when maintenance is invoked.")
+    .version("2.1.1")
+    .intConf
+    .createWithDefault(30)

Review Comment:
   As an HDFS user who cares more about the number of files than the number of list calls, I would have found this new default behavior a little surprising and confusing when upgrading. Should this default to 1? That way, by default, you at least don't waste time looking for files to delete if you haven't written a new version since the last check, while still respecting `minBatchesToRetain` as a tighter bound.
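   For illustration only, here is a minimal sketch of the kind of gating being discussed; it is not the code from PR #47393, and `shouldRunCleanup`, `latestVersion`, and `lastCleanupVersion` are hypothetical names introduced just for this example:

   ```scala
   // Hypothetical sketch only; names and structure are assumptions,
   // not the actual implementation in PR #47393.
   object MaintenanceSketch {
     /**
      * Decide whether maintenance should scan for stale state files.
      * With minVersionsToDelete = 1, a scan happens as soon as any version
      * beyond minBatchesToRetain could be stale; larger values batch up
      * deletions to reduce list calls at the cost of keeping more files.
      */
     def shouldRunCleanup(
         latestVersion: Long,
         lastCleanupVersion: Long,
         minBatchesToRetain: Int,
         minVersionsToDelete: Int): Boolean = {
       // Versions older than this must be retained per minBatchesToRetain.
       val earliestVersionToRetain = latestVersion - minBatchesToRetain + 1
       // Potentially stale versions accumulated since the last cleanup pass.
       val staleCandidates = earliestVersionToRetain - 1 - lastCleanupVersion
       staleCandidates >= minVersionsToDelete
     }
   }
   ```

   Under this sketch, a default of 1 would only skip the (potentially expensive) listing when no new version has been committed since the last pass, while `minBatchesToRetain` remains the tighter bound on what is kept.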
########## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ########## @@ -2129,6 +2129,13 @@ object SQLConf { .intConf .createWithDefault(100) + val MIN_VERSIONS_TO_DELETE = buildConf("spark.sql.streaming.minVersionsToDelete") + .internal() + .doc("The minimum number of stale versions to delete when maintenance is invoked.") + .version("2.1.1") + .intConf + .createWithDefault(30) Review Comment: As an HDFS user where I care more about number of files than number of list calls, this would have been a little surprising and confusing to be the new default behavior when I upgraded. Should this default to 1? That way by default you at least don't waste time looking for files to delete if you haven't even written a new version since the last time you checked, but still respects `minBatchesToRetain` as a tighter bound -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org