Kimahriman commented on code in PR #47393:
URL: https://github.com/apache/spark/pull/47393#discussion_r1681854029


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -2129,6 +2129,13 @@ object SQLConf {
     .intConf
     .createWithDefault(100)
 
+  val MIN_VERSIONS_TO_DELETE = buildConf("spark.sql.streaming.minVersionsToDelete")
+    .internal()
+    .doc("The minimum number of stale versions to delete when maintenance is invoked.")
+    .version("2.1.1")
+    .intConf
+    .createWithDefault(30)

Review Comment:
   As an HDFS user who cares more about the number of files than the number of list calls, this would have been a little surprising and confusing as the new default behavior after upgrading. Should this default to 1? That way, by default you at least don't waste time looking for files to delete if you haven't written a new version since the last check, while still respecting `minBatchesToRetain` as the tighter bound.
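
   To make the suggestion concrete, here is a minimal sketch of the gating a default of 1 would imply; every name below is hypothetical and not taken from the PR's actual implementation:

```scala
// Hypothetical sketch only: illustrates how a default of minVersionsToDelete = 1
// would gate maintenance, while minBatchesToRetain remains the tighter bound on
// which versions are actually eligible for deletion.
object MaintenanceSketch {

  // Skip the (potentially expensive) file listing entirely when fewer than
  // minVersionsToDelete new versions have been committed since the last cleanup.
  // With a default of 1, cleanup is skipped only if nothing new was written.
  def shouldAttemptCleanup(
      currentVersion: Long,
      lastCleanupVersion: Long,
      minVersionsToDelete: Int): Boolean = {
    currentVersion - lastCleanupVersion >= minVersionsToDelete
  }

  // minBatchesToRetain is still the hard retention bound: only versions that
  // fall outside the retention window are candidates for deletion.
  def versionsEligibleForDeletion(
      existingVersions: Seq[Long],
      currentVersion: Long,
      minBatchesToRetain: Int): Seq[Long] = {
    val earliestToRetain = currentVersion - minBatchesToRetain + 1
    existingVersions.filter(_ < earliestToRetain)
  }

  def main(args: Array[String]): Unit = {
    // Example: current version 105, last cleanup at 100, retain 100 batches.
    println(shouldAttemptCleanup(
      currentVersion = 105, lastCleanupVersion = 100, minVersionsToDelete = 1)) // true
    println(versionsEligibleForDeletion(
      Seq(1L, 2L, 3L, 100L, 105L), currentVersion = 105, minBatchesToRetain = 100)) // List(1, 2, 3)
  }
}
```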



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


