VindhyaG commented on code in PR #53040:
URL: https://github.com/apache/spark/pull/53040#discussion_r2540570386


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -2430,6 +2430,28 @@ object SQLConf {
     .checkValue(v => v > 0, "The maximum number of partitions must be a positive integer.")
     .createOptional
 
+  val FILES_PARTITION_STRATEGY = buildConf("spark.sql.files.partitionStrategy")
+    .doc("The strategy to coalesce small files into larger partitions when reading files. " +
+      "Options are `size_based` (coalesce based on size of files), and `file_based` " +
+      "(coalesce based on number of files). The number of output partitions depends on " +
+      "`spark.sql.files.maxPartitionBytes` and `spark.sql.files.maxPartitionNum`. " +
+      "This configuration is effective only when using file-based sources such as " +
+      "Parquet, JSON and ORC.")
+    .version("3.5.0")
+    .stringConf
+    .checkValues(Set("size_based", "file_based"))
+    .createWithDefault("size_based")
+
+  val SMALL_FILE_THRESHOLD =
+    buildConf("spark.sql.files.smallFileThreshold")
+      .doc(
+        "Defines the total size threshold for small files in a table scan. If the cumulative size " +
+          "of small files falls below this threshold, they are distributed across multiple " +
+          "partitions to avoid concentrating them in a single partition. This configuration is " +
+          "used when `spark.sql.files.coalesceStrategy` is set to `file_based`.")
+      .doubleConf
+      .createWithDefault(0.5)

Review Comment:
   May I ask how we arrived at 0.5 as the default for `smallFileThreshold`?
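   
   For context, here is a minimal sketch of how these knobs would be set from a user session, assuming the config names in this diff land as-is. Reading 0.5 as a fraction of `spark.sql.files.maxPartitionBytes` is my assumption, not something the doc string confirms:
   
   ```scala
   // Hypothetical usage sketch: config names come from this diff; the
   // fraction-of-maxPartitionBytes interpretation of 0.5 is an assumption.
   import org.apache.spark.sql.SparkSession
   
   val spark = SparkSession.builder().appName("small-file-coalesce").getOrCreate()
   
   // Coalesce input splits by file count rather than by byte size.
   spark.conf.set("spark.sql.files.partitionStrategy", "file_based")
   
   // If 0.5 were a fraction of spark.sql.files.maxPartitionBytes (128 MB by
   // default), files under ~64 MB would count as "small" -- pinning down the
   // intended unit here would also help justify the default.
   spark.conf.set("spark.sql.files.smallFileThreshold", "0.5")
   
   val df = spark.read.parquet("/path/to/many-small-files")
   println(df.rdd.getNumPartitions) // observe the resulting partition count
   ```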


