[GitHub] [spark] pan3793 commented on a diff in pull request #41545: [SPARK-44021][SQL] Add spark.sql.files.maxDesiredPartitionNum

via GitHub Sun, 11 Jun 2023 20:33:09 -0700


pan3793 commented on code in PR #41545:
URL: https://github.com/apache/spark/pull/41545#discussion_r1226049672



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -1749,6 +1749,19 @@ object SQLConf {
     .checkValue(v => v > 0, "The min partition number must be a positive integer.")
     .createOptional
 
+  val FILES_MAX_DESIRED_PARTITION_NUM = buildConf("spark.sql.files.maxDesiredPartitionNum")
+    .doc("The maximum desired number of partitions when reading files. When the number of " +
+      "partitions calculated for the first time is greater than this value, recalculate " +
+      s"${FILES_MAX_PARTITION_BYTES.key} so that the final number of partitions is close to this " +
+      "value. Note that the final calculated number of partitions may be larger than this value." +

Review Comment:
   Since this doc is exposed to end users, I think we should just emphasize 
the priority between this config and `spark.sql.files.maxPartitionBytes` instead 
of exposing the internal implementation algorithm. To avoid confusing users, 
maybe we should also update the doc of `spark.sql.files.maxPartitionBytes` to 
mention that its value is not guaranteed to be honored when this config is set.
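
For illustration, the precedence that the proposed doc string describes could be sketched roughly as follows. This is a simplified model under stated assumptions, not the PR's implementation: `PartitionSizing` and its method names are hypothetical, and the sketch ignores per-file boundaries, file open cost, and bin packing, which is why real partition counts can still exceed the desired maximum.

```scala
// Hypothetical sketch: names and formulas are illustrative assumptions,
// not code from Spark or from this PR.
object PartitionSizing {
  // Partitions needed if each holds at most maxPartitionBytes (ceiling division).
  def partitionCount(totalBytes: Long, maxPartitionBytes: Long): Long =
    (totalBytes + maxPartitionBytes - 1) / maxPartitionBytes

  // When the first estimate exceeds the desired maximum, the desired maximum
  // takes priority: the per-partition byte limit is enlarged so the count
  // lands near that maximum. Otherwise maxPartitionBytes is used unchanged.
  def effectiveMaxPartitionBytes(
      totalBytes: Long,
      maxPartitionBytes: Long,
      maxDesiredPartitionNum: Option[Int]): Long =
    maxDesiredPartitionNum match {
      case Some(n) if partitionCount(totalBytes, maxPartitionBytes) > n =>
        (totalBytes + n - 1) / n
      case _ => maxPartitionBytes
    }
}
```

In this model, 1000 bytes of input with a 100-byte limit gives 10 partitions. With a desired maximum of 4, the effective limit becomes 250 bytes, so the configured `maxPartitionBytes` value is no longer honored.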



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


