sunchao commented on a change in pull request #29498:
URL: https://github.com/apache/spark/pull/29498#discussion_r474444907



##########
File path: docs/tuning.md
##########
@@ -264,12 +264,16 @@ parent RDD's number of partitions. You can pass the level of parallelism as a se
 or set the config property `spark.default.parallelism` to change the default.
 In general, we recommend 2-3 tasks per CPU core in your cluster.
 
+# Parallel Listing on Input Paths

Review comment:
       Good catch. Fixing.
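As context for readers of this thread, the `spark.default.parallelism` property mentioned in the hunk above can be supplied at submit time. A minimal sketch only; the core count, the value 48, and the application file name are illustrative assumptions, not taken from the diff:

```shell
# Aiming for roughly 2-3 tasks per CPU core, as the tuning doc recommends:
# on a hypothetical 16-core cluster, 48 default partitions gives 3 tasks per core.
# my_app.py is a placeholder application name.
spark-submit --conf spark.default.parallelism=48 my_app.py
```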

##########
File path: docs/sql-performance-tuning.md
##########
@@ -114,6 +114,28 @@ that these options will be deprecated in future release as more optimizations ar
     </td>
     <td>1.1.0</td>
   </tr>
+  <tr>
+    <td><code>spark.sql.sources.parallelPartitionDiscovery.threshold</code></td>
+    <td>32</td>
+    <td>
+      Configures the threshold to enable parallel listing for job input paths. If the number of
+      input paths is larger than this threshold, Spark will use parallel listing on the driver side.

Review comment:
       Will do.
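For context, the `spark.sql.sources.parallelPartitionDiscovery.threshold` option documented in the hunk above can be overridden at launch. A hedged sketch: the value 8 is an illustrative assumption (the default shown in the diff is 32), and any Spark entry point accepting `--conf` could be used:

```shell
# Trigger parallel listing on the driver once a job has more than 8 input paths,
# instead of the default 32 described in the diff. Value chosen for illustration.
spark-sql --conf spark.sql.sources.parallelPartitionDiscovery.threshold=8
```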




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
