viirya commented on a change in pull request #29498:
URL: https://github.com/apache/spark/pull/29498#discussion_r474442266



##########
File path: docs/sql-performance-tuning.md
##########
@@ -114,6 +114,28 @@ that these options will be deprecated in future release as more optimizations ar
     </td>
     <td>1.1.0</td>
   </tr>
+  <tr>
+    <td><code>spark.sql.sources.parallelPartitionDiscovery.threshold</code></td>
+    <td>32</td>
+    <td>
+      Configures the threshold to enable parallel listing for job input paths. If the number of
+      input paths is larger than this threshold, Spark will use parallel listing on the driver side.

Review comment:
       > Spark will use parallel listing on the driver side.
   
    It sounds like Spark just runs parallel listing on the driver side using multi-threading. How about "Spark will list the files by using a distributed Spark job" instead?
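
For context, the setting under discussion can be adjusted on a `SparkSession` as sketched below. This is a minimal illustration, not part of the proposed doc change; the threshold value 64 is an arbitrary example (the default is 32):

```scala
// Minimal sketch: raising the parallel partition discovery threshold.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ParallelListingSketch")
  .master("local[*]")
  .getOrCreate()

// If a job has more input paths than this threshold, Spark switches from
// sequential listing on the driver to listing via a distributed Spark job.
spark.conf.set("spark.sql.sources.parallelPartitionDiscovery.threshold", "64")
```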




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


