viirya commented on a change in pull request #29498:
URL: https://github.com/apache/spark/pull/29498#discussion_r474442266



##########
File path: docs/sql-performance-tuning.md
##########
@@ -114,6 +114,28 @@ that these options will be deprecated in future release as more optimizations ar
     </td>
     <td>1.1.0</td>
   </tr>
+  <tr>
+    <td><code>spark.sql.sources.parallelPartitionDiscovery.threshold</code></td>
+    <td>32</td>
+    <td>
+      Configures the threshold to enable parallel listing for job input paths. If the number of
+      input paths is larger than this threshold, Spark will use parallel listing on the driver side.

Review comment:
       > Spark will use parallel listing on the driver side.
   
    It sounds like Spark just runs parallel listing on the driver side using multi-threading. How about "Spark will list the files by using a distributed Spark job" instead?
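
For context, the setting under discussion can be adjusted on a `SparkSession` as sketched below. This is a minimal illustration, not part of the proposed doc change; the threshold value 64 is an arbitrary example (the default is 32):

```scala
// Minimal sketch: raising the parallel partition discovery threshold.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ParallelListingSketch")
  .master("local[*]")
  .getOrCreate()

// If a job has more input paths than this threshold, Spark switches from
// sequential listing on the driver to listing via a distributed Spark job.
spark.conf.set("spark.sql.sources.parallelPartitionDiscovery.threshold", "64")
```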




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


