vanzin commented on a change in pull request #23560: [SPARK-26632][Core] Separate Thread Configurations of Driver and Executor
URL: https://github.com/apache/spark/pull/23560#discussion_r281297348
##########
File path: docs/configuration.md
##########
@@ -1954,6 +1954,46 @@ Apart from these, the following properties are also available, and may be useful
</tr>
</table>
+### Thread Configurations
+
+Depending on jobs and cluster configurations, we can set number of threads in several places in Spark to utilize
+available resources efficiently to get better performance. Prior to Spark 3.0, these thread configurations apply
+to all roles of Spark, such as driver, executor, worker and master. From Spark 3.0, we can configure threads in
+finer granularity starting from driver and executor. Take RPC module as example in below table. For other modules,
+like shuffle, just replace "rpc" with "shuffle" in the property names except
+<code>spark.{driver|executor}.rpc.netty.dispatcher.numThreads</code>, which is only for RPC module.
+
+<table class="table">
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+<tr>
+ <td><code>spark.{driver|executor}.rpc.io.serverThreads</code></td>
+ <td>
+ Fall back on spark.rpc.io.serverThreads
+ </td>
+ <td>Number of threads used in the server thread pool</td>
+</tr>
+<tr>
+ <td><code>spark.{driver|executor}.rpc.io.clientThreads</code></td>
+ <td>
+ Fall back on spark.rpc.io.clientThreads
+ </td>
+ <td>Number of threads used in the client thread pool</td>
+</tr>
+<tr>
+ <td><code>spark.{driver|executor}.rpc.netty.dispatcher.numThreads</code></td>
+ <td>
+ Fall back on spark.rpc.netty.dispatcher.numThreads
+ </td>
+ <td>Number of threads used in RPC message dispatcher thread pool</td>
+</tr>
+</table>
+
+The default values of spark.rpc.io.serverThreads, spark.rpc.io.clientThreads and spark.rpc.netty.dispatcher.numThreads
+are same. It's <br>
+number of CPU cores if specified. Otherwise, the available processors to the JVM. In either cases, the default value
Review comment:
This whole paragraph is a little hard to read, and it shouldn't
reference private APIs in Spark. Better wording:
```
The default value for the thread-count config keys is the number of cores
requested for the driver or executor or, in the absence of that value, the
number of cores available to the JVM, in either case capped at a hardcoded
upper limit of 8.
```
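To make the rule in the suggested wording concrete, here is a minimal Python sketch of it, combined with the "fall back on" behavior listed in the table. It is only an illustration of the described behavior, not Spark's implementation: the function names, the dict-based `conf`, and the `requested_cores` / `available_cores` parameters are all hypothetical.

```python
import os

# Upper limit mentioned in the suggested wording.
MAX_DEFAULT_THREADS = 8


def default_threads(requested_cores=None, available_cores=None):
    """Default thread count: requested cores if given, else the cores
    visible to the process, capped at MAX_DEFAULT_THREADS."""
    if available_cores is None:
        available_cores = os.cpu_count() or 1
    if requested_cores is not None and requested_cores > 0:
        base = requested_cores
    else:
        base = available_cores
    return min(base, MAX_DEFAULT_THREADS)


def resolve_threads(conf, role, module, suffix,
                    requested_cores=None, available_cores=None):
    """Look up the role-specific key first, then the role-independent key,
    and only then compute the default.

    conf is a plain dict standing in for the configuration (hypothetical).
    """
    role_key = f"spark.{role}.{module}.{suffix}"   # e.g. spark.driver.rpc.io.serverThreads
    generic_key = f"spark.{module}.{suffix}"       # e.g. spark.rpc.io.serverThreads
    for key in (role_key, generic_key):
        if key in conf:
            return int(conf[key])
    return default_threads(requested_cores, available_cores)
```

For example, with `conf = {"spark.rpc.io.serverThreads": "3", "spark.driver.rpc.io.serverThreads": "5"}`, the driver lookup uses the driver-specific value while the executor lookup falls back on the shared key.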