Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/880#discussion_r13110259
--- Diff: docs/configuration.md ---
@@ -601,91 +626,59 @@ Apart from these, the following properties are also available, and may be useful
</td>
</tr>
<tr>
- <td><code>spark.logConf</code></td>
- <td>false</td>
- <td>
- Whether to log the supplied SparkConf as INFO at start of spark context.
- </td>
-</tr>
-<tr>
- <td><code>spark.eventLog.enabled</code></td>
- <td>false</td>
- <td>
- Whether to log spark events, useful for reconstructing the Web UI after the application has
- finished.
- </td>
-</tr>
-<tr>
- <td><code>spark.eventLog.compress</code></td>
- <td>false</td>
- <td>
- Whether to compress logged events, if <code>spark.eventLog.enabled</code> is true.
- </td>
-</tr>
-<tr>
- <td><code>spark.eventLog.dir</code></td>
- <td>file:///tmp/spark-events</td>
- <td>
- Base directory in which spark events are logged, if <code>spark.eventLog.enabled</code> is true.
- Within this base directory, Spark creates a sub-directory for each application, and logs the
- events specific to the application in this directory.
- </td>
-</tr>
-<tr>
- <td><code>spark.deploy.spreadOut</code></td>
- <td>true</td>
+ <td><code>spark.locality.wait</code></td>
+ <td>3000</td>
<td>
- Whether the standalone cluster manager should spread applications out across nodes or try to
- consolidate them onto as few nodes as possible. Spreading out is usually better for data
- locality in HDFS, but consolidating is more efficient for compute-intensive workloads. <br/>
- <b>Note:</b> this setting needs to be configured in the standalone cluster master, not in
- individual applications; you can set it through <code>SPARK_MASTER_OPTS</code> in
- <code>spark-env.sh</code>.
+ Number of milliseconds to wait to launch a data-local task before giving up and launching it
+ on a less-local node. The same wait will be used to step through multiple locality levels
+ (process-local, node-local, rack-local and then any). It is also possible to customize the
+ waiting time for each level by setting <code>spark.locality.wait.node</code>, etc.
+ You should increase this setting if your tasks are long and see poor locality, but the
+ default usually works well.
</td>
</tr>
<tr>
- <td><code>spark.deploy.defaultCores</code></td>
- <td>(infinite)</td>
+ <td><code>spark.locality.wait.process</code></td>
+ <td>spark.locality.wait</td>
<td>
- Default number of cores to give to applications in Spark's standalone mode if they don't set
- <code>spark.cores.max</code>. If not set, applications always get all available cores unless
- they configure <code>spark.cores.max</code> themselves. Set this lower on a shared cluster to
- prevent users from grabbing the whole cluster by default. <br/> <b>Note:</b> this setting needs
- to be configured in the standalone cluster master, not in individual applications; you can set
- it through <code>SPARK_MASTER_OPTS</code> in <code>spark-env.sh</code>.
+ Customize the locality wait for process locality. This affects tasks that attempt to access
+ cached data in a particular executor process.
</td>
</tr>
<tr>
- <td><code>spark.files.overwrite</code></td>
- <td>false</td>
+ <td><code>spark.locality.wait.node</code></td>
+ <td>spark.locality.wait</td>
<td>
- Whether to overwrite files added through SparkContext.addFile() when the target file exists and
- its contents do not match those of the source.
+ Customize the locality wait for node locality. For example, you can set this to 0 to skip
+ node locality and search immediately for rack locality (if your cluster has rack information).
</td>
</tr>
<tr>
- <td><code>spark.files.fetchTimeout</code></td>
- <td>false</td>
+ <td><code>spark.locality.wait.rack</code></td>
+ <td>spark.locality.wait</td>
<td>
- Communication timeout to use when fetching files added through SparkContext.addFile() from
- the driver.
+ Customize the locality wait for rack locality.
</td>
</tr>
<tr>
- <td><code>spark.files.userClassPathFirst</code></td>
- <td>false</td>
+ <td><code>spark.scheduler.revive.interval</code></td>
+ <td>1000</td>
<td>
- (Experimental) Whether to give user-added jars precedence over Spark's own jars when
- loading classes in Executors. This feature can be used to mitigate conflicts between
- Spark's dependencies and user dependencies. It is currently an experimental feature.
+ The interval length for the scheduler to revive the worker resource offers to run tasks.
+ (in milliseconds)
</td>
</tr>
+</table>
+
+#### Security
+<table class="table">
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
<td><code>spark.authenticate</code></td>
<td>false</td>
<td>
- Whether spark authenticates its internal connections. See <code>spark.authenticate.secret</code>
- if not running on Yarn.
+ Whether spark authenticates its internal connections. See
+ <code>spark.authenticate.secret</code> if not running on Yarn.
--- End diff ---
Capitalize YARN and Spark throughout this doc
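
For reference (not part of the patch under review), a minimal sketch of how the locality and authentication properties documented in this diff might be set from application code via `SparkConf`; the app name and the specific values are purely illustrative:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative values only; the defaults are those listed in the tables above.
val conf = new SparkConf()
  .setAppName("LocalityWaitExample") // hypothetical app name
  // Wait up to 10 seconds for a data-local slot before falling back to a
  // less-local level (the documented default is 3000 ms).
  .set("spark.locality.wait", "10000")
  // Skip node locality and move straight on to rack locality.
  .set("spark.locality.wait.node", "0")
  // Authenticate Spark's internal connections; when not running on YARN,
  // a shared secret must also be supplied via spark.authenticate.secret.
  .set("spark.authenticate", "true")

val sc = new SparkContext(conf)
```

The same properties could equally be placed in `spark-defaults.conf` instead of being set programmatically.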
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---