Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/880#discussion_r13110259
--- Diff: docs/configuration.md ---
@@ -601,91 +626,59 @@ Apart from these, the following properties are also available, and may be useful
</td>
</tr>
<tr>
- <td><code>spark.logConf</code></td>
- <td>false</td>
- <td>
- Whether to log the supplied SparkConf as INFO at start of spark context.
- </td>
-</tr>
-<tr>
- <td><code>spark.eventLog.enabled</code></td>
- <td>false</td>
- <td>
- Whether to log spark events, useful for reconstructing the Web UI after the application has
- finished.
- </td>
-</tr>
-<tr>
- <td><code>spark.eventLog.compress</code></td>
- <td>false</td>
- <td>
- Whether to compress logged events, if <code>spark.eventLog.enabled</code> is true.
- </td>
-</tr>
-<tr>
- <td><code>spark.eventLog.dir</code></td>
- <td>file:///tmp/spark-events</td>
- <td>
- Base directory in which spark events are logged, if <code>spark.eventLog.enabled</code> is true.
- Within this base directory, Spark creates a sub-directory for each application, and logs the
- events specific to the application in this directory.
- </td>
-</tr>
-<tr>
- <td><code>spark.deploy.spreadOut</code></td>
- <td>true</td>
+ <td><code>spark.locality.wait</code></td>
+ <td>3000</td>
<td>
- Whether the standalone cluster manager should spread applications out across nodes or try to
- consolidate them onto as few nodes as possible. Spreading out is usually better for data
- locality in HDFS, but consolidating is more efficient for compute-intensive workloads. <br/>
- <b>Note:</b> this setting needs to be configured in the standalone cluster master, not in
- individual applications; you can set it through <code>SPARK_MASTER_OPTS</code> in
- <code>spark-env.sh</code>.
+ Number of milliseconds to wait to launch a data-local task before giving up and launching it
+ on a less-local node. The same wait will be used to step through multiple locality levels
+ (process-local, node-local, rack-local and then any). It is also possible to customize the
+ waiting time for each level by setting <code>spark.locality.wait.node</code>, etc.
+ You should increase this setting if your tasks are long and see poor locality, but the
+ default usually works well.
</td>
</tr>
<tr>
- <td><code>spark.deploy.defaultCores</code></td>
- <td>(infinite)</td>
+ <td><code>spark.locality.wait.process</code></td>
+ <td>spark.locality.wait</td>
<td>
- Default number of cores to give to applications in Spark's standalone mode if they don't set
- <code>spark.cores.max</code>. If not set, applications always get all available cores unless
- they configure <code>spark.cores.max</code> themselves. Set this lower on a shared cluster to
- prevent users from grabbing the whole cluster by default. <br/> <b>Note:</b> this setting needs
- to be configured in the standalone cluster master, not in individual applications; you can set
- it through <code>SPARK_MASTER_OPTS</code> in <code>spark-env.sh</code>.
+ Customize the locality wait for process locality. This affects tasks that attempt to access
+ cached data in a particular executor process.
</td>
</tr>
<tr>
- <td><code>spark.files.overwrite</code></td>
- <td>false</td>
+ <td><code>spark.locality.wait.node</code></td>
+ <td>spark.locality.wait</td>
<td>
- Whether to overwrite files added through SparkContext.addFile() when the target file exists and
- its contents do not match those of the source.
+ Customize the locality wait for node locality. For example, you can set this to 0 to skip
+ node locality and search immediately for rack locality (if your cluster has rack information).
</td>
</tr>
<tr>
- <td><code>spark.files.fetchTimeout</code></td>
- <td>false</td>
+ <td><code>spark.locality.wait.rack</code></td>
+ <td>spark.locality.wait</td>
<td>
- Communication timeout to use when fetching files added through SparkContext.addFile() from
- the driver.
+ Customize the locality wait for rack locality.
</td>
</tr>
<tr>
- <td><code>spark.files.userClassPathFirst</code></td>
- <td>false</td>
+ <td><code>spark.scheduler.revive.interval</code></td>
+ <td>1000</td>
<td>
- (Experimental) Whether to give user-added jars precedence over Spark's own jars when
- loading classes in Executors. This feature can be used to mitigate conflicts between
- Spark's dependencies and user dependencies. It is currently an experimental feature.
+ The interval length for the scheduler to revive the worker resource offers to run tasks.
+ (in milliseconds)
</td>
</tr>
+</table>
+
+#### Security
+<table class="table">
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
<td><code>spark.authenticate</code></td>
<td>false</td>
<td>
- Whether spark authenticates its internal connections. See <code>spark.authenticate.secret</code>
- if not running on Yarn.
+ Whether spark authenticates its internal connections. See
+ <code>spark.authenticate.secret</code> if not running on Yarn.
--- End diff ---
Capitalize YARN and Spark throughout this doc
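
For reference (not part of the patch under review), a minimal sketch of how the locality and authentication properties documented in this diff might be set from application code via `SparkConf`; the app name and the specific values are purely illustrative:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative values only; the defaults are those listed in the tables above.
val conf = new SparkConf()
  .setAppName("LocalityWaitExample") // hypothetical app name
  // Wait up to 10 seconds for a data-local slot before falling back to a
  // less-local level (the documented default is 3000 ms).
  .set("spark.locality.wait", "10000")
  // Skip node locality and move straight on to rack locality.
  .set("spark.locality.wait.node", "0")
  // Authenticate Spark's internal connections; when not running on YARN,
  // a shared secret must also be supplied via spark.authenticate.secret.
  .set("spark.authenticate", "true")

val sc = new SparkContext(conf)
```

The same properties could equally be placed in `spark-defaults.conf` instead of being set programmatically.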
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---