Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/6935#discussion_r47556139
--- Diff: docs/monitoring.md ---
@@ -69,36 +83,53 @@ follows:
</tr>
</table>
+### Spark configuration options
+
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
<td>spark.history.provider</td>
- <td>org.apache.spark.deploy.history.FsHistoryProvider</td>
+ <td><code>org.apache.spark.deploy.history.FsHistoryProvider</code></td>
     <td>Name of the class implementing the application history backend. Currently there is only
     one implementation, provided by Spark, which looks for application logs stored in the
     file system.</td>
</tr>
<tr>
+ <td>spark.history.retainedApplications</td>
+ <td>50</td>
+ <td>
+      The number of application UIs to retain. If this cap is exceeded, then the oldest
+      applications will be removed.
+ </td>
+ </tr>
+ <tr>
<td>spark.history.fs.logDirectory</td>
<td>file:/tmp/spark-events</td>
<td>
-    Directory that contains application event logs to be loaded by the history server
+      For the filesystem history provider, the URL to the directory containing application event
+      logs to load. This can be a local <code>file://</code> path,
+      an HDFS path such as <code>hdfs://namenode/shared/spark-logs</code>,
+      or the URL of an alternative filesystem supported by the Hadoop APIs.
</td>
</tr>
<tr>
<td>spark.history.fs.update.interval</td>
<td>10s</td>
<td>
-      The period at which information displayed by this history server is updated.
-      Each update checks for any changes made to the event logs in persisted storage.
+      The period at which the filesystem history provider checks for new or updated logs
+      in the log directory. A shorter interval detects new applications faster, at the
+      expense of more server load from re-reading updated applications. As soon as an
+      update has completed, listings of the completed and incomplete applications will
+      reflect the changes. For performance reasons, the web UIs of individual applications
+      are refreshed at a slower interval, defined in <code>spark.history.cache.window</code>.
--- End diff --
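For anyone trying these options out: the history server would normally pick them up from `conf/spark-defaults.conf`. A minimal excerpt, using only the keys and example values documented in this diff:

```
spark.history.provider               org.apache.spark.deploy.history.FsHistoryProvider
spark.history.fs.logDirectory        hdfs://namenode/shared/spark-logs
spark.history.fs.update.interval     10s
spark.history.retainedApplications   50
```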
Reading this, I'm wondering if it makes sense for
`spark.history.cache.window` to default to
`spark.history.fs.update.interval`. That might make the out-of-the-box
behavior more intuitive for users. And the knob is still there for bigger
clusters, where someone will need to look through these options more
carefully in any case.

I'm not entirely convinced myself -- what do you think?
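If it helps the discussion, here's a rough sketch of what I mean, assuming the provider reads both keys off a `SparkConf`; the object and method names are made up for illustration, not actual Spark code:

```scala
import org.apache.spark.SparkConf

// Rough sketch only -- names here are hypothetical, not the real implementation.
object CacheWindowDefault {
  // Fall back to the update interval when no explicit cache window is configured.
  def cacheWindowSeconds(conf: SparkConf): Long = {
    val updateIntervalSecs = conf.getTimeAsSeconds("spark.history.fs.update.interval", "10s")
    conf.getTimeAsSeconds("spark.history.cache.window", s"${updateIntervalSecs}s")
  }
}
```

That would keep the default behavior tied to a single knob, while the explicit override stays available for anyone tuning a larger deployment.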