Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/6935#discussion_r48368082
--- Diff: docs/monitoring.md ---
@@ -69,36 +83,53 @@ follows:
</tr>
</table>
+### Spark configuration options
+
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
<td>spark.history.provider</td>
- <td>org.apache.spark.deploy.history.FsHistoryProvider</td>
+ <td><code>org.apache.spark.deploy.history.FsHistoryProvider</code></td>
     <td>Name of the class implementing the application history backend.
     Currently there is only one implementation, provided by Spark, which
     looks for application logs stored in the file system.</td>
</tr>
<tr>
+ <td>spark.history.retainedApplications</td>
+ <td>50</td>
+ <td>
+      The number of application UIs to retain. If this cap is exceeded, then
+      the oldest applications will be removed.
+ </td>
+ </tr>
+ <tr>
<td>spark.history.fs.logDirectory</td>
<td>file:/tmp/spark-events</td>
<td>
-      Directory that contains application event logs to be loaded by the history server
+      For the filesystem history provider, the URL to the directory containing
+      application event logs to load. This can be a local <code>file://</code> path,
+      an HDFS path <code>hdfs://namenode/shared/spark-logs</code>,
+      or that of an alternative filesystem supported by the Hadoop APIs.
</td>
</tr>
<tr>
<td>spark.history.fs.update.interval</td>
<td>10s</td>
<td>
-      The period at which information displayed by this history server is updated.
-      Each update checks for any changes made to the event logs in persisted storage.
+      The period at which the filesystem history provider checks for new or
+      updated logs in the log directory. A shorter interval detects new
+      applications faster, at the expense of more server load re-reading
+      updated applications. As soon as an update has completed, listings of
+      the completed and incomplete applications will reflect the changes.
+      For performance reasons, the web UIs of applications are only updated
+      at a slower interval, defined in <code>spark.history.cache.window</code>.
--- End diff --
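For context, a minimal sketch of the properties documented above, assuming they are read from the history server's `SparkConf` (normally they would live in `spark-defaults.conf`); the values are illustrative:

```scala
import org.apache.spark.SparkConf

// Illustrative values only; in practice these go in the history server's
// spark-defaults.conf rather than being set programmatically.
val conf = new SparkConf()
  .set("spark.history.provider", "org.apache.spark.deploy.history.FsHistoryProvider")
  .set("spark.history.fs.logDirectory", "hdfs://namenode/shared/spark-logs")
  .set("spark.history.fs.update.interval", "10s")
  .set("spark.history.retainedApplications", "50")
```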
There are three costs in the system: listing cost, probe cost and replay cost.
* listing cost: pretty expensive in the history server, as it replays the entire history just to get a few flags which could be cached alongside (completed flag, etc.). That's why it can be slow to start up. After startup the async replay is only done on changed data. Load on HDFS: negligible.
* probe cost: simply checking the internal state of things updated in the update thread; ~0.
* replay cost: expensive, O(events), so essentially O(filesize). Again, HDFS doesn't notice.
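To make the listing point concrete, here's a sketch (hypothetical names, not the actual `FsHistoryProvider` code) of caching those flags alongside, so that a listing becomes a lookup rather than a replay:

```scala
// Per-application summary holding just the flags a listing needs.
case class AppSummary(appId: String, completed: Boolean, lastUpdated: Long)

class ListingCache {
  private val summaries = scala.collection.mutable.Map.empty[String, AppSummary]

  // Listing cost: O(number of applications), no event replay needed.
  def listing: Seq[AppSummary] = summaries.values.toSeq

  // Only the async update thread pays replay costs, and only for logs
  // whose data has actually changed.
  def record(summary: AppSummary): Unit = summaries(summary.appId) = summary
}
```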
The rationale for having a probe interval is not so much the probe cost, but the replay costs: having a 15s probe interval would mean "a user clicking through the UI of a busy app could trigger a reload every 15s". I don't have the stats to decide how good or bad that is, but a longer interval worries me less.
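As a sketch of that rationale (all names hypothetical), the interval effectively rate-limits replays per cached UI:

```scala
// A UI hit can trigger at most one replay per interval: the probe itself is
// just a timestamp check, and only a stale, changed app pays the replay cost.
class ProbeGate(intervalMs: Long) {
  @volatile private var lastProbe = 0L

  def onUiRequest(isUpdated: () => Boolean)(reload: () => Unit): Unit = {
    val now = System.currentTimeMillis()
    if (now - lastProbe >= intervalMs) {
      lastProbe = now           // probe cost: ~0, just a state check
      if (isUpdated()) reload() // replay cost: O(events) in the log
    }
  }
}
```

With a 15s interval that bounds a busy app at four reloads a minute; a longer interval lowers the bound further.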
FWIW, the YARN timeline provider costs are:
* listing cost: less expensive than for the FS history provider, but it does move some of the load into the timeline server (search of database, serialization of result).
* probe cost: ~0 again.
* replay cost: the same replay costs as for the FS provider, but now with JSON serialization and transmission over HTTP to add.
I suspect there you'd want a longer interval for probes, just to keep those replays down.
Again: more data is needed here. I've added the metrics to the cache as a start on that: add metrics publishing to the history server and this code is ready to be hooked up, so as to show the numbers on cache reload operations.
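A sketch of what hooking it up could look like, using the Dropwizard metrics library that Spark's metrics system builds on (metric names here are illustrative, not the ones in this patch):

```scala
import com.codahale.metrics.{MetricRegistry, Timer}

// Publishes how often cache reloads happen and how long they take.
class CacheReloadMetrics(registry: MetricRegistry) {
  private val reloadCount = registry.counter("history.appcache.reload.count")
  private val reloadTimer: Timer = registry.timer("history.appcache.reload.time")

  // Wrap a cache reload so its frequency and duration get recorded.
  def timedReload[T](reload: => T): T = {
    reloadCount.inc()
    val ctx = reloadTimer.time()
    try reload finally ctx.stop()
  }
}
```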