Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/204#discussion_r11459038
--- Diff: docs/monitoring.md ---
@@ -12,17 +12,71 @@ displays useful information about the application. This includes:
* A list of scheduler stages and tasks
* A summary of RDD sizes and memory usage
-* Information about the running executors
* Environmental information.
+* Information about the running executors
You can access this interface by simply opening `http://<driver-node>:4040` in a web browser.
-If multiple SparkContexts are running on the same host, they will bind to succesive ports
+If multiple SparkContexts are running on the same host, they will bind to successive ports
beginning with 4040 (4041, 4042, etc).
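For instance, a quick way to confirm the UI is reachable from the driver machine itself (assuming `curl` is available and this is the first SparkContext on the host, so it took the default port):

    # Fetch the web UI of the first application on this host (default port 4040).
    curl http://localhost:4040
    # A second SparkContext running concurrently on the same host would be on 4041, and so on.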
-Spark's Standalone Mode cluster manager also has its own
-[web UI](spark-standalone.html#monitoring-and-logging).
+Note that this information is only available for the duration of the application by default.
+To view the web UI after the fact, set `spark.eventLog.enabled` to true before starting the
+application. This configures Spark to log Spark events that encode the information displayed
+in the UI to persisted storage.
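As a concrete sketch, event logging could be enabled through `conf/spark-defaults.conf`; the values below are illustrative assumptions (in particular the `spark.eventLog.dir` path, the companion property that selects where the logs are written), not shipped defaults:

    # conf/spark-defaults.conf (illustrative example)
    # Log Spark events so the UI can be reconstructed after the application finishes.
    spark.eventLog.enabled  true
    # Write the event logs to persisted storage; this HDFS path is an assumption.
    spark.eventLog.dir      hdfs://namenode:8020/shared/spark-logs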
-Note that in both of these UIs, the tables are sortable by clicking their headers,
+## Viewing After the Fact
+
+Spark's Standalone Mode cluster manager also has its own
+[web UI](spark-standalone.html#monitoring-and-logging). If an application has logged events over
+the course of its lifetime, then the Standalone master's web UI will automatically re-render the
+application's UI after the application has finished.
+
+If Spark is run on Mesos or YARN, it is still possible to reconstruct the UI of a finished
+application through Spark's history server, provided that the application's event logs exist.
+You can start the history server by executing:
+
+    ./sbin/start-history-server.sh <base-logging-directory>
+
+The base logging directory must be supplied, and should contain sub-directories that each
+represent an application's event logs. This creates a web interface at
+`http://<server-url>:18080` by default, but the port can be changed by supplying an extra
+parameter to the start script. The history server depends on the following variables:
+
+<table class="table">
+ <tr><th style="width:21%">Environment Variable</th><th>Meaning</th></tr>
+ <tr>
+ <td><code>SPARK_DAEMON_MEMORY</code></td>
+ <td>Memory to allocate to the history server (default: 512m).</td>
+ </tr>
+ <tr>
+ <td><code>SPARK_DAEMON_JAVA_OPTS</code></td>
+ <td>JVM options for the history server (default: none).</td>
+ </tr>
+</table>
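Putting these together, a launch could look like the following; the memory value and the log directory are assumptions chosen for illustration:

    # Raise the history server's heap from the 512m default (illustrative value).
    export SPARK_DAEMON_MEMORY=1g
    # Start the server against a base directory whose sub-directories hold
    # per-application event logs (this HDFS path is an assumption).
    ./sbin/start-history-server.sh hdfs://namenode:8020/shared/spark-logs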
+
+Further, the history server can be configured as follows:
+
+<table class="table">
+ <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+ <tr>
+ <td>spark.history.updateInterval</td>
+ <td>10</td>
+ <td>
+ The period at which information displayed by this history server is updated. Each update
--- End diff --
I'd say "The period, in seconds, at which..."