Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5423#discussion_r33496039
  
    --- Diff: docs/monitoring.md ---
    @@ -256,6 +256,157 @@ still required, though there is only one application 
available.  Eg. to see the
     running app, you would go to 
`http://localhost:4040/api/v1/applications/[app-id]/jobs`.  This is to
     keep the paths consistent in both modes.
     
    +## Hadoop YARN Timeline service history provider
    +
    +As well as the Filesystem History Provider, Spark can integrate with the 
Hadoop YARN 
    +"Application Timeline Service". This is a service which runs in a YARN 
cluster, recording
    +application- and YARN- published events to a database, retrieving them on 
request.
    +
    +Spark integrates with the timeline service by
    +1. Publishing events to the timeline service as applications execute.
    +2. Listing application histories published to the timeline service.
    +3. Retrieving the details of specific application histories.
    +
    +### Configuring the Timeline Service
    +
    +For details on configuring and starting the timeline service, consult the 
Hadoop documentation.
    +
    +From the perspective of Spark, the key requirements are
    +1. The YARN timeline service must be running.
    +1. Its URL is known, and configured in the `yarn-site.xml` configuration 
file.
    +1. The user has the Kerberos credentials required to interact with the service.
    +
    +The timeline service URL must be declared in the property 
`yarn.timeline-service.webapp.address`,
    +or, if HTTPS is the protocol, `yarn.timeline-service.webapp.https.address`
    +
    +The choice between HTTP and HTTPS is made on the value of `yarn.http.policy`, which can be one of
    +`http_only` (default), `https_only` or `http_and_https`; HTTP will be used unless the policy
    +is `https_only`.
    +
    +Examples:
    +
    +    <!-- Binding for HTTP endpoint -->
    +    <property>
    +      <name>yarn.timeline-service.webapp.address</name>
    +      <value>atshost.example.org:8188</value>
    +    </property>
    +    
    +    <property>
    +      <name>yarn.timeline-service.enabled</name>
    +      <value>true</value>
    +    </property>
    +
    +
    +The root web page of the timeline service can be verified with a web 
browser,
    +as an easy check that the service is live. For the HTTP 
    +
    +### Saving Application History to the YARN Timeline Service
    +
    +To publish to the YARN Timeline Service, Spark applications executed in a 
YARN cluster
    +must be configured to instantiate the `YarnHistoryService`. This is done
    +by setting the spark configuration property `spark.yarn.services`
    +to `org.apache.spark.deploy.history.yarn.YarnHistoryService`
    +
    +    spark.yarn.services 
org.apache.spark.deploy.history.yarn.YarnHistoryService
    +
    +Notes
    +
    +1. If the class-name is mis-spelled or cannot be instantiated, an error 
message will
    +be logged; the application will still run.
    +2. YARN history publishing can run alongside the filesystem history 
listener; both
    +histories can be viewed by an appropriately configured history service.
    +3. If the timeline service is disabled, that is 
`yarn.timeline-service.enabled` is not 
    +`true`, then the history will not be published: the application will still 
run.
    +4. Similarly, in a cluster where the timeline service is disabled, the 
history server
    +will simply show an empty history, while warning that the history service 
is disabled.
    +5. In a secure cluster, the user must have the Kerberos credentials to 
interact
    +with the timeline server. Being logged in via `kinit` or a keytab should 
suffice.
    +6. If the application is killed it will be listed as incomplete. In an application
    +started with `--master yarn-client` this happens if the client process is stopped
    +with a `kill -9` or by a process failure.
    +Similarly, an application started with `--master yarn-cluster` will remain incomplete
    +if killed without warning, if it fails, or if it is killed via the `yarn kill` command.
    +
    +
    +Specific configuration options:
    +
    +<table class="table">
    +  <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
    +  <tr>
    +    <td><code>spark.hadoop.yarn.timeline.batchSize</code></td>
    +    <td>3</td>
    +    <td>
    +    How many events to batch up before submitting them to the timeline 
service.
    +    This is a performance optimization. 
    +    </td>
    +  </tr>
    +</table>
    +
    +
    +### Viewing Application Histories via the YARN Timeline Service
    +
    +To retrieve and display history information in the YARN Timeline Service, 
the Spark history server must
    +be configured to query the timeline service for the lists of running and 
completed applications.
    +
    +Note that the history server does not actually to be deployed within the 
Hadoop cluster itself —it
    --- End diff --
    
    r/ does not actually need to be deployed/

