Jason Lowe updated YARN-3942:
    Attachment: YARN-3942.001.patch

This provides a timeline store plugin that allows posting of entities via a 
filesystem (e.g., HDFS) and limited serving of data from the filesystem.  The 
end result is a system that operates in a similar manner to the MapReduce job 
history server.  Applications can post data under a filesystem directory that 
is periodically scanned by the timeline store plugin.  Queries that appear to 
be for a specific application ID are served from the filesystem data from that 
app, and that data is kept in a configurable cache to amortize the loading cost 
for future queries to the same data.
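
As a rough illustration of the per-application caching described above, here 
is a minimal Python sketch.  It is not taken from the patch: the class name, 
the <root>/<app_id>/*.jsonl directory layout, the one-JSON-entity-per-line 
format, and the LRU eviction policy are all assumptions made for this example.

```python
import json
from collections import OrderedDict
from pathlib import Path


class AppEntityCache:
    """LRU cache of per-application entity lists loaded from a directory tree.

    Hypothetical layout (an assumption for this sketch, not the patch's format):
    <root>/<app_id>/*.jsonl, with one JSON-encoded entity per line.
    """

    def __init__(self, root, max_apps=2):
        self.root = Path(root)
        self.max_apps = max_apps      # stands in for the configurable cache size
        self._cache = OrderedDict()   # app_id -> list of entity dicts
        self.loads = 0                # number of filesystem loads, for illustration

    def _load(self, app_id):
        """Read every entity file under the application's directory."""
        entities = []
        for path in sorted((self.root / app_id).glob("*.jsonl")):
            with path.open() as f:
                for line in f:
                    if line.strip():
                        entities.append(json.loads(line))
        self.loads += 1
        return entities

    def get(self, app_id):
        """Return the entities for app_id, loading from disk only on a miss."""
        if app_id in self._cache:
            self._cache.move_to_end(app_id)   # mark as most recently used
            return self._cache[app_id]
        entities = self._load(app_id)
        self._cache[app_id] = entities
        if len(self._cache) > self.max_apps:
            self._cache.popitem(last=False)   # evict the least recently used app
        return entities
```

Repeated queries for the same application ID hit the in-memory copy, so the 
cost of reading the files is paid once per application until the entry is 
evicted to make room for other applications.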

This has the advantage of decoupling the timeline server from the 
applications: if the TS is falling behind or completely down, running 
applications are unaffected and the entities they are trying to post are not 
dropped.  It also reduces the burden on the main timeline server database, 
since the majority of the data resides in HDFS rather than in leveldb.

The primary drawback is that the server is unable to answer completely 
arbitrary queries, but it can answer the queries of some primary use-cases we 
care about, like those from the Tez UI.

Posting a prototype patch.  Needs unit tests, but it has undergone some 
end-to-end testing with some Tez jobs that have been updated to emit their ATS 
entities to HDFS.  We are currently in the process of scale-testing the approach.

> Timeline store to read events from HDFS
> ---------------------------------------
>                 Key: YARN-3942
>                 URL: https://issues.apache.org/jira/browse/YARN-3942
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: timelineserver
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-3942.001.patch
> This adds a new timeline store plugin that is intended as a stop-gap measure 
> to mitigate some of the issues we've seen with ATS v1 while waiting for ATS 
> v2.  The intent of this plugin is to provide a workable solution for running 
> the Tez UI against the timeline server on large-scale clusters running many 
> thousands of jobs per day.

This message was sent by Atlassian JIRA