[jira] [Updated] (YARN-3942) Timeline store to read events from HDFS
[ https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-3942:
    Issue Type: Sub-task (was: Improvement)
    Parent: YARN-4233

> Timeline store to read events from HDFS
> ---
>
> Key: YARN-3942
> URL: https://issues.apache.org/jira/browse/YARN-3942
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Attachments: YARN-3942-leveldb.001.patch, YARN-3942-leveldb.002.patch, YARN-3942.001.patch, YARN-3942.002.patch
>
> This adds a new timeline store plugin that is intended as a stop-gap measure
> to mitigate some of the issues we've seen with ATS v1 while waiting for ATS
> v2. The intent of this plugin is to provide a workable solution for running
> the Tez UI against the timeline server on large-scale clusters running many
> thousands of jobs per day.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-3942) Timeline store to read events from HDFS
Jason Lowe updated YARN-3942:
    Attachment: YARN-3942.002.patch

This updates the original patch to address Jonathan's mkdirs comment and adds the unit test framework for the new store. The unit tests currently exercise the store only as a black box. There are no tests yet that explicitly load data into the filesystem and verify that the store can find data that was not posted directly via postEntities.
[jira] [Updated] (YARN-3942) Timeline store to read events from HDFS
Li Lu updated YARN-3942:
    Attachment: YARN-3942-leveldb.002.patch

Thanks [~gss2002]! I modified the patch with your changes.

As a quick update, the current LevelDB cache storage patch addresses the (potential) OOM issue. However, we now inevitably have to load all entities as a whole when refreshing the cache, which introduces longer latency. I discussed this with [~hitesh] and [~xgong], and it seems the best solution to the latency problem is to reduce the granularity of the caches. Instead of loading (and caching) entities for a whole app, we need to load and cache a smaller set of entities, such as those in the same Tez DAG.

Let's focus on the current fix in this JIRA and open a new JIRA for the next step? Comments and suggestions are more than welcome.
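To illustrate the finer cache granularity proposed above, here is a minimal Java sketch. It is not taken from any of the attached patches: the class name GroupedEntityCache, the "appId/groupId" cache key, and the nested in-memory map that stands in for entity files on HDFS are all assumptions made for illustration. The point it shows is that a query for one group (for example, one Tez DAG) loads and caches only that group's entities rather than every entity of the app.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: cache entities per (app, group) instead of per app. */
class GroupedEntityCache {
    // Stand-in for the data on the filesystem: appId -> groupId (e.g. a Tez DAG) -> entities.
    private final Map<String, Map<String, List<String>>> source;
    // Cache keyed by "appId/groupId", so each entry holds a single group's entities.
    private final Map<String, List<String>> cache = new HashMap<>();
    // Counts how many entities have been loaded from the source so far.
    private int entitiesLoaded = 0;

    GroupedEntityCache(Map<String, Map<String, List<String>>> source) {
        this.source = source;
    }

    /** Returns the entities of one group, loading only that group on a cache miss. */
    synchronized List<String> getGroup(String appId, String groupId) {
        String key = appId + "/" + groupId;
        List<String> cached = cache.get(key);
        if (cached == null) {
            Map<String, List<String>> groups =
                source.getOrDefault(appId, Collections.<String, List<String>>emptyMap());
            cached = new ArrayList<>(
                groups.getOrDefault(groupId, Collections.<String>emptyList()));
            entitiesLoaded += cached.size();
            cache.put(key, cached);
        }
        return cached;
    }

    synchronized int entitiesLoaded() {
        return entitiesLoaded;
    }
}
```

With an app that has many DAGs, a query touching a single DAG pays only for that DAG's entities. The trade-off is more cache entries to track and evict.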
[jira] [Updated] (YARN-3942) Timeline store to read events from HDFS
Li Lu updated YARN-3942:
    Attachment: YARN-3942-leveldb.001.patch

Thanks [~jlowe] for working on this! On top of the existing patch I built a new storage that moves the in-memory hash map storage to a LevelDB database. The original in-memory timeline store is not supposed to be used in production environments. The price of the new LevelDB-backed storage is latency: it generally takes more time to fully load the entities into LevelDB. I had an offline discussion with [~xgong], and it seems we need to reduce the granularity of caching to improve latency. We may want to address this problem in a separate JIRA.
[jira] [Updated] (YARN-3942) Timeline store to read events from HDFS
Jason Lowe updated YARN-3942:
    Attachment: YARN-3942.001.patch

This provides a timeline store plugin that allows posting of entities via a filesystem (e.g. HDFS) and limited serving of data from the filesystem. The end result is a system that operates in a similar manner to the MapReduce job history server. Applications can post data under a filesystem directory that is periodically scanned by the timeline store plugin. Queries that appear to be for a specific application ID are served from that app's filesystem data, and that data is kept in a configurable cache to amortize the loading cost for future queries against the same data.

This has the advantage of decoupling the timeline server from the applications: if the TS is falling behind or completely down, it does not affect applications that are currently running, nor do we drop entities that they were trying to post. It also reduces the burden on the main timeline server database, since the majority of the data resides in HDFS rather than in the leveldb database. The primary drawback is that the server is unable to answer completely arbitrary queries, but it can answer the queries of some primary use cases we care about, like those from the Tez UI.

Posting a prototype patch. It needs unit tests, but it has undergone some end-to-end testing with Tez jobs that have been updated to emit their ATS entities to HDFS. We are currently in the process of scale-testing the approach.
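The per-application cache described above can be sketched roughly as follows. This is an illustrative Java sketch only, not code from the patch: the class name AppEntityCache, the least-recently-used eviction policy, the maxApps limit, and the loader function that stands in for reading an application's files from HDFS are all assumptions. It shows how a repeated query for the same application ID can be served without reloading, while the cache stays bounded.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: bounded LRU cache of per-application entity data. */
class AppEntityCache {
    private final int maxApps;
    // Stand-in for reading one application's entity files from the filesystem.
    private final Function<String, List<String>> loader;
    private final LinkedHashMap<String, List<String>> cache;
    // Counts how many times the loader had to be invoked (cache misses).
    private int loads = 0;

    AppEntityCache(int maxApps, Function<String, List<String>> loader) {
        this.maxApps = maxApps;
        this.loader = loader;
        // accessOrder=true keeps the least recently accessed entry first,
        // so removeEldestEntry evicts in LRU order.
        this.cache = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > AppEntityCache.this.maxApps;
            }
        };
    }

    /** Serves an app's entities from the cache, loading them on a miss. */
    synchronized List<String> getEntities(String appId) {
        List<String> entities = cache.get(appId);
        if (entities == null) {
            entities = loader.apply(appId);
            loads++;
            cache.put(appId, entities);
        }
        return entities;
    }

    synchronized int loadCount() {
        return loads;
    }

    synchronized boolean isCached(String appId) {
        return cache.containsKey(appId);
    }
}
```

A caller would construct the cache with a size limit and a loader, for example `new AppEntityCache(2, id -> java.util.Arrays.asList(id + "-entity"))`, and route every query that carries an application ID through `getEntities`.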