[jira] [Updated] (YARN-3942) Timeline store to read events from HDFS

2015-10-07 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3942:

Issue Type: Sub-task  (was: Improvement)
Parent: YARN-4233

> Timeline store to read events from HDFS
> ---
>
> Key: YARN-3942
> URL: https://issues.apache.org/jira/browse/YARN-3942
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-3942-leveldb.001.patch, 
> YARN-3942-leveldb.002.patch, YARN-3942.001.patch, YARN-3942.002.patch
>
>
> This adds a new timeline store plugin that is intended as a stop-gap measure 
> to mitigate some of the issues we've seen with ATS v1 while waiting for ATS 
> v2.  The intent of this plugin is to provide a workable solution for running 
> the Tez UI against the timeline server on large-scale clusters running many 
> thousands of jobs per day.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3942) Timeline store to read events from HDFS

2015-10-01 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-3942:
-
Attachment: YARN-3942.002.patch

Update to the original patch that addresses Jonathan's mkdirs comment and also 
adds the unit test framework for the new store.  The unit tests only exercise 
the store as a black box.  There are no tests yet that explicitly load data 
into the filesystem and verify that the store can find data that was not posted 
directly via postEntities.






[jira] [Updated] (YARN-3942) Timeline store to read events from HDFS

2015-09-25 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3942:

Attachment: YARN-3942-leveldb.002.patch

Thanks [~gss2002]! I modified the patch with your changes. As a quick update, 
the current LevelDB cache storage patch addresses the (potential) OOM issue. 
However, we now have to load all of an app's entities as a whole when 
refreshing the cache, which increases latency. I discussed this with [~hitesh] 
and [~xgong], and it seems the best solution to the latency problem is to cache 
at a finer granularity. Instead of loading (and caching) entities for a whole 
app, we need to load and cache a smaller set of entities, such as those in the 
same Tez DAG. Shall we focus on the current fix in this JIRA and open a new 
JIRA for the next step? Comments and suggestions are more than welcome.
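
To make the granularity trade-off concrete, here is a minimal in-memory sketch, not code from either patch. The class name, the method names, and the idea of keying the cache by a group ID (standing in for a Tez DAG ID) are all assumptions for illustration. It only counts how many entities a cache miss has to load under each scheme.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch comparing app-level and group-level cache loading. */
class CacheGranularitySketch {
    // Stand-in for stored data: appId -> groupId (e.g. a Tez DAG ID) -> entity records.
    private final Map<String, Map<String, List<String>>> stored = new HashMap<>();
    // One cache for both schemes; the key prefix distinguishes them.
    private final Map<String, List<String>> cache = new HashMap<>();
    private int entitiesLoaded = 0;

    void post(String appId, String groupId, String entity) {
        stored.computeIfAbsent(appId, k -> new HashMap<>())
              .computeIfAbsent(groupId, k -> new ArrayList<>())
              .add(entity);
    }

    /** Coarse granularity: a cache miss loads every entity of the application. */
    List<String> getByApp(String appId) {
        return cache.computeIfAbsent("app:" + appId, k -> {
            List<String> all = new ArrayList<>();
            for (List<String> group : stored.getOrDefault(appId, new HashMap<>()).values()) {
                all.addAll(group);
            }
            entitiesLoaded += all.size();
            return all;
        });
    }

    /** Fine granularity: a cache miss loads only the requested group. */
    List<String> getByGroup(String appId, String groupId) {
        return cache.computeIfAbsent("group:" + appId + "/" + groupId, k -> {
            List<String> group = new ArrayList<>(
                stored.getOrDefault(appId, new HashMap<>())
                      .getOrDefault(groupId, new ArrayList<>()));
            entitiesLoaded += group.size();
            return group;
        });
    }

    /** Total number of entities loaded by cache misses so far. */
    int entitiesLoaded() {
        return entitiesLoaded;
    }
}
```

With an app that holds many DAGs, a query for one DAG through getByGroup loads only that DAG's entities, while getByApp pays for the whole app on the first miss.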






[jira] [Updated] (YARN-3942) Timeline store to read events from HDFS

2015-09-21 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3942:

Attachment: YARN-3942-leveldb.001.patch

Thanks [~jlowe] for working on this! On top of the existing patch I built a new 
storage that moves the in-memory hash map storage to a LevelDB database. The 
original in-memory timeline store is not supposed to be used in production 
environments. The price of the new LevelDB-backed storage is latency: it 
generally takes more time to fully load the entities into LevelDB. I had an 
offline discussion with [~xgong], and it seems we need to cache at a finer 
granularity to improve latency. We may want to address this problem in a 
separate JIRA.






[jira] [Updated] (YARN-3942) Timeline store to read events from HDFS

2015-07-20 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-3942:
-
Attachment: YARN-3942.001.patch

This provides a timeline store plugin that allows posting of entities via a 
filesystem (e.g., HDFS) and limited serving of data from the filesystem.  The 
end result is a system that operates in a similar manner to the MapReduce job 
history server.  Applications can post data under a filesystem directory that 
is periodically scanned by the timeline store plugin.  Queries that appear to 
be for a specific application ID are served from that application's filesystem 
data, and that data is kept in a configurable cache to amortize the loading 
cost for future queries against the same data.
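
As a rough illustration of the read path described above, the following sketch serves per-application records from a directory tree and keeps them in a small LRU cache. It is not taken from the patch. It uses the local filesystem via java.nio instead of the Hadoop FileSystem API, and the class name, the one-directory-per-application layout, and the line-per-record file format are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

/** Hypothetical sketch: per-application records read from a directory, with an LRU cache. */
class FsTimelineCacheSketch {
    private final Path root;
    private final Map<String, List<String>> cache;
    private int loads = 0;

    FsTimelineCacheSketch(Path root, final int maxCachedApps) {
        this.root = root;
        // An access-ordered LinkedHashMap evicts the least recently used application.
        this.cache = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > maxCachedApps;
            }
        };
    }

    /** Returns the records found under root/appId, reading the files only on a cache miss. */
    synchronized List<String> getEntities(String appId) {
        List<String> cached = cache.get(appId);
        if (cached != null) {
            return cached;
        }
        List<String> loaded = loadFromFilesystem(appId);
        cache.put(appId, loaded);
        return loaded;
    }

    private List<String> loadFromFilesystem(String appId) {
        loads++;
        List<String> records = new ArrayList<>();
        Path appDir = root.resolve(appId);
        if (!Files.isDirectory(appDir)) {
            return records; // nothing posted for this application
        }
        try (Stream<Path> files = Files.list(appDir)) {
            files.filter(Files::isRegularFile).sorted().forEach(file -> {
                try {
                    records.addAll(Files.readAllLines(file));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    /** Number of times the filesystem was read, i.e. the number of cache misses. */
    synchronized int loadCount() {
        return loads;
    }
}
```

The sketch never refreshes a cached entry, so it would serve stale data for an application that is still writing. A real store would also need the periodic scan mentioned above to pick up new data.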

This has the advantage of decoupling the timeline server from the applications: 
if the TS is falling behind or completely down, it does not affect applications 
that are currently running, nor are the entities they were trying to post 
dropped.  It also reduces the burden on the main timeline server database, 
since the majority of the data resides in HDFS rather than in the leveldb 
database.

The primary drawback is that the server is unable to answer completely 
arbitrary queries, but it can answer the queries of some primary use-cases we 
care about, like those from the Tez UI.

Posting a prototype patch.  It needs unit tests, but it has undergone some 
end-to-end testing with some Tez jobs that have been updated to emit their ATS 
entities to HDFS.  We are currently in the process of scale-testing the approach.




