[jira] [Commented] (YARN-7272) Enable timeline collector fault tolerance

Jason Lowe (JIRA) Mon, 09 Oct 2017 06:30:23 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196949#comment-16196949
 ]


Jason Lowe commented on YARN-7272:
----------------------------------

I'm not proposing we use leveldb to persist the entities long-term, only for 
the window between receipt from the client and the point where the ATSv2 
backend acknowledges receipt.  At that point the entries would be deleted from 
leveldb.  Routine background compaction would keep the database from growing 
to the point where recovery performance becomes a concern.

The NM state store already does this today, deleting container, resource, and 
application entries when we no longer need to recover them.  Is there a 
specific concern about using leveldb to implement the WAL for transient 
persistence?  I just want to make sure we're not going to invent yet another 
WAL solution here as there are many to choose from already.
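
To make the proposed flow concrete, here is a minimal sketch of the transient 
persistence described above: write on receipt, delete on backend 
acknowledgement, replay whatever is left after a restart.  Everything in it is 
an assumption for illustration.  The class and method names are hypothetical 
and are not taken from the NM state store or from any YARN-7272 patch, and a 
sorted in-memory TreeMap stands in for leveldb so the example stays 
self-contained; a real collector would use a durable store.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical sketch of a transient write-ahead log for timeline entities.
 * A TreeMap stands in for leveldb's ordered key space; it is NOT durable.
 */
class TransientEntityLog {
    // Sequence number -> serialized entity, kept in insertion order.
    private final TreeMap<Long, byte[]> store = new TreeMap<>();
    private long nextSeq = 0;

    /** Record an entity as soon as it is received from the client. */
    synchronized long append(String serializedEntity) {
        long seq = nextSeq++;
        store.put(seq, serializedEntity.getBytes(StandardCharsets.UTF_8));
        return seq;
    }

    /** Drop the entry once the backend has acknowledged the write. */
    synchronized void acknowledge(long seq) {
        store.remove(seq);
    }

    /** After a restart, anything still stored was never acknowledged. */
    synchronized List<String> recoverPending() {
        List<String> pending = new ArrayList<>();
        for (byte[] value : store.values()) {
            pending.add(new String(value, StandardCharsets.UTF_8));
        }
        return pending;
    }

    /** Number of entries still awaiting acknowledgement. */
    synchronized int size() {
        return store.size();
    }
}
```

Because entries are removed on acknowledgement, the store only ever holds the 
in-flight backlog, which is what keeps recovery cheap.  In this sketch a 
collector would call append() before forwarding an entity, acknowledge() when 
the backend confirms the write, and recoverPending() once at startup to 
resend anything left over.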

> Enable timeline collector fault tolerance
> -----------------------------------------
>
>                 Key: YARN-7272
>                 URL: https://issues.apache.org/jira/browse/YARN-7272
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineclient, timelinereader, timelineserver
>            Reporter: Vrushali C
>            Assignee: Rohith Sharma K S
>
> If an NM goes down and takes with it the timeline collector aux service for 
> a running YARN app, we would like that app to re-establish a connection with 
> a new timeline collector. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

