[ https://issues.apache.org/jira/browse/YARN-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196949#comment-16196949 ]
Jason Lowe commented on YARN-7272:
----------------------------------
I'm not proposing we use leveldb to persist the entities long-term, only for
the interval between receipt from the client and the point where the ATSv2
backend acknowledges receipt. At that point the entries would be deleted from
leveldb. Routine background compaction would keep the database from growing
to a size where recovery performance would be a concern.

The NM state store already does this today, deleting container, resource, and
application entries when we no longer need to recover them. Is there a
specific concern about using leveldb to implement the WAL for transient
persistence? I just want to make sure we're not going to invent yet another
WAL solution here, as there are many to choose from already.
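
To make the lifecycle above concrete, here is a minimal sketch in Java. It is not code from YARN or ATSv2: the class and method names are hypothetical, and an in-memory TreeMap stands in for the leveldb handle so the example stays self-contained. It only illustrates the idea of writing on receipt, deleting on acknowledgement, and replaying whatever is left after a restart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch of a transient write-ahead log keyed by sequence number. */
public class TransientWal {
    // Assumption: a sorted in-memory map stands in for the leveldb store.
    private final TreeMap<Long, String> store = new TreeMap<>();
    private long nextSeq = 0;

    /** Record an entity when it is received from the client; returns its sequence number. */
    public synchronized long append(String entity) {
        long seq = nextSeq++;
        store.put(seq, entity);
        return seq;
    }

    /** Delete the entry once the backend has acknowledged receipt. */
    public synchronized void acknowledge(long seq) {
        store.remove(seq);
    }

    /** After a restart, anything still stored was never acknowledged and must be resent. */
    public synchronized List<String> pendingForRecovery() {
        return new ArrayList<>(store.values());
    }

    /** Number of entries still awaiting acknowledgement. */
    public synchronized int size() {
        return store.size();
    }

    public static void main(String[] args) {
        TransientWal wal = new TransientWal();
        long a = wal.append("entity-a");
        wal.append("entity-b");
        wal.acknowledge(a);
        // Only the unacknowledged entry remains to be replayed.
        System.out.println(wal.pendingForRecovery()); // prints [entity-b]
    }
}
```

Because entries are removed as soon as they are acknowledged, the store holds only in-flight entities, which is what keeps recovery time bounded in this sketch.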
> Enable timeline collector fault tolerance
> -----------------------------------------
>
> Key: YARN-7272
> URL: https://issues.apache.org/jira/browse/YARN-7272
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineclient, timelinereader, timelineserver
> Reporter: Vrushali C
> Assignee: Rohith Sharma K S
>
> If an NM goes down, taking with it the timeline collector aux service for a
> running YARN app, we would like that app to re-establish its connection with
> a new timeline collector.