[ 
https://issues.apache.org/jira/browse/YARN-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16236796#comment-16236796
 ] 

Vrushali C commented on YARN-7272:
----------------------------------

Sharing some thoughts:
Collector fault tolerance helps deal with two things:
- the collector itself going down
- loss of data that is in the memory of the buffered mutator and has NOT yet 
been flushed to HBase.
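
To illustrate the second case, here is a minimal Python sketch. It is not the 
actual collector code: SpoolingBuffer, the spool file and the list used as the 
backend are all hypothetical stand-ins (the real writer uses an HBase buffered 
mutator). The sketch assumes each entity is appended to a local file before it 
is buffered in memory, so anything not yet flushed can be reloaded after a 
crash.

```python
import json
import os


class SpoolingBuffer:
    """Hypothetical stand-in for a buffered writer backed by a spool file."""

    def __init__(self, spool_path, store):
        self.spool_path = spool_path
        self.store = store   # stand-in for the backend (e.g. HBase)
        self.pending = []    # in-memory buffer; this is what a crash loses

    def write(self, entity):
        # Persist to the spool first, then buffer in memory.
        with open(self.spool_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entity) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.pending.append(entity)

    def flush(self):
        # Push buffered entities to the backend, then empty the spool.
        self.store.extend(self.pending)
        self.pending.clear()
        open(self.spool_path, "w", encoding="utf-8").close()

    def recover(self):
        # After a restart, reload whatever was spooled but never flushed.
        if not os.path.exists(self.spool_path):
            return
        with open(self.spool_path, encoding="utf-8") as f:
            self.pending = [json.loads(line) for line in f if line.strip()]
```

A crash between the backend write and the spool truncation in flush() would 
replay entities that were already stored, so a scheme like this gives 
at-least-once rather than exactly-once delivery.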

It should be possible to turn the fault tolerance solution on or off, and it 
should be off by default.

It should have a cluster-wide default and also be configurable as a 
client-specific setting. For example, a super-critical application might 
require zero tolerance for timeline data loss, in which case it can be 
turned on for that specific app. For some other app, slightly different tuning 
may be preferable. And for all other apps, it should be possible to turn off 
writing to offline storage.
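
One way to read the on/off behaviour described above is sketched below in 
Python. The property name and the AppSettings type are made up for 
illustration; no such YARN configuration key is confirmed here. The lookup 
assumes a per-app value, when set, wins over the cluster-wide default, and 
that the cluster-wide default is off when nothing is configured.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical property name, used only for this sketch.
CLUSTER_KEY = "timeline.collector.fault-tolerance.enabled"


@dataclass
class AppSettings:
    """Per-application override; None means 'use the cluster default'."""
    fault_tolerance_enabled: Optional[bool] = None


def is_fault_tolerance_enabled(cluster_conf, app=None):
    # Off unless the cluster configuration explicitly turns it on.
    cluster_default = cluster_conf.get(CLUSTER_KEY, False)
    # An explicit per-app setting takes precedence in either direction.
    if app is not None and app.fault_tolerance_enabled is not None:
        return app.fault_tolerance_enabled
    return cluster_default
```

With this shape, a critical app can opt in on a cluster where the feature is 
off, and an ordinary app can opt out on a cluster where it is on.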



> Enable timeline collector fault tolerance
> -----------------------------------------
>
>                 Key: YARN-7272
>                 URL: https://issues.apache.org/jira/browse/YARN-7272
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineclient, timelinereader, timelineserver
>            Reporter: Vrushali C
>            Assignee: Rohith Sharma K S
>            Priority: Major
>         Attachments: YARN-7272-wip.patch
>
>
> If an NM goes down, and along with it the timeline collector aux service for 
> a running YARN app, we would like that YARN app to re-establish its 
> connection with a new timeline collector. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
