[
https://issues.apache.org/jira/browse/YARN-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406938#comment-15406938
]
Li Lu commented on YARN-4061:
-----------------------------
Let me revive this thread after the branch merge. From some offline discussion,
I think our plan is to implement a specialized BufferedMutator that can retry
when the cluster is down. The benefit of this approach is that we do not need
to repost the data to the buffered mutator, which saves a lot of memory
operations while the cluster is down. We can largely reuse the retry logic
already in our codebase today.
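As a minimal, untested sketch of that idea (using only the stock HBase client
API, and a placeholder table name rather than the real ATSv2 schema), a
BufferedMutator.ExceptionListener can put mutations back into the buffer when a
background flush exhausts the client's own retries, so the data gets retried on
a later flush instead of being dropped:
{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException;

public class RetryingMutatorSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf)) {
      // Fires when a background flush has exhausted the HBase client's own
      // retries; instead of dropping the failed mutations we put them back
      // into the mutator's buffer so the next flush attempts them again.
      BufferedMutator.ExceptionListener listener =
          (RetriesExhaustedWithDetailsException e, BufferedMutator m) -> {
            List<Mutation> failed = new ArrayList<>();
            for (int i = 0; i < e.getNumExceptions(); i++) {
              failed.add((Mutation) e.getRow(i)); // rows buffered here are Mutations
            }
            try {
              m.mutate(failed);                   // re-buffer; still memory-only
            } catch (IOException ioe) {
              throw e;                            // give up, surface the original error
            }
          };

      // "timeline_entity" is a placeholder table name for illustration only.
      BufferedMutatorParams params =
          new BufferedMutatorParams(TableName.valueOf("timeline_entity"))
              .listener(listener);
      try (BufferedMutator mutator = conn.getBufferedMutator(params)) {
        // writers call mutator.mutate(put) as usual; failures flow to the listener
      }
    }
  }
}
{code}
The downside, as noted below, is that the re-buffered data still lives only in
memory.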
The challenge with this design is that nothing is persisted until the data
reaches the HBase cluster. That is to say, with this change we can handle the
case where the HBase cluster is down, but not the case where the collectors
themselves go down. If a collector fails while it is retrying, we lose the
data. To address this problem, we may use a local journal file to persist the
state held in the buffered mutator (see the sketch below).
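Here is a rough sketch of the local journal idea, independent of the HBase
APIs: records are appended and flushed to a local file before they are handed
to the buffered mutator, replayed when the collector restarts, and truncated
after a successful flush. How the timeline writes are serialized is left
abstract, and a real implementation would need to handle torn records and file
rotation; this is only to make the proposal concrete.
{code:java}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.function.Consumer;

/** Append-only journal: records are persisted locally before they are handed
 *  to the buffered mutator, and the file is truncated once a flush succeeds. */
public class LocalWriteJournal implements AutoCloseable {
  private final File file;
  private DataOutputStream out;

  public LocalWriteJournal(File file) throws IOException {
    this.file = file;
    this.out = new DataOutputStream(new FileOutputStream(file, true)); // append mode
  }

  /** Persist one serialized record (e.g. one serialized timeline write). */
  public synchronized void append(byte[] record) throws IOException {
    out.writeInt(record.length);
    out.write(record);
    out.flush(); // flushes to the OS; fsync on the underlying FD would be stronger
  }

  /** Replay surviving records after a collector restart. */
  public synchronized void replay(Consumer<byte[]> handler) throws IOException {
    try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
      while (true) {
        int len;
        try {
          len = in.readInt();
        } catch (EOFException eof) {
          break; // clean end of journal; a torn tail record is not handled here
        }
        byte[] record = new byte[len];
        in.readFully(record);
        handler.accept(record);
      }
    }
  }

  /** Called after the buffered mutator reports a successful flush. */
  public synchronized void truncate() throws IOException {
    out.close();
    new FileOutputStream(file).close(); // reopening without append truncates the file
    out = new DataOutputStream(new FileOutputStream(file, true));
  }

  @Override
  public synchronized void close() throws IOException {
    out.close();
  }
}
{code}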
Aggregation status is something we need to recover if collectors fail. However,
in the very first phase, maybe we can simply restart everything in the
aggregation table on restarts?
I know this is an old thread, but please feel free to chime in, since we're
targeting this feature for the Alpha 2 phase of timeline v2.
> [Fault tolerance] Fault tolerant writer for timeline v2
> -------------------------------------------------------
>
> Key: YARN-4061
> URL: https://issues.apache.org/jira/browse/YARN-4061
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Li Lu
> Assignee: Li Lu
> Labels: YARN-5355
> Attachments: FaulttolerantwriterforTimelinev2.pdf
>
>
> We need to build a timeline writer that can be resistant to backend storage
> downtime and timeline collector failures.