[ 
https://issues.apache.org/jira/browse/YARN-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710384#comment-14710384
 ] 

Li Lu commented on YARN-4061:
-----------------------------

I just realized that if we implement our logger on HDFS, we need a mechanism
to identify each fault-tolerant writer, so that the storage writer can find the
correct redo log on the next start. Currently, we organize writers within
collector managers, and each node has one collector manager. Therefore, we may
need to identify the node in the writer. If we later put collectors into
special containers, those collectors will need a similar mechanism. This
problem does not exist in a single-server model (like ATS v1), since it has
only one writer.

For now, while building this FT writer, I propose to use the local file
system, since it trivially separates the writers under our
one-node-one-writer model. We can add HDFS support in the future, especially
when we put our timeline writers into containers (by then we will definitely
need an identification mechanism for the writers).
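
To make the identification problem concrete, here is a minimal sketch of how a
redo-log location could be derived from a writer identifier on shared storage.
Everything in it is an assumption for illustration only: the class name
RedoLogPaths, the <root>/<writer id>/redo.log layout, and the use of a
host:port string as the writer id are hypothetical and are not part of the
attached design or of any existing YARN API.

```java
// Hypothetical helper: maps a writer identifier to its own redo-log location
// so that a restarted writer can find the log it wrote before the failure.
final class RedoLogPaths {
    private RedoLogPaths() {}

    // Replace characters that are awkward in path components (e.g. ':').
    // Note: this mapping is lossy, so two distinct ids could collide;
    // a real implementation would need an injective encoding.
    static String sanitize(String writerId) {
        return writerId.replaceAll("[^A-Za-z0-9._-]", "_");
    }

    // Assumed layout: <root>/<sanitized writer id>/redo.log
    static String redoLogPath(String root, String writerId) {
        if (writerId == null || writerId.isEmpty()) {
            throw new IllegalArgumentException("writer id must be non-empty");
        }
        String base = root.endsWith("/")
            ? root.substring(0, root.length() - 1)
            : root;
        return base + "/" + sanitize(writerId) + "/redo.log";
    }

    public static void main(String[] args) {
        // Two writers on different nodes resolve to different redo logs.
        System.out.println(redoLogPath("/timeline/redo", "node1.example.com:8041"));
        System.out.println(redoLogPath("/timeline/redo", "node2.example.com:8041"));
    }
}
```

On a local file system under the one-node-one-writer model, the writer id is
implicit in the node itself, which is why the separation comes for free there.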

> [Fault tolerance] Fault tolerant writer for timeline v2
> -------------------------------------------------------
>
>                 Key: YARN-4061
>                 URL: https://issues.apache.org/jira/browse/YARN-4061
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>            Assignee: Li Lu
>         Attachments: FaulttolerantwriterforTimelinev2.pdf
>
>
> We need to build a timeline writer that is resilient to backend storage
> downtime and timeline collector failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
