[
https://issues.apache.org/jira/browse/YARN-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710384#comment-14710384
]
Li Lu commented on YARN-4061:
-----------------------------
I just realized that if we implement our logger on HDFS, we need some
mechanism to identify each fault-tolerant writer so that the storage writer
can find the correct redo-log on its next start. Currently, we organize
writers within collector managers, and each node has one collector manager.
Therefore, we may need to identify the node in the writer. If in the future
we plan to put collectors into special containers, those collectors will also
need a similar mechanism. This problem does not exist in a single-server model
(like ATS v1), since it has only one writer.
For now, while building this FT writer, I propose to use the local file
system, since it trivially separates the writers under our
one-node-one-writer model. We can add HDFS support in the future, especially
when we put our timeline writers into containers (by then we will definitely
need some identification mechanism for the writers).
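
To make the identification issue concrete, here is a rough sketch of how a
writer might locate its redo-log in the two cases. It is only an illustration
and not taken from any patch: the class name, method names, the "redo.log"
file name and the directory layout are all hypothetical, and it uses
java.nio.file.Path as a stand-in rather than the Hadoop FileSystem API.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative only: none of these names come from the YARN code base. */
public class RedoLogLocator {

  /** Hypothetical file name for a writer's redo-log. */
  static final String REDO_LOG_NAME = "redo.log";

  /**
   * Local file system case: each node has exactly one collector manager,
   * so a fixed directory on that node already identifies the writer.
   */
  static Path localRedoLog(Path localRoot) {
    return localRoot.resolve(REDO_LOG_NAME);
  }

  /**
   * Shared storage case (e.g. HDFS): all writers see the same namespace,
   * so the path has to carry a stable writer identity. Here that identity
   * is assumed to be a node ID string.
   */
  static Path sharedRedoLog(Path sharedRoot, String nodeId) {
    if (nodeId == null || nodeId.isEmpty()) {
      throw new IllegalArgumentException("nodeId must not be empty");
    }
    // Replace characters that are awkward in a path component, such as ':'.
    // A real scheme would also have to rule out collisions between IDs.
    String safeId = nodeId.replaceAll("[^A-Za-z0-9._-]", "_");
    return sharedRoot.resolve(safeId).resolve(REDO_LOG_NAME);
  }

  public static void main(String[] args) {
    System.out.println(localRedoLog(Paths.get("/var/timeline")));
    System.out.println(
        sharedRedoLog(Paths.get("/timeline/redo"), "host1.example.com:45454"));
  }
}
```

With the local variant a restarted writer needs no extra information to find
its log, whereas the shared variant only works if the same node ID is
available again after the restart. Containerized collectors would need some
other stable ID in place of the node ID.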
> [Fault tolerance] Fault tolerant writer for timeline v2
> -------------------------------------------------------
>
> Key: YARN-4061
> URL: https://issues.apache.org/jira/browse/YARN-4061
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Li Lu
> Assignee: Li Lu
> Attachments: FaulttolerantwriterforTimelinev2.pdf
>
>
> We need to build a timeline writer that is resilient to backend storage
> downtime and timeline collector failures.