[
https://issues.apache.org/jira/browse/YARN-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944067#comment-14944067
]
Li Lu commented on YARN-4061:
-----------------------------
Thanks for the review [~sjlee0]!
bq. Since the actual storage writer (HBase) always acts on this queue
asynchronously, it seems that the client cannot have a synchronous write
semantics. Is that a correct reading? If so, how would we implement such a
synchronous write?
This is definitely a valid concern. Yes, purely synchronous semantics are hard
to achieve with this design. To support synchronous semantics, we generally
have two options:
- We not only need to enforce a flush, but on synchronous calls we also need to
block until the data is actually persisted to HBase. The advantage of this
design is simplicity, but if the HBase storage is unavailable we cannot
perform any synchronous calls. This makes the "fault tolerant" feature less
appealing.
- Since we know (and trust) that data on HDFS will eventually be available in
HBase, maybe we can have a fault-tolerant (FT) reader that checks HDFS before,
or at the same time as, it checks HBase? This way we can always pick the most
up-to-date data, whether it is in HDFS or in HBase. The shortcoming of this
approach is that local file storage will not work here, because locally
buffered data is generally not available to other nodes (and I suspect this
strong consistency model may be too ambitious given the amount of data).
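
To make the two options concrete, here is a rough sketch. Everything in it is
hypothetical: the class and method names are made up for illustration, and two
in-memory maps stand in for the HDFS buffer and the HBase table. It is not code
from the patch, just one way the semantics could look.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch only; none of these types exist in YARN. */
class FaultTolerantTimelineSketch {
  // Assumption: stand-in for the durable buffer (HDFS in the proposal).
  final Map<String, String> bufferStore = new ConcurrentHashMap<>();
  // Assumption: stand-in for the final storage (HBase in the proposal).
  final Map<String, String> backendStore = new ConcurrentHashMap<>();
  // Test hook that simulates backend storage down time.
  volatile boolean backendAvailable = true;

  /** Asynchronous path: buffer first, then persist in the background. */
  CompletableFuture<Void> writeAsync(String id, String entity) {
    bufferStore.put(id, entity);
    return CompletableFuture.runAsync(() -> {
      if (!backendAvailable) {
        // The entity stays in the buffer for a later retry.
        throw new IllegalStateException("backend storage is down");
      }
      // Publish to the backend before dropping the buffered copy, so a
      // reader that misses the buffer can still find the entity.
      backendStore.put(id, entity);
      bufferStore.remove(id);
    });
  }

  /** Option 1: block until the entity is actually persisted to the backend. */
  void writeSync(String id, String entity, long timeoutMs) {
    try {
      writeAsync(id, entity).get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted before persisted", e);
    } catch (ExecutionException | TimeoutException e) {
      // With the backend down, no synchronous call can succeed.
      throw new IllegalStateException("not persisted to backend", e);
    }
  }

  /** Option 2: FT reader that checks the buffer before the backend. */
  Optional<String> read(String id) {
    String buffered = bufferStore.get(id);
    if (buffered != null) {
      return Optional.of(buffered); // newest copy, not yet in the backend
    }
    return Optional.ofNullable(backendStore.get(id));
  }
}
```

With the backend marked as down, writeSync() fails while read() still returns
the buffered entity, which is the trade-off between the two options. The
buffer-first read order only helps if the buffer is visible to the reader,
hence the caveat about local file storage.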
About throughput, I agree we need to be careful here. We may see traffic with
a scale and flow similar to that of the MapReduce JobHistory Server (JHS). If
that is the case, I think we can start with some ideas from the JHS.
> [Fault tolerance] Fault tolerant writer for timeline v2
> -------------------------------------------------------
>
> Key: YARN-4061
> URL: https://issues.apache.org/jira/browse/YARN-4061
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Li Lu
> Assignee: Li Lu
> Attachments: FaulttolerantwriterforTimelinev2.pdf
>
>
> We need to build a timeline writer that can be resistant to backend storage
> down time and timeline collector failures.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)