Li Lu commented on YARN-4061:

Thanks for the review [~sjlee0]! 

bq. Since the actual storage writer (HBase) always acts on this queue 
asynchronously, it seems that the client cannot have a synchronous write 
semantics. Is that a correct reading? If so, how would we implement such a 
synchronous write?

This is definitely a valid concern. Yes, purely synchronous semantics are hard 
to provide with this design. To support them we generally have two options:
- On synchronous calls we not only enforce a flush, but also block until the 
data is actually persisted in HBase. The advantage of this design is 
simplicity, but if the HBase storage is not available we cannot 
perform any synchronous calls. This makes the "fault tolerant" feature less 
- Since we know (and trust) that data on HDFS will eventually be available in 
HBase, maybe we can have an FT reader check HDFS when, or before, it checks 
HBase? That way we can always select the most up-to-date data, whether it is 
in HDFS or in HBase. The shortcoming of this approach is that local file 
storage will not work here, because the buffered data is generally not 
available to other nodes (and I wonder whether this strong consistency model 
is too ambitious given the amount of data). 
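
To make the two options concrete, here is a rough sketch. Everything in it is 
hypothetical: FaultTolerantSketch, Entity, Store, writeSync and readLatest are 
illustrative names, not existing YARN or HBase APIs, and the in-memory Store 
only stands in for the HDFS buffer and the HBase backend. It also assumes each 
record carries a modification timestamp that can be compared across stores.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch only; none of these names exist in YARN or HBase. */
public class FaultTolerantSketch {

    /** Illustrative timeline record with a modification timestamp. */
    public static final class Entity {
        public final String id;
        public final long modifiedTime;
        public final String payload;

        public Entity(String id, long modifiedTime, String payload) {
            this.id = id;
            this.modifiedTime = modifiedTime;
            this.payload = payload;
        }
    }

    /** In-memory stand-in for either the HDFS buffer or the HBase backend. */
    public static final class Store {
        private final Map<String, Entity> data = new HashMap<>();

        public void put(Entity e) {
            data.put(e.id, e);
        }

        public Optional<Entity> get(String id) {
            return Optional.ofNullable(data.get(id));
        }
    }

    /**
     * Option 1: a synchronous call blocks until the asynchronous writer
     * reports that the data reached the backend. Returns false if the
     * backend does not confirm within the timeout or the write failed.
     */
    public static boolean writeSync(CompletableFuture<Void> persisted, long timeoutMs) {
        try {
            persisted.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return false; // backend unavailable or write failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }

    /**
     * Option 2: the reader consults both the buffer and the backend and
     * returns whichever copy is newer. On a tie the buffer copy wins.
     */
    public static Optional<Entity> readLatest(Store buffer, Store backend, String id) {
        Optional<Entity> buffered = buffer.get(id);
        Optional<Entity> stored = backend.get(id);
        if (!buffered.isPresent()) {
            return stored;
        }
        if (!stored.isPresent()) {
            return buffered;
        }
        return buffered.get().modifiedTime >= stored.get().modifiedTime ? buffered : stored;
    }
}
```

With option 1, writeSync returns false whenever the backend cannot confirm in 
time, which is exactly the availability problem described above. With option 
2, readLatest only helps if the buffer is readable from the node serving the 
read, which is why a local file buffer would not work.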

On throughput, I agree we need to be careful here. Our traffic may be similar 
in scale and pattern to that of the MapReduce JobHistory server (JHS). If that 
is the case, I think we can start with some ideas from the JHS. 

> [Fault tolerance] Fault tolerant writer for timeline v2
> -------------------------------------------------------
>                 Key: YARN-4061
>                 URL: https://issues.apache.org/jira/browse/YARN-4061
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>            Assignee: Li Lu
>         Attachments: FaulttolerantwriterforTimelinev2.pdf
> We need to build a timeline writer that can be resistant to backend storage 
> down time and timeline collector failures. 

This message was sent by Atlassian JIRA
