[ 
https://issues.apache.org/jira/browse/YARN-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795350#comment-13795350
 ] 

Zhijie Shen commented on YARN-975:
----------------------------------

Having thought more about the implementation details:

1. A cache seems to be required immediately. Users will commonly view an 
application, then its attempts, then its containers in succession by clicking 
the links on the web page. Without a cache, every single piece of information 
requires reading the TFile from HDFS again, which results in poor performance.

To cache the complete history data of an application, we have two choices: 
cache the raw TFile, or cache all the protobuf objects recovered from the 
TFile. I lean toward the latter, because we can organize the objects in a 
better data structure for quick access.
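
As a rough sketch of the second choice, assuming an LRU policy bounded by the 
number of applications: AppHistoryCache and its loader callback are 
hypothetical names, not existing YARN classes, and the value type V stands in 
for whatever structure holds the parsed protobuf objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical LRU cache of parsed per-application history (not a YARN class). */
class AppHistoryCache<V> {
  private final Map<String, V> entries;

  AppHistoryCache(final int maxApps) {
    // accessOrder=true keeps the least-recently-used application first,
    // so removeEldestEntry evicts it once the bound is exceeded.
    this.entries = new LinkedHashMap<String, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        return size() > maxApps;
      }
    };
  }

  /** Returns the cached history; the loader (one TFile read) runs only on a miss. */
  synchronized V get(String appId, Function<String, V> loader) {
    V history = entries.get(appId);
    if (history == null) {
      history = loader.apply(appId);
      entries.put(appId, history);
    }
    return history;
  }

  /** Checks presence without changing the LRU order. */
  synchronized boolean contains(String appId) {
    return entries.containsKey(appId);
  }
}
```

With something like this, clicking from an application to its attempts and 
containers would hit HDFS once, on the first get() for that application.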

2. The current APIs allow users to write each piece of information within the 
scope of one application individually. Given this API design, we need to open 
a TFile on the first write operation for a given application, and keep it 
open until the last write operation has finished.

Then the problem is how to determine that all the information for one 
application has been written. One method is to tell the history storage how 
many attempts and containers the application has. Another is to let the 
caller explicitly request closing the TFile. However, both methods involve an 
interface change that exposes more methods.
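
To make the second method concrete, here is a minimal sketch of what the 
extra method could look like. ExplicitCloseHistoryStore and 
applicationFinished() are hypothetical names, and an in-memory list stands in 
for the open TFile; this is not the interface in the attached patches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical store with an explicit per-application close. */
class ExplicitCloseHistoryStore {
  // A List<String> stands in for each application's open TFile.
  private final Map<String, List<String>> openFiles = new HashMap<>();
  private final Map<String, List<String>> closedFiles = new HashMap<>();

  /** Appends one record, opening the application's file on the first write. */
  synchronized void write(String appId, String record) {
    if (closedFiles.containsKey(appId)) {
      throw new IllegalStateException("History already closed for " + appId);
    }
    openFiles.computeIfAbsent(appId, id -> new ArrayList<>()).add(record);
  }

  /** The added interface method: the caller declares no more records follow. */
  synchronized void applicationFinished(String appId) {
    List<String> records = openFiles.remove(appId);
    if (records != null) {
      closedFiles.put(appId, records);
    }
  }

  synchronized boolean isOpen(String appId) {
    return openFiles.containsKey(appId);
  }

  /** Number of records in the closed file, or -1 if it is not closed. */
  synchronized int closedRecordCount(String appId) {
    List<String> records = closedFiles.get(appId);
    return records == null ? -1 : records.size();
  }
}
```

The first method would instead pass the expected attempt and container counts 
up front and close the file once that many records have arrived; either way 
the writer interface grows.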

3. This further raises the question of the integrity of the history data. In 
the normal case, we expect the application, its attempts, and its containers 
all to be written into the TFile. However, if for some reason one piece of 
information is missing and its write operation never happens, the TFile will 
stay open forever, waiting for the missing piece.

We probably need a timeout trigger that closes the TFile whether or not all 
the data has arrived. But then, should we persist the TFile into HDFS? The 
history data for this application is incomplete.
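
A timeout trigger could be sketched as below, assuming the timeout is 
measured from the last write to each application's file. IdleTimeoutTracker 
is a hypothetical name, and the clock is passed in as a parameter so the 
behavior is easy to reason about. The sketch only reports which files have 
expired; whether to persist or discard them is left to the caller, since that 
question is still open.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical tracker that reports open files idle for too long. */
class IdleTimeoutTracker {
  private final long timeoutMillis;
  // Time of the most recent write for each application with an open file.
  private final Map<String, Long> lastWriteMillis = new HashMap<>();

  IdleTimeoutTracker(long timeoutMillis) {
    this.timeoutMillis = timeoutMillis;
  }

  /** Called on every write; restarts the idle timer for the application. */
  synchronized void recordWrite(String appId, long nowMillis) {
    lastWriteMillis.put(appId, nowMillis);
  }

  /** Called when a file is closed normally, so it is never reported. */
  synchronized void recordClose(String appId) {
    lastWriteMillis.remove(appId);
  }

  /** Returns and forgets every application idle for at least the timeout. */
  synchronized List<String> expire(long nowMillis) {
    List<String> expired = new ArrayList<>();
    Iterator<Map.Entry<String, Long>> it = lastWriteMillis.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> entry = it.next();
      if (nowMillis - entry.getValue() >= timeoutMillis) {
        expired.add(entry.getKey());
        it.remove();
      }
    }
    return expired;
  }
}
```

A periodic task would call expire() and force-close the returned 
applications' files, possibly flagging them as incomplete.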

4. However, if we have a timeout trigger for a TFile, the RM cannot write each 
piece of history information at the end of each object's life cycle without 
coordination. We will then want the write operations for all the pieces to be 
scheduled together, so the RM side needs more work to coordinate the write 
operations (YARN-953).

[~vinodkv], any suggestions? 

> Add a file-system implementation for history-storage
> ----------------------------------------------------
>
>                 Key: YARN-975
>                 URL: https://issues.apache.org/jira/browse/YARN-975
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>         Attachments: YARN-975.1.patch, YARN-975.2.patch, YARN-975.3.patch, 
> YARN-975.4.patch, YARN-975.5.patch
>
>
> HDFS implementation should be a standard persistence strategy of history 
> storage



--
This message was sent by Atlassian JIRA
(v6.1#6144)
