[
https://issues.apache.org/jira/browse/YARN-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795350#comment-13795350
]
Zhijie Shen commented on YARN-975:
----------------------------------
Having thought more about the implementation details:
1. It seems that a cache mechanism is required immediately. Commonly, users
will access the information of an application, then its attempts and
containers, in sequence by clicking the links on the web page. Without a
cache, we need to read the TFile again from HDFS for every single piece of
information, which results in poor performance.
To cache the complete history data of an application, we have two choices: one
is to cache the raw TFile, and the other is to cache all the protobuf objects
recovered from the TFile. I lean toward the latter, because we can organize
the objects in a better data structure for quick access.
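
To make the second choice concrete, here is a minimal sketch of a bounded
cache keyed by application ID. All class and field names are hypothetical, and
plain strings stand in for the protobuf objects recovered from the TFile; it
only illustrates the data-structure idea, not the actual patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical holder for the parsed history records of one application. */
class CachedApplicationHistory {
    final String applicationId;
    // attempt ID -> container IDs; strings stand in for the protobuf objects
    final Map<String, List<String>> containersByAttempt = new LinkedHashMap<>();

    CachedApplicationHistory(String applicationId) {
        this.applicationId = applicationId;
    }

    void addContainer(String attemptId, String containerId) {
        containersByAttempt
            .computeIfAbsent(attemptId, k -> new ArrayList<>())
            .add(containerId);
    }
}

/** Bounded LRU cache: the least recently accessed application is evicted. */
class ApplicationHistoryCache
        extends LinkedHashMap<String, CachedApplicationHistory> {
    private final int maxEntries;

    ApplicationHistoryCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true keeps entries in LRU order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(
            Map.Entry<String, CachedApplicationHistory> eldest) {
        return size() > maxEntries;
    }
}
```

With such a structure, following the links from an application to its attempts
and containers becomes a map lookup; the TFile is read only on a cache miss.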
2. The current APIs allow users to write each piece of information within the
scope of one application individually. Given this API design, we need to open
a TFile on the first write operation for a given application, and keep it open
until the last write operation has finished. The problem is then how to
determine that all the information for one application has been written. One
method is to tell the history storage how many attempts and containers the
application has. Another is to let the caller explicitly request closing the
TFile. However, both methods involve an interface change that adds more
methods.
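
As a rough sketch of the second method (the caller explicitly closes the
file), assuming hypothetical method names write() and finish() and an
in-memory list standing in for the TFile:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical writer: one open "file" per application until finish(). */
class PerApplicationWriter {
    // In-memory stand-ins for open and closed TFiles, keyed by application ID.
    private final Map<String, List<String>> openFiles = new HashMap<>();
    private final Map<String, List<String>> closedFiles = new HashMap<>();

    /** Opens the file on the first write for an application, then appends. */
    void write(String appId, String record) {
        if (closedFiles.containsKey(appId)) {
            throw new IllegalStateException("File already closed for " + appId);
        }
        openFiles.computeIfAbsent(appId, k -> new ArrayList<>()).add(record);
    }

    /** The extra interface method: the caller declares that writing is done. */
    void finish(String appId) {
        List<String> records = openFiles.remove(appId);
        if (records != null) {
            closedFiles.put(appId, records);
        }
    }

    boolean isOpen(String appId) {
        return openFiles.containsKey(appId);
    }

    List<String> closedRecords(String appId) {
        return closedFiles.get(appId);
    }
}
```

The cost is visible in the sketch: finish() is a new method that every caller
must remember to invoke, which is exactly the interface change in question.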
3. This further raises a question about the integrity of the history data. In
the normal case, we expect the application, all its attempts and all its
containers to be written into a TFile. However, if for some reason one piece of
information is missing and its write operation never happens, the TFile will
stay open forever, waiting for the missing piece.
We probably need a timeout trigger that closes the TFile whether or not all the
data has arrived. But then, should we persist the TFile to HDFS? The history
data for this application is incomplete.
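
A timeout trigger of this kind could look roughly like the sketch below. The
class name, the idle-time policy (time since the last write) and the explicit
clock argument are all assumptions made for illustration; whether an expired,
incomplete file should be persisted is left open, as above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical tracker that expires files idle for too long. */
class OpenFileTimeoutTracker {
    private final long timeoutMillis;
    // application ID -> time of the most recent write, in milliseconds
    private final Map<String, Long> lastWriteMillis = new HashMap<>();

    OpenFileTimeoutTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called on every write; the clock is passed in to keep this testable. */
    void recordWrite(String appId, long nowMillis) {
        lastWriteMillis.put(appId, nowMillis);
    }

    /** Returns and forgets the applications whose files have timed out. */
    List<String> closeExpired(long nowMillis) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it =
            lastWriteMillis.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= timeoutMillis) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }
}
```

A periodic task would call closeExpired() and close the corresponding files,
possibly marking them as incomplete.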
4. However, if we have a timeout trigger for a TFile, the RM cannot write each
piece of history information at the end of each object's life cycle without
coordination. We would then want the write operations for all the pieces to be
scheduled together, which means the RM side needs more work to coordinate the
write operations (YARN-953).
[~vinodkv], any suggestions?
> Add a file-system implementation for history-storage
> ----------------------------------------------------
>
> Key: YARN-975
> URL: https://issues.apache.org/jira/browse/YARN-975
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Zhijie Shen
> Assignee: Zhijie Shen
> Attachments: YARN-975.1.patch, YARN-975.2.patch, YARN-975.3.patch,
> YARN-975.4.patch, YARN-975.5.patch
>
>
> HDFS implementation should be a standard persistence strategy of history
> storage