[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-321:
-----------------------------

    Attachment: HistoryStorageDemo.java

bq. However, during the design, it would be nice to outline (at least at a 
high-level) how the "plugins" can work.

Good suggestion. I think there should be a way to make HistoryStorage extensible
enough to store per-framework information. My rough idea is to make
HistoryStorage so general that storing the RM's basic information is just a
special case of generic storage. To demonstrate the idea, I've uploaded
HistoryStorageDemo.java, which sketches the high-level design.

We can define a schema, which users can extend to specify exactly the
information their applications want to record. There is a set of default
schemas for the information of RMApp, RMAppAttempt, and RMContainer. The
default schemas will be loaded when HistoryStorage is constructed (or during
init() if it's a service), while customized schemas can be loaded via
configuration or at runtime. Methods for adding and reading one or more tuples
of any schema are exposed, and the APIs that manipulate the basic information
from the RM simply wrap these generic methods.
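
To make the schema idea concrete, here is a minimal Java sketch under stated
assumptions. It is not the content of HistoryStorageDemo.java: the class names
(Schema, HistoryStorage as written here), the methods registerSchema,
addTuple, readTuples and writeApplication, and the field lists of the default
schemas are all hypothetical. Tuples are modelled as string-to-string maps and
kept in memory, one list per schema.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical schema: a named record type with a fixed list of field names.
final class Schema {
  final String name;
  final List<String> fields;

  Schema(String name, List<String> fields) {
    this.name = name;
    this.fields = List.copyOf(fields);
  }
}

// Hypothetical generic store: every record is a tuple of some registered schema.
final class HistoryStorage {
  private final Map<String, Schema> schemas = new HashMap<>();
  private final Map<String, List<Map<String, String>>> tuples = new HashMap<>();

  HistoryStorage() {
    // Default RM schemas, loaded at construction (or in init() for a service).
    // The field lists are illustrative assumptions.
    registerSchema(new Schema("RMApp", List.of("appId", "user", "queue", "finalStatus")));
    registerSchema(new Schema("RMAppAttempt", List.of("attemptId", "appId", "host")));
    registerSchema(new Schema("RMContainer", List.of("containerId", "attemptId", "exitStatus")));
  }

  // Customized (per-framework) schemas are registered through the same method.
  void registerSchema(Schema schema) {
    schemas.put(schema.name, schema);
    tuples.putIfAbsent(schema.name, new ArrayList<>());
  }

  // Generic write: the tuple's keys must be exactly the schema's fields.
  void addTuple(String schemaName, Map<String, String> tuple) {
    Schema schema = schemas.get(schemaName);
    if (schema == null) {
      throw new IllegalArgumentException("Unknown schema: " + schemaName);
    }
    if (!tuple.keySet().equals(new HashSet<>(schema.fields))) {
      throw new IllegalArgumentException("Fields do not match schema " + schemaName);
    }
    tuples.get(schemaName).add(new LinkedHashMap<>(tuple));
  }

  // Generic read: all tuples of a schema, in the order they were added.
  List<Map<String, String>> readTuples(String schemaName) {
    List<Map<String, String>> rows = tuples.get(schemaName);
    if (rows == null) {
      throw new IllegalArgumentException("Unknown schema: " + schemaName);
    }
    return Collections.unmodifiableList(rows);
  }

  // RM-specific API: a thin wrapper over the generic addTuple().
  void writeApplication(String appId, String user, String queue, String finalStatus) {
    Map<String, String> tuple = new LinkedHashMap<>();
    tuple.put("appId", appId);
    tuple.put("user", user);
    tuple.put("queue", queue);
    tuple.put("finalStatus", finalStatus);
    addTuple("RMApp", tuple);
  }
}
```

A framework would register its own Schema (say, a hypothetical "MRTask" schema)
and then call addTuple()/readTuples() directly, in the same way that
writeApplication() does for RMApp.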

HistoryStorage owns a map of abstract files; each file is where the history
information of a specific schema is actually persisted. We can implement
different types of file, such as InMemoryFile. When a schema is loaded, a file
should be prepared for it. The file should expose some basic APIs, such as
appending a tuple, reading all tuples, and seeking a particular tuple.
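
The file abstraction could look roughly like the following Java sketch.
InMemoryFile is the name used above, but its internals here are an assumption;
HistoryFile, FileBackedHistoryStorage and its loadSchema()/file() methods are
hypothetical names. The sketch also assumes each schema designates one key
field that seek() uses to look tuples up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical per-schema "file": the only operations HistoryStorage needs.
abstract class HistoryFile {
  // Append one tuple at the end of the file.
  abstract void append(Map<String, String> tuple);

  // Read every tuple in append order.
  abstract List<Map<String, String>> readAll();

  // Find the tuple whose key field equals the given value, if any.
  abstract Optional<Map<String, String>> seek(String key);
}

// In-memory variant: a list of tuples plus a key -> position index for seek().
final class InMemoryFile extends HistoryFile {
  private final String keyField;
  private final List<Map<String, String>> tuples = new ArrayList<>();
  private final Map<String, Integer> index = new HashMap<>();

  InMemoryFile(String keyField) {
    this.keyField = keyField;
  }

  @Override
  void append(Map<String, String> tuple) {
    String key = tuple.get(keyField);
    if (key == null) {
      throw new IllegalArgumentException("Tuple lacks key field " + keyField);
    }
    index.put(key, tuples.size()); // a later tuple with the same key wins
    tuples.add(Map.copyOf(tuple)); // defensive copy (Map.copyOf rejects null values)
  }

  @Override
  List<Map<String, String>> readAll() {
    return Collections.unmodifiableList(tuples);
  }

  @Override
  Optional<Map<String, String>> seek(String key) {
    Integer pos = index.get(key);
    return pos == null ? Optional.empty() : Optional.of(tuples.get(pos));
  }
}

// Hypothetical storage that prepares one file per schema when the schema is loaded.
final class FileBackedHistoryStorage {
  private final Map<String, HistoryFile> files = new HashMap<>();

  void loadSchema(String schemaName, String keyField) {
    files.putIfAbsent(schemaName, new InMemoryFile(keyField));
  }

  HistoryFile file(String schemaName) {
    HistoryFile f = files.get(schemaName);
    if (f == null) {
      throw new IllegalArgumentException("Schema not loaded: " + schemaName);
    }
    return f;
  }
}
```

Keeping seek() behind the abstract class would let a persistent implementation
use its own indexing without changing the storage that owns the files.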

Any thoughts?
                
> Generic application history service
> -----------------------------------
>
>                 Key: YARN-321
>                 URL: https://issues.apache.org/jira/browse/YARN-321
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Luke Lu
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: HistoryStorageDemo.java
>
>
> The mapreduce job history server currently needs to be deployed as a trusted 
> server in sync with the mapreduce runtime. Every new application would need a 
> similar application history server. Having to deploy O(T*V) trusted servers
> (where T is the number of application types and V is the number of
> application versions) is clearly not scalable.
> Job history storage handling itself is pretty generic: move the logs and 
> history data into a particular directory for later serving. Job history data 
> is already stored as json (or binary avro). I propose that we create only one 
> trusted application history server, which can have a generic UI (display json 
> as a tree of strings) as well. Specific application/version can deploy 
> untrusted webapps (a la AMs) to query the application history server and 
> interpret the json for its specific UI and/or analytics.
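
As an illustration of the generic UI idea, the sketch below renders a JSON
document as a tree of strings. Java's standard library has no JSON parser, so
it assumes the history data has already been parsed into nested Maps (objects),
Lists (arrays) and scalar leaves; JsonTreeRenderer and render() are
hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical generic-UI helper. Input is an already-parsed JSON value:
// Map for an object, List for an array, anything else is a leaf.
final class JsonTreeRenderer {
  private JsonTreeRenderer() {}

  // Returns one indented line per node; object keys and array indices become labels.
  static List<String> render(Object value) {
    List<String> lines = new ArrayList<>();
    render(null, value, 0, lines);
    return lines;
  }

  private static void render(String label, Object value, int depth, List<String> out) {
    String indent = "  ".repeat(depth);
    if (value instanceof Map<?, ?>) {
      // A labelled object gets its own line; its members are indented one level.
      if (label != null) {
        out.add(indent + label);
      }
      int childDepth = label == null ? depth : depth + 1;
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
        render(String.valueOf(entry.getKey()), entry.getValue(), childDepth, out);
      }
    } else if (value instanceof List<?>) {
      // Arrays work the same way, labelling each element with its index.
      if (label != null) {
        out.add(indent + label);
      }
      int childDepth = label == null ? depth : depth + 1;
      int i = 0;
      for (Object item : (List<?>) value) {
        render("[" + i + "]", item, childDepth, out);
        i++;
      }
    } else {
      // Leaf: "label: value" (or just the value at the root).
      String prefix = label == null ? "" : label + ": ";
      out.add(indent + prefix + value);
    }
  }
}
```

For a parsed job {"jobId": "job_1", "tasks": [{"type": "MAP"}]} this yields the
lines "jobId: job_1", "tasks", "  [0]" and "    type: MAP".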

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
