[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhijie Shen updated YARN-321:
-----------------------------
Attachment: HistoryStorageDemo.java
bq. However, during the design, it would be nice to outline (at least at a
high-level) how the "plugins" can work.
Good suggestion. I think there should be a way to make HistoryStorage extensible
enough to store per-framework information. My rough idea is to make
HistoryStorage so general that storing the basic RM information is just a
special case of storage. To demonstrate the idea, I've uploaded
HistoryStorageDemo.java, which sketches the high-level design.
We can define a schema, which users can extend to specify the exact
information their applications want to record. There are several default
schemas, used for the information of RMApp, RMAppAttempt, and
RMContainer. The default schemas will be loaded when HistoryStorage is
constructed (or during init() if it's a service), while customized schemas
can be loaded via configuration or at runtime. Methods for adding/reading a
tuple or tuples of any schema are exposed, and the APIs that manipulate the
basic information from RM simply wrap those methods.
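To make the schema idea concrete, here is a rough, hypothetical sketch. It is not the contents of HistoryStorageDemo.java; the names Schema, SketchHistoryStorage, loadSchema, addTuple, readTuples, and addApplication, as well as the RMApp field names, are assumptions made only for illustration. Tuples are kept in plain in-memory lists to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical schema: a name plus an ordered list of field names.
final class Schema {
  final String name;
  final List<String> fields;

  Schema(String name, String... fields) {
    this.name = name;
    this.fields =
        Collections.unmodifiableList(new ArrayList<>(Arrays.asList(fields)));
  }
}

// Hypothetical storage: generic tuple methods plus one RM-specific wrapper.
class SketchHistoryStorage {
  // Assumed default schema for RMApp; the field names are illustrative only.
  static final Schema RM_APP =
      new Schema("RMApp", "appId", "user", "finalStatus");

  private final Map<String, Schema> schemas = new HashMap<>();
  private final Map<String, List<List<String>>> tuples = new HashMap<>();

  SketchHistoryStorage() {
    loadSchema(RM_APP); // default schemas are loaded at construction time
  }

  // Customized schemas can be registered later (from configuration or at runtime).
  void loadSchema(Schema schema) {
    if (schemas.containsKey(schema.name)) {
      throw new IllegalArgumentException("Schema already loaded: " + schema.name);
    }
    schemas.put(schema.name, schema);
    tuples.put(schema.name, new ArrayList<>());
  }

  // Generic write path: one value per schema field, in field order.
  void addTuple(String schemaName, String... values) {
    Schema schema = schemas.get(schemaName);
    if (schema == null) {
      throw new IllegalArgumentException("Unknown schema: " + schemaName);
    }
    if (values.length != schema.fields.size()) {
      throw new IllegalArgumentException(
          "Expected " + schema.fields.size() + " values for " + schemaName);
    }
    tuples.get(schemaName).add(
        Collections.unmodifiableList(new ArrayList<>(Arrays.asList(values))));
  }

  // Generic read path: returns a read-only snapshot of all tuples of a schema.
  List<List<String>> readTuples(String schemaName) {
    List<List<String>> stored = tuples.get(schemaName);
    if (stored == null) {
      throw new IllegalArgumentException("Unknown schema: " + schemaName);
    }
    return Collections.unmodifiableList(new ArrayList<>(stored));
  }

  // The RM-facing API is only a thin wrapper over the generic method.
  void addApplication(String appId, String user, String finalStatus) {
    addTuple(RM_APP.name, appId, user, finalStatus);
  }
}
```

With this shape, a framework would call loadSchema(new Schema("MyFrameworkTask", "taskId", "state")) and then use the same addTuple/readTuples methods that the RM wrappers use, which is the "special case" relationship described above.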
HistoryStorage owns a map of abstract files, each of which is the real place
to persist the history information of a specific schema. We can implement
different types of this file, such as InMemoryFile. When a schema is loaded, a
file should be prepared for it. The file should expose some basic APIs, such
as appending a tuple, reading all tuples, and seeking to a particular tuple.
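The file abstraction could look roughly like the following. Again, this is a hypothetical sketch and not the attached demo: the interface name HistoryFile, the method names append, readAll, and seek, and the class FileBackedHistoryStorage are assumptions. In particular, seek is assumed here to mean "find the first tuple whose first field equals a given key", which is only one possible reading.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-schema file abstraction.
interface HistoryFile {
  void append(List<String> tuple);

  List<List<String>> readAll();

  // Assumption: the first field of a tuple acts as its key.
  // Returns the first matching tuple, or null if there is none.
  List<String> seek(String key);
}

// Simplest implementation: tuples are kept in a list in memory.
class InMemoryFile implements HistoryFile {
  private final List<List<String>> tuples = new ArrayList<>();

  @Override
  public void append(List<String> tuple) {
    // Store a read-only copy so later changes by the caller are not visible.
    tuples.add(Collections.unmodifiableList(new ArrayList<>(tuple)));
  }

  @Override
  public List<List<String>> readAll() {
    return new ArrayList<>(tuples); // snapshot in insertion order
  }

  @Override
  public List<String> seek(String key) {
    for (List<String> tuple : tuples) {
      if (!tuple.isEmpty() && tuple.get(0).equals(key)) {
        return tuple;
      }
    }
    return null;
  }
}

// Hypothetical storage that owns one file per loaded schema.
class FileBackedHistoryStorage {
  private final Map<String, HistoryFile> files = new HashMap<>();

  // Loading a schema prepares its backing file (in-memory here).
  void loadSchema(String schemaName) {
    files.putIfAbsent(schemaName, new InMemoryFile());
  }

  HistoryFile fileFor(String schemaName) {
    HistoryFile file = files.get(schemaName);
    if (file == null) {
      throw new IllegalArgumentException("Schema not loaded: " + schemaName);
    }
    return file;
  }
}
```

Another file type (for example, one backed by a file system) would only need to implement the same three methods, and FileBackedHistoryStorage would not have to change apart from choosing which implementation to create in loadSchema.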
Any thoughts?
> Generic application history service
> -----------------------------------
>
> Key: YARN-321
> URL: https://issues.apache.org/jira/browse/YARN-321
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Luke Lu
> Assignee: Vinod Kumar Vavilapalli
> Attachments: HistoryStorageDemo.java
>
>
> The mapreduce job history server currently needs to be deployed as a trusted
> server in sync with the mapreduce runtime. Every new application would need a
> similar application history server. Having to deploy O(T*V) (where T is the
> number of application types and V is the number of application versions)
> trusted servers is clearly not scalable.
> Job history storage handling itself is pretty generic: move the logs and
> history data into a particular directory for later serving. Job history data
> is already stored as json (or binary avro). I propose that we create only one
> trusted application history server, which can have a generic UI (display json
> as a tree of strings) as well. Specific application/version can deploy
> untrusted webapps (a la AMs) to query the application history server and
> interpret the json for its specific UI and/or analytics.