[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhijie Shen updated YARN-321:
-----------------------------
    Attachment: HistoryStorageDemo.java

bq. However, during the design, it would be nice to outline (at least at a high-level) how the "plugins" can work.

Good suggestion. I think there should be a way to make HistoryStorage extensible so that it can store per-framework information. My rough idea is to make HistoryStorage so general that storing the basic RM information is just a special case of storage. To demonstrate the idea, I've uploaded HistoryStorageDemo.java, which sketches the high-level design.

We can define a schema, which users can extend to specify exactly what information their applications want to record. There is a set of default schemas, which hold the information of RMApp, RMAppAttempt, and RMContainer. The default schemas are loaded when HistoryStorage is constructed (or during init() if it's a service), while customized schemas can be loaded via configuration or at runtime. The methods for adding/reading a tuple/tuples of any schema are exposed, and the APIs that manipulate the basic information from the RM simply wrap those methods.

HistoryStorage owns a map of abstract files, each of which is the actual place where the history information of a specific schema is persisted. We can implement different types of this file, such as InMemoryFile. When a schema is loaded, a file should be prepared for it. The file should expose some basic APIs, such as appending a tuple, reading all tuples, and seeking to a particular tuple.

Any thoughts?

> Generic application history service
> -----------------------------------
>
>                 Key: YARN-321
>                 URL: https://issues.apache.org/jira/browse/YARN-321
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Luke Lu
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: HistoryStorageDemo.java
>
>
> The mapreduce job history server currently needs to be deployed as a trusted
> server in sync with the mapreduce runtime. Every new application would need a
> similar application history server. Having to deploy O(T*V) trusted servers
> (where T is the number of application types and V is the number of
> application versions) is clearly not scalable.
> Job history storage handling itself is pretty generic: move the logs and
> history data into a particular directory for later serving. Job history data
> is already stored as json (or binary avro). I propose that we create only one
> trusted application history server, which can have a generic UI (display json
> as a tree of strings) as well. A specific application/version can deploy
> untrusted webapps (a la AMs) to query the application history server and
> interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
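
For readers without access to the attachment, the design described in the comment above could look roughly like the following. This is a minimal illustrative sketch, not the contents of HistoryStorageDemo.java. HistoryStorage and InMemoryFile are named in the comment; everything else (Schema, HistoryFile, addTuple, readTuples, findTuple, addApplication, the RMApp field list, and seeking by the first field of a tuple) is a hypothetical name or assumption chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class HistoryStorageSketch {

  /** A named schema: an ordered list of field names (hypothetical shape). */
  static final class Schema {
    final String name;
    final List<String> fields;

    Schema(String name, String... fields) {
      this.name = name;
      this.fields = Collections.unmodifiableList(Arrays.asList(fields));
    }
  }

  /** The abstract "file" that persists the tuples of one schema. */
  interface HistoryFile {
    void append(List<String> tuple);
    List<List<String>> readAll();
    /** Assumption: a tuple is looked up by the value of its first field. */
    Optional<List<String>> seek(String key);
  }

  /** One possible file type; keeps tuples in a list instead of on disk. */
  static final class InMemoryFile implements HistoryFile {
    private final List<List<String>> tuples = new ArrayList<>();

    @Override public void append(List<String> tuple) {
      tuples.add(Collections.unmodifiableList(new ArrayList<>(tuple)));
    }

    @Override public List<List<String>> readAll() {
      return Collections.unmodifiableList(new ArrayList<>(tuples));
    }

    @Override public Optional<List<String>> seek(String key) {
      return tuples.stream()
          .filter(t -> !t.isEmpty() && key.equals(t.get(0)))
          .findFirst();
    }
  }

  static final class HistoryStorage {
    // Assumed field list; RMAppAttempt and RMContainer schemas are omitted.
    static final Schema RM_APP = new Schema("RMApp", "appId", "user", "finalStatus");

    private final Map<String, Schema> schemas = new HashMap<>();
    private final Map<String, HistoryFile> files = new HashMap<>();

    /** Default schemas are loaded at construction time. */
    HistoryStorage() {
      loadSchema(RM_APP);
    }

    /** Loading a schema also prepares the file that will hold its tuples. */
    void loadSchema(Schema schema) {
      schemas.put(schema.name, schema);
      files.putIfAbsent(schema.name, new InMemoryFile());
    }

    /** Generic write path, usable with any loaded schema. */
    void addTuple(String schemaName, String... values) {
      Schema schema = requireSchema(schemaName);
      if (values.length != schema.fields.size()) {
        throw new IllegalArgumentException(
            "Expected " + schema.fields.size() + " values for " + schemaName);
      }
      files.get(schemaName).append(Arrays.asList(values));
    }

    /** Generic read path: all tuples of a schema. */
    List<List<String>> readTuples(String schemaName) {
      requireSchema(schemaName);
      return files.get(schemaName).readAll();
    }

    /** Generic read path: one tuple, found by its first field. */
    Optional<List<String>> findTuple(String schemaName, String key) {
      requireSchema(schemaName);
      return files.get(schemaName).seek(key);
    }

    /** The RM-specific API is a thin wrapper over the generic method. */
    void addApplication(String appId, String user, String finalStatus) {
      addTuple(RM_APP.name, appId, user, finalStatus);
    }

    private Schema requireSchema(String schemaName) {
      Schema schema = schemas.get(schemaName);
      if (schema == null) {
        throw new IllegalArgumentException("Unknown schema: " + schemaName);
      }
      return schema;
    }
  }

  public static void main(String[] args) {
    HistoryStorage storage = new HistoryStorage();
    storage.addApplication("application_0001", "alice", "SUCCEEDED");

    // A framework registers its own schema at runtime and records a tuple.
    storage.loadSchema(new Schema("FrameworkTask", "taskId", "durationMs"));
    storage.addTuple("FrameworkTask", "task_1", "1200");

    System.out.println(storage.readTuples("RMApp"));
    System.out.println(storage.findTuple("FrameworkTask", "task_1"));
  }
}
```

In this sketch the storage layer never interprets tuple contents, so a per-framework schema goes through the same code path as the RM defaults. A file-system-backed HistoryFile could replace InMemoryFile without changing HistoryStorage.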