[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868043#comment-13868043
]
Zhijie Shen commented on YARN-321:
----------------------------------
bq. 1. Does it provide a way to set the maximum number of files and the maximum
retention period for ApplicationHistory data stored in HDFS?
No. The current FS implementation does not discard the history data of
applications that completed before a given time; it answers users' requests
based on all of the stored applications. However, through the REST API, users
can filter out applications that fall outside a start/finish time window.
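A minimal sketch of what filtering on a start/finish time window amounts to,
written in Java since the AHS is part of Hadoop. The `AppRecord` type and the
`filter` method are hypothetical illustrations, not AHS classes, and they do not
reproduce the actual REST API query parameters; the bounds are assumed to be
inclusive.

```java
import java.util.ArrayList;
import java.util.List;

public class TimeWindowFilterSketch {

    // Hypothetical, simplified record of a stored application (not a YARN class).
    public record AppRecord(String appId, long startTime, long finishTime) {}

    // Keeps only applications whose start and finish times both fall inside the
    // given inclusive windows. Pass Long.MIN_VALUE / Long.MAX_VALUE to leave a
    // bound open.
    public static List<AppRecord> filter(List<AppRecord> apps,
                                         long startBegin, long startEnd,
                                         long finishBegin, long finishEnd) {
        List<AppRecord> out = new ArrayList<>();
        for (AppRecord app : apps) {
            boolean startOk = app.startTime() >= startBegin && app.startTime() <= startEnd;
            boolean finishOk = app.finishTime() >= finishBegin && app.finishTime() <= finishEnd;
            if (startOk && finishOk) {
                out.add(app);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<AppRecord> apps = List.of(
                new AppRecord("app_1", 100L, 200L),
                new AppRecord("app_2", 300L, 400L));
        // Only app_1 started within [0, 250]; the finish time is left unbounded.
        System.out.println(filter(apps, 0L, 250L, Long.MIN_VALUE, Long.MAX_VALUE));
    }
}
```

Note that this filter runs over a list that is already in memory, which is
exactly why it does not by itself reduce the amount of data read from HDFS.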
bq. 2. When there are many ApplicationHistory files in HDFS, is there no limit on
how many ApplicationHistory records are read?
As for the REST API, users can limit the number of applications that the AHS
returns. As for HDFS access, the current implementation loads all the stored
applications and filters them one by one, which is not efficient for a large
collection of applications. YARN-925 has been reopened to discuss pushing the
filtering into the history store implementation, so that we can avoid loading
all the applications. Meanwhile, caching (YARN-1322) is another way to reduce
I/O.
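To make the difference concrete, here is a hedged Java sketch contrasting the
current "load everything, then filter" behaviour with a scan that applies the
filter and the limit while reading, so it stops as soon as enough matches are
found. A small LRU cache in front of the record loader is added in the spirit
of YARN-1322. All names here (`PushdownFilterSketch`, `loader`, `cached`) are
hypothetical; this is not the design discussed in YARN-925 or YARN-1322.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

public class PushdownFilterSketch {

    // Naive approach: read every stored application first, then filter and truncate.
    public static <T> List<T> loadAllThenFilter(List<String> ids, Function<String, T> loader,
                                                Predicate<T> filter, int limit) {
        List<T> all = new ArrayList<>();
        for (String id : ids) {
            all.add(loader.apply(id)); // one read per stored application
        }
        List<T> out = new ArrayList<>();
        for (T app : all) {
            if (out.size() >= limit) {
                break;
            }
            if (filter.test(app)) {
                out.add(app);
            }
        }
        return out;
    }

    // Pushed-down approach: filter and limit are applied during the scan, so
    // records are read only until `limit` matches have been collected.
    public static <T> List<T> scanWithFilter(Iterator<String> ids, Function<String, T> loader,
                                             Predicate<T> filter, int limit) {
        List<T> out = new ArrayList<>();
        while (ids.hasNext() && out.size() < limit) {
            T app = loader.apply(ids.next());
            if (filter.test(app)) {
                out.add(app);
            }
        }
        return out;
    }

    // Wraps a loader with an access-ordered LRU cache holding at most `capacity`
    // records, so repeated requests for the same application skip the read.
    public static <T> Function<String, T> cached(Function<String, T> loader, int capacity) {
        Map<String, T> cache = new LinkedHashMap<String, T>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, T> eldest) {
                return size() > capacity;
            }
        };
        return id -> cache.computeIfAbsent(id, loader);
    }
}
```

With 5 stored applications and a limit of 2, `scanWithFilter` performs 2 reads
when the first two records match, whereas `loadAllThenFilter` always performs
5. The early exit only helps when matches are found early; in the worst case
both approaches scan everything, which is why store-side indexing or caching
is still worth discussing.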
> Generic application history service
> -----------------------------------
>
> Key: YARN-321
> URL: https://issues.apache.org/jira/browse/YARN-321
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Luke Lu
> Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf,
> Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java
>
>
> The mapreduce job history server currently needs to be deployed as a trusted
> server in sync with the mapreduce runtime. Every new application would need a
> similar application history server. Having to deploy O(T*V) (where T is the
> number of application types and V is the number of application versions)
> trusted servers is clearly not scalable.
> Job history storage handling itself is pretty generic: move the logs and
> history data into a particular directory for later serving. Job history data
> is already stored as json (or binary avro). I propose that we create only one
> trusted application history server, which can have a generic UI (display json
> as a tree of strings) as well. Specific application/version can deploy
> untrusted webapps (a la AMs) to query the application history server and
> interpret the json for its specific UI and/or analytics.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)