[
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706645#comment-13706645
]
Hitesh Shah commented on YARN-321:
----------------------------------
{quote}
To start with, we will have an implementation with per-app HDFS file.
{quote}
[~vinodkv] Based on the above, it seems like this will only allow someone to
analyse one job at a time. With a per-app file, won't it be non-trivial to
search for applications that match certain criteria? For example: all jobs
that ran on a certain day, all jobs of a certain type, all jobs that took
longer than 10 minutes to run, or all jobs that used over 100 containers.
Sure, a directory hierarchy based on dates may solve the very basic use cases,
but won't anyone needing to do slightly more complex analysis on cluster
utilization have to build an indexing layer on top of the file-based store?
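
To make the concern concrete, here is a minimal sketch of what such a query
looks like against a one-file-per-app store. Everything in it is an
assumption for illustration: a local temporary directory stands in for HDFS,
and the JSON field names (app_id, app_type, start_ms, finish_ms, containers)
are hypothetical, since the actual per-app file layout is not specified in
this thread.

```python
import json
import tempfile
from pathlib import Path


def write_app_history(root, app_id, app_type, start_ms, finish_ms, containers):
    """Write one history file per application (hypothetical record layout)."""
    record = {
        "app_id": app_id,
        "app_type": app_type,
        "start_ms": start_ms,
        "finish_ms": finish_ms,
        "containers": containers,
    }
    (root / f"{app_id}.json").write_text(json.dumps(record))


def scan_history(root, predicate):
    """Return (matching app ids, number of files opened).

    Without an index, every per-app file has to be opened and parsed,
    no matter how selective the predicate is.
    """
    matches = []
    opened = 0
    for path in sorted(root.glob("*.json")):
        opened += 1
        record = json.loads(path.read_text())
        if predicate(record):
            matches.append(record["app_id"])
    return matches, opened


def demo():
    ten_minutes_ms = 10 * 60 * 1000
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_app_history(root, "application_0001", "MAPREDUCE", 0, 5 * 60 * 1000, 20)
        write_app_history(root, "application_0002", "MAPREDUCE", 0, 15 * 60 * 1000, 150)
        write_app_history(root, "application_0003", "OTHER", 0, 30 * 60 * 1000, 8)

        # "All jobs that took longer than 10 mins to run"
        long_running, opened = scan_history(
            root, lambda r: r["finish_ms"] - r["start_ms"] > ten_minutes_ms
        )
        # "All jobs that use over 100 containers"
        big, _ = scan_history(root, lambda r: r["containers"] > 100)
        return long_running, big, opened


if __name__ == "__main__":
    print(demo())
```

Each query in this sketch opens every file in the store, so its cost grows
with the total number of applications rather than with the number of matches.
A date-based directory hierarchy would narrow only the "ran on a certain day"
case.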
> Generic application history service
> -----------------------------------
>
> Key: YARN-321
> URL: https://issues.apache.org/jira/browse/YARN-321
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Luke Lu
> Assignee: Vinod Kumar Vavilapalli
>
> The mapreduce job history server currently needs to be deployed as a trusted
> server in sync with the mapreduce runtime. Every new application would need a
> similar application history server. Having to deploy O(T*V) trusted servers
> (where T is the number of application types and V is the number of
> application versions) is clearly not scalable.
> Job history storage handling itself is pretty generic: move the logs and
> history data into a particular directory for later serving. Job history data
> is already stored as JSON (or binary Avro). I propose that we create only one
> trusted application history server, which can have a generic UI (display JSON
> as a tree of strings) as well. Each application/version can deploy
> untrusted webapps (a la AMs) to query the application history server and
> interpret the JSON for its own UI and/or analytics.