[
https://issues.apache.org/jira/browse/YARN-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800402#comment-13800402
]
Zhijie Shen commented on YARN-925:
----------------------------------
I have some thoughts about getAllApplications.
* First, we should allow filters for the getAllApplications method. Though the
caller of the reader interface can retrieve all the applications and filter
them itself, that is not efficient. Therefore, IMHO, it's better to push the
filters down to the level that operates directly on the storage, because that
level knows best how the history data is stored and which filtering method is
fastest. However, I'm afraid this is not a simple step. It's cheap to add
some filter parameters to the interface, but the implementation needs a lot
of work to filter applications *efficiently* (instead of enumerating all
applications and checking them one by one). We probably need to build an index
for each filter parameter. For example, in the FS implementation, if we want to
allow users to search applications by applicationType, we need to construct an
applicationType->applicationId lookup table. It will be more difficult if we
want to allow users to search by FinishTime, which is a continuous variable.
* Second, no matter what filters we provide to users, it is always possible
that users will still get a huge number of applications. This is not just a
problem for AHS. JHS also loads all the files in the given directory into
memory, and the RM web services return all the applications in RMContext
as well (RM is a bit lucky because the default size is 10000). Maybe we should
have two versions of getAllApplications():
** One is for a small dataset. If users expect the number of applications to
be small, they can call this method. A collection of all applications will be
returned. However, we will set an upper bound on the size of the returned
collection.
** One is for a big dataset. If users expect the number of applications to be
big, they can call this method. An iterator will be returned. Users can use the
iterator to get the applications one by one, while the server side processes
each request for the next application on demand.
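
To make both points concrete, here is a rough sketch. It is not taken from any
of the attached patches, and every class and method name in it is hypothetical.
It assumes a simple in-memory store standing in for the real FS storage, and
shows an applicationType->applicationId lookup table, a sorted index on
FinishTime for range queries, a size-capped method for small result sets, and
an iterator-returning method for large ones.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a history store; names are illustrative only.
class IndexedHistorySketch {
  // applicationType -> applicationIds: the lookup table for type filters.
  private final Map<String, List<String>> idsByType = new HashMap<>();
  // finishTime -> applicationIds, kept sorted so a range query avoids a full scan.
  private final TreeMap<Long, List<String>> idsByFinishTime = new TreeMap<>();

  void add(String appId, String appType, long finishTime) {
    idsByType.computeIfAbsent(appType, k -> new ArrayList<>()).add(appId);
    idsByFinishTime.computeIfAbsent(finishTime, k -> new ArrayList<>()).add(appId);
  }

  // Small-dataset variant: returns a collection holding at most maxResults ids.
  List<String> getApplicationsByType(String appType, int maxResults) {
    List<String> ids = idsByType.getOrDefault(appType, Collections.emptyList());
    int limit = Math.max(0, Math.min(maxResults, ids.size()));
    return new ArrayList<>(ids.subList(0, limit));
  }

  // Large-dataset variant: lazily walks ids whose finish time lies in
  // [fromInclusive, toInclusive]; requires fromInclusive <= toInclusive.
  Iterator<String> iterateByFinishTime(long fromInclusive, long toInclusive) {
    final Iterator<List<String>> buckets =
        idsByFinishTime.subMap(fromInclusive, true, toInclusive, true)
            .values().iterator();
    return new Iterator<String>() {
      private Iterator<String> current = Collections.emptyIterator();

      @Override
      public boolean hasNext() {
        // Advance to the next non-empty bucket only when the caller asks.
        while (!current.hasNext() && buckets.hasNext()) {
          current = buckets.next().iterator();
        }
        return current.hasNext();
      }

      @Override
      public String next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return current.next();
      }
    };
  }
}
```

In a real FS implementation the two maps would have to be persisted and kept
consistent with the history files, which is where most of the work mentioned
above would go. The iterator here only defers work inside one process; a remote
version would additionally need some cursor or paging protocol between client
and server.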
> HistoryStorage Reader Interface for Application History Server
> --------------------------------------------------------------
>
> Key: YARN-925
> URL: https://issues.apache.org/jira/browse/YARN-925
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Mayank Bansal
> Assignee: Mayank Bansal
> Fix For: YARN-321
>
> Attachments: YARN-925-1.patch, YARN-925-2.patch, YARN-925-3.patch,
> YARN-925-4.patch, YARN-925-5.patch
>
>