[ 
https://issues.apache.org/jira/browse/YARN-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800402#comment-13800402
 ] 

Zhijie Shen commented on YARN-925:
----------------------------------

I have some thoughts about getAllApplications.

* First, we should allow filters on the getAllApplications method. Though the 
caller of the reader interface could retrieve all the applications and filter 
them itself, that is not efficient. Therefore, IMHO, it's better to push the 
filters down to the layer that operates directly on the storage, because it 
knows best how the history data is stored and which filtering method is 
fastest. However, I'm afraid this is not a simple step. It's cheap to add 
some filter parameters to the interface, but the implementation needs a lot 
of work to filter applications *efficiently* (instead of enumerating all 
applications and checking them one-by-one). We probably need to build an index 
for each filter parameter. For example, in the FS implementation, if we want to 
allow users to search applications by applicationType, we need to construct an 
applicationType->applicationId lookup table. It will be more difficult if we 
want to allow users to search by FinishTime, which is a continuous variable.

* Second, no matter what filters we provide, it is always possible that users 
will still get a huge number of applications. This is not just a problem of 
AHS. JHS also loads all the files in the given directory into memory, and the 
RM web services return all the applications in RMContext as well (RM is a bit 
lucky because the default size is 10000). Maybe we should have two versions of 
getAllApplications():
** One is for a small dataset. If users expect the number of applications to 
be small, they can call this method. A collection of all applications will be 
returned. However, we will set an upper bound on the size of the collection.
** One is for a big dataset. If users expect the number of applications to be 
big, they can call this method. An iterator will be returned. Users can use the 
iterator to get the applications one-by-one, while the server side processes 
each request for the next application on demand.
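
To make the two points above concrete, here is a rough in-memory sketch, not 
taken from any of the attached patches. The class name, method names, and the 
use of plain String ids are all assumptions made for illustration. A real FS 
implementation would have to persist the indexes rather than keep them in 
memory. It shows an applicationType->applicationId lookup table, a sorted 
index on finish time for range queries, a bounded collection for small result 
sets, and a lazy iterator for big ones.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

/** Hypothetical sketch only; these names are not part of the YARN-925 API. */
class ApplicationIndexSketch {
  // applicationType -> applicationIds, for exact-match filters.
  private final Map<String, List<String>> byType = new HashMap<>();
  // finishTime -> applicationIds, kept sorted so that a range query on a
  // continuous value does not need to enumerate every application.
  private final TreeMap<Long, List<String>> byFinishTime = new TreeMap<>();

  public void add(String applicationId, String applicationType, long finishTime) {
    byType.computeIfAbsent(applicationType, k -> new ArrayList<>()).add(applicationId);
    byFinishTime.computeIfAbsent(finishTime, k -> new ArrayList<>()).add(applicationId);
  }

  /** Small-dataset version: returns a collection capped at maxResults entries. */
  public List<String> getApplicationsByType(String applicationType, int maxResults) {
    List<String> ids = byType.getOrDefault(applicationType, Collections.emptyList());
    return new ArrayList<>(ids.subList(0, Math.min(maxResults, ids.size())));
  }

  /** Big-dataset version: lazily walks applications with finishTime in [from, to]. */
  public Iterator<String> iterateByFinishTime(long from, long to) {
    final Iterator<List<String>> buckets =
        byFinishTime.subMap(from, true, to, true).values().iterator();
    return new Iterator<String>() {
      private Iterator<String> current = Collections.emptyIterator();

      @Override
      public boolean hasNext() {
        // Advance to the next non-empty bucket only when the caller asks.
        while (!current.hasNext() && buckets.hasNext()) {
          current = buckets.next().iterator();
        }
        return current.hasNext();
      }

      @Override
      public String next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return current.next();
      }
    };
  }
}
```

In this sketch the upper bound is passed in by the caller. The proposal above 
would more likely fix it on the server side. The iterator here only walks a 
local map, whereas a real reader would fetch each next application from the 
storage as it is requested.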

> HistoryStorage Reader Interface for Application History Server
> --------------------------------------------------------------
>
>                 Key: YARN-925
>                 URL: https://issues.apache.org/jira/browse/YARN-925
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: YARN-321
>
>         Attachments: YARN-925-1.patch, YARN-925-2.patch, YARN-925-3.patch, 
> YARN-925-4.patch, YARN-925-5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)
