[ 
https://issues.apache.org/jira/browse/YARN-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257561#comment-14257561
 ] 

Jian He commented on YARN-2983:
-------------------------------

thanks for taking over YARN-2987 too ! Given the change is fairly small and 
they are touching the same piece of code, do you mind incorporating this change 
as part of that and close this ?

> NPE possible in ClientRMService#getQueueInfo
> --------------------------------------------
>
>                 Key: YARN-2983
>                 URL: https://issues.apache.org/jira/browse/YARN-2983
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.5.1
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>         Attachments: YARN-2983.patch
>
>
> While going through code for checking YARN-2978 , found one issue. 
> During construction of {{GetQueueInfoResponse}} in 
> {{ClientRMService#getQueueInfo}}, we first collect application attempts from 
> scheduler and then get apps from a {{ConcurrentHashMap}} in {{RMContext}}. 
> Although the operation(get/put/remove,etc) itself on a ConcurrentHashMap is 
> thread-safe, but a series of multiple {{ConcurrentHashMap#get}} (say, in a 
> for loop) are not. 
> For instance, in code below, we are calling rmContext.getRMApps()#get in a 
> loop. Now a ConcurrentHashMap#get can return null if the key doesnt exist. 
> But there is no null check inside this for loop before dereferencing the 
> value returned i.e. rmApp. Although all the applicationattempts  have been 
> fetched for the queue just above the for loop, but as this block of code is 
> not synchronized, there is a possibility that another thread may delete RMApp 
> from the ConcurrentHashMap at the same time. This can happen when an app 
> finishes/completes and number of completed apps exceed the config  
> {{yarn.resourcemanager.max-completed-applications}}.
> I think there should be a null check inside this for loop, otherwise a NPE 
> can occur.
> {code:title=ClientRMService#getQueueInfo}
> public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request)
>       throws YarnException {
>   .....
>   if (request.getIncludeApplications()) {
>     List<ApplicationAttemptId> apps =
>     scheduler.getAppsInQueue(request.getQueueName());
>     appReports = new ArrayList<ApplicationReport>(apps.size());
>     for (ApplicationAttemptId app : apps) {
>       RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId());
>       appReports.add(rmApp.createAndGetApplicationReport(null, true));
>     }
>   }
>   ......
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to