[ 
https://issues.apache.org/jira/browse/YARN-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093111#comment-14093111
 ] 

Maysam Yabandeh commented on YARN-2405:
---------------------------------------

The problem seems to be that the two separate maps that track the apps are not 
in sync. The list of apps is taken from 
{code}
Map<ApplicationId, RMApp> rmContext.getRMApps() 
{code}
and each app is then looked up in the second map, in AbstractYarnScheduler
{code}
Map<ApplicationId, SchedulerApplication> applications
{code}
via the following code:
{code}
  public FSSchedulerApp getSchedulerApp(ApplicationAttemptId appAttemptId) {
    return (FSSchedulerApp) super.getApplicationAttempt(appAttemptId);
  }

  public T getApplicationAttempt(ApplicationAttemptId applicationAttemptId) {
    SchedulerApplication<T> app =
        applications.get(applicationAttemptId.getApplicationId());
    return app == null ? null : app.getCurrentAppAttempt();
  }
{code}
which returns null if it does not find the app attempt. The 
FairSchedulerAppsBlock does not check for a null return value, hence the NPE.

By code inspection we found one case in which this could happen. We are not 
sure whether it is the same case that we hit, though. In any case, checking for 
null values returned by getSchedulerApp seems to be a broader fix that also 
covers the cases that we have not yet discovered by code inspection.
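
To illustrate the kind of null check meant here, below is a minimal sketch. It 
does not use the real YARN classes: NullSafeFairShare, the String attempt ids, 
the Integer stand-in for the scheduler app, and the UNKNOWN_FAIR_SHARE sentinel 
are all hypothetical, and the value the real page should show for a missing 
attempt is an open choice.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the scheduler-side lookup; not the real YARN types.
final class NullSafeFairShare {
    // Assumed sentinel reported when the scheduler has no record of the attempt.
    static final int UNKNOWN_FAIR_SHARE = -1;

    // Stand-in for the scheduler map: attempt id -> fair share.
    private final Map<String, Integer> schedulerApps = new HashMap<>();

    void addSchedulerApp(String attemptId, int fairShare) {
        schedulerApps.put(attemptId, fairShare);
    }

    // Mirrors getSchedulerApp: returns null when the attempt is not tracked.
    Integer getSchedulerApp(String attemptId) {
        return schedulerApps.get(attemptId);
    }

    // Null-checked lookup: falls back to the sentinel instead of throwing an NPE.
    int getAppFairShare(String attemptId) {
        Integer app = getSchedulerApp(attemptId);
        return app == null ? UNKNOWN_FAIR_SHARE : app;
    }
}
```

With a check like this, an app that is missing from the scheduler map would 
only get a placeholder value and the rest of the page would still render.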

One scenario that could potentially result in a null return value is the 
following, in FairScheduler#addApplication:
{code}
    RMApp rmApp = rmContext.getRMApps().get(applicationId);
    FSLeafQueue queue = assignToQueue(rmApp, queueName, user);
    if (queue == null) {
      return;
    }
    // Enforce ACLs
    UserGroupInformation userUgi = UserGroupInformation.createRemoteUser(user);
    if (...) {
      return;
    }
  
    SchedulerApplication application =
        new SchedulerApplication(queue, user);
    applications.put(applicationId, application);
{code}
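
If either early return above is taken, applications.put is never reached, while 
the app is presumably still present in rmContext.getRMApps(). A simplified 
model of that situation follows. The class OutOfSyncMaps, its String keys and 
values, and the submit method are hypothetical stand-ins and not the real 
scheduler code.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the two maps; every name here is a hypothetical stand-in.
final class OutOfSyncMaps {
    final Map<String, String> rmApps = new HashMap<>();        // appId -> user
    final Map<String, String> applications = new HashMap<>();  // appId -> queue

    // Models submission: the app is always recorded on the RM side, and is then
    // handed to the scheduler side.
    void submit(String appId, String user, String queue) {
        rmApps.put(appId, user);
        addApplication(appId, queue);
    }

    // Models the early return: without a queue the app is never stored.
    void addApplication(String appId, String queue) {
        if (queue == null) {
            return; // applications.put is skipped for this app
        }
        applications.put(appId, queue);
    }

    // Returns null for any app that took the early return above.
    String getSchedulerApp(String appId) {
        return applications.get(appId);
    }
}
```

After submitting one app with a queue and one without, the first map holds two 
entries and the second only one, so iterating over the first map and looking 
each app up in the second yields a null for the rejected app.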

> NPE in FairSchedulerAppsBlock (scheduler page)
> ----------------------------------------------
>
>                 Key: YARN-2405
>                 URL: https://issues.apache.org/jira/browse/YARN-2405
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Maysam Yabandeh
>
> FairSchedulerAppsBlock#render throws NPE at this line
> {code}
>       int fairShare = fsinfo.getAppFairShare(attemptId);
> {code}
> This causes the scheduler page to not show the app, since it lacks the 
> definition of appsTableData
> {code}
>  Uncaught ReferenceError: appsTableData is not defined 
> {code}
> The problem is temporary, meaning that it usually resolves by itself either 
> after a retry or after a few hours.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
