Jason Lowe commented on YARN-3143:

bq. Could you provide the RM logs, please ? That will help debug.

Here's what the RM log says when the NPE occurs with a finalStatus query:

2015-02-05 15:18:09,744 [1124535424@qtp-165859665-85345] WARN webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR

That's all there is.  No stack trace or anything else.  Nothing else in the logs 
looks out of place around the time of the call.  We also saw nothing of note in 
the logs when the web services returned apps with missing fields, which is 
consistent with what I'm pretty confident is happening:  the RM is removing 
applications from the RMApps map just as the web services are trying to walk 
it.  Given how expensive it is to grab the scheduler lock for all those 
applications on this busy cluster, I'm not surprised that by the time the web 
services receive the full list of application reports, at least one of the 
apps has been retired from the RMApps collection.
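
To make the suspected race concrete, here is a minimal sketch of the pattern, 
not the contents of the attached patch.  All class and method names below 
(AppsSnapshotSketch, Report, App, describeApps) are hypothetical stand-ins 
rather than real RM classes.  It assumes the web service first collects a list 
of reports and only afterwards looks each app up in a concurrently modified 
map, so a lookup can return null for an app that was retired in between:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; none of these types exist in YARN.
final class AppsSnapshotSketch {

    /** Stand-in for an application report gathered in the first pass. */
    static final class Report {
        final String appId;
        Report(String appId) { this.appId = appId; }
    }

    /** Stand-in for the per-application state held in the RM's app map. */
    static final class App {
        final String id;
        final String user;
        App(String id, String user) { this.id = id; this.user = user; }
    }

    /**
     * Second pass: look each reported app up in the live map.  An app that
     * was removed after its report was collected yields null here, so it is
     * skipped instead of being dereferenced (NPE) or emitted half-empty.
     */
    static List<String> describeApps(List<Report> reports, Map<String, App> apps) {
        List<String> out = new ArrayList<>();
        for (Report r : reports) {
            App app = apps.get(r.appId);
            if (app == null) {
                continue; // retired between the two passes
            }
            out.add(app.id + ":" + app.user);
        }
        return out;
    }

    /** Simulates the race: an app is removed after the reports are taken. */
    static List<String> demo() {
        Map<String, App> apps = new ConcurrentHashMap<>();
        apps.put("app_1", new App("app_1", "alice"));
        apps.put("app_2", new App("app_2", "bob"));

        List<Report> reports = new ArrayList<>();
        reports.add(new Report("app_1"));
        reports.add(new Report("app_2"));

        apps.remove("app_2"); // the RM retires the app in the meantime
        return describeApps(reports, apps);
    }
}
```

Without the null check in describeApps, the lookup for the retired app would 
be dereferenced and throw, which would match the NullPointerException seen in 
the REST response if the race is indeed the cause.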

> RM Apps REST API can return NPE or entries missing id and other fields
> ----------------------------------------------------------------------
>                 Key: YARN-3143
>                 URL: https://issues.apache.org/jira/browse/YARN-3143
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: webapp
>    Affects Versions: 2.5.2
>            Reporter: Kendall Thrapp
>            Assignee: Jason Lowe
>         Attachments: YARN-3143.001.patch
> I'm seeing intermittent null pointer exceptions being returned by
> the YARN Apps REST API.
> For example:
> {code}
> http://{cluster}:{port}/ws/v1/cluster/apps?finalStatus=UNDEFINED
> {code}
> JSON Response was:
> {code}
> {"RemoteException":{"exception":"NullPointerException","javaClassName":"java.lang.NullPointerException"}}
> {code}
> At a glance, this appears to happen only when we query for unfinished apps 
> (i.e. finalStatus=UNDEFINED).
> Possibly related, when I do get back a list of apps, sometimes one or more of 
> the apps will be missing most of the fields, like id, name, user, etc., and 
> the fields that are present all have zero for the value.  
> For example:
> {code}
> {"progress":0.0,"clusterId":0,"applicationTags":"","startedTime":0,"finishedTime":0,"elapsedTime":0,"allocatedMB":0,"allocatedVCores":0,"runningContainers":0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0}
> {code}
> Let me know if there's any other information I can provide to help debug.

