[ 
https://issues.apache.org/jira/browse/YARN-8222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-8222:
---------------------------
    Description: 
(1) NPE in 
This NPE looks like may happen when node heartbeat delay and try to update 
attempt metrics for a non-exist application.
Reference code of RMContainerImpl$FinishedTransition#updateAttemptMetrics:
{code:java}
private static void updateAttemptMetrics(RMContainerImpl container) {
      Resource resource = container.getContainer().getResource();
      RMAppAttempt rmAttempt = container.rmContext.getRMApps()
          .get(container.getApplicationAttemptId().getApplicationId())
          .getCurrentAppAttempt();

      if (rmAttempt != null) {
         //....
      }
}
{code}
We can add a null check for application before getting attempt form it.

Error log:
{noformat}
java.lang.NullPointerException
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$FinishedTransition.updateAttemptMetrics(RMContainerImpl.java:742)
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$FinishedTransition.transition(RMContainerImpl.java:715)
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$FinishedTransition.transition(RMContainerImpl.java:699)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:482)
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:64)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.containerCompleted(FiCaSchedulerApp.java:195)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1793)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:2624)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:663)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1514)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:2396)
        at 
org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:205)
        at 
org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:60)
        at 
org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
        at java.lang.Thread.run(Thread.java:834)
{noformat}

  was:
This NPE looks like may happen when node heartbeat delay and try to update 
attempt metrics for a non-exist application.
Reference code of RMContainerImpl$FinishedTransition#updateAttemptMetrics:
{code:java}
private static void updateAttemptMetrics(RMContainerImpl container) {
      Resource resource = container.getContainer().getResource();
      RMAppAttempt rmAttempt = container.rmContext.getRMApps()
          .get(container.getApplicationAttemptId().getApplicationId())
          .getCurrentAppAttempt();

      if (rmAttempt != null) {
         //....
      }
}
{code}
We can add a null check for application before getting attempt form it.

Error log:
{noformat}
java.lang.NullPointerException
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$FinishedTransition.updateAttemptMetrics(RMContainerImpl.java:742)
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$FinishedTransition.transition(RMContainerImpl.java:715)
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$FinishedTransition.transition(RMContainerImpl.java:699)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:482)
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:64)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.containerCompleted(FiCaSchedulerApp.java:195)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1793)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:2624)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:663)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1514)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:2396)
        at 
org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:205)
        at 
org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:60)
        at 
org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
        at java.lang.Thread.run(Thread.java:834)
{noformat}


> NPE in RMContainerImpl$FinishedTransition#updateAttemptMetrics
> --------------------------------------------------------------
>
>                 Key: YARN-8222
>                 URL: https://issues.apache.org/jira/browse/YARN-8222
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Critical
>
> (1) NPE in 
> This NPE looks like may happen when node heartbeat delay and try to update 
> attempt metrics for a non-exist application.
> Reference code of RMContainerImpl$FinishedTransition#updateAttemptMetrics:
> {code:java}
> private static void updateAttemptMetrics(RMContainerImpl container) {
>       Resource resource = container.getContainer().getResource();
>       RMAppAttempt rmAttempt = container.rmContext.getRMApps()
>           .get(container.getApplicationAttemptId().getApplicationId())
>           .getCurrentAppAttempt();
>       if (rmAttempt != null) {
>          //....
>       }
> }
> {code}
> We can add a null check for application before getting attempt form it.
> Error log:
> {noformat}
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$FinishedTransition.updateAttemptMetrics(RMContainerImpl.java:742)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$FinishedTransition.transition(RMContainerImpl.java:715)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$FinishedTransition.transition(RMContainerImpl.java:699)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:482)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:64)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.containerCompleted(FiCaSchedulerApp.java:195)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1793)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:2624)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:663)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1514)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:2396)
>         at 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:205)
>         at 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:60)
>         at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
>         at java.lang.Thread.run(Thread.java:834)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to