Mit Desai created YARN-2387:
-------------------------------

             Summary: Resource Manager crashes with NPE due to lack of synchronization
                 Key: YARN-2387
                 URL: https://issues.apache.org/jira/browse/YARN-2387
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 3.0.0, 2.5.0
            Reporter: Mit Desai
            Assignee: Mit Desai


We recently came across a 0.23 ResourceManager (RM) crashing with an NPE. Here 
is the stack trace:

{noformat}
2014-08-06 05:56:52,165 [ResourceManager Event Processor] FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
        at org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToBuilder(ContainerStatusPBImpl.java:61)
        at org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToProto(ContainerStatusPBImpl.java:68)
        at org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:53)
        at org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:34)
        at org.apache.hadoop.yarn.api.records.ProtoBase.toString(ProtoBase.java:55)
        at java.lang.String.valueOf(String.java:2854)
        at java.lang.StringBuilder.append(StringBuilder.java:128)
        at org.apache.hadoop.yarn.api.records.impl.pb.ContainerPBImpl.toString(ContainerPBImpl.java:353)
        at java.lang.String.valueOf(String.java:2854)
        at java.lang.StringBuilder.append(StringBuilder.java:128)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1405)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:790)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:602)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:688)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:82)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:339)
        at java.lang.Thread.run(Thread.java:722)
2014-08-06 05:56:52,166 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{noformat}

On investigating the issue, we found that ContainerStatusPBImpl has methods 
that are called by different threads and are not synchronized. The 2.x code 
looks the same.

We need to make these methods synchronized so that we do not hit this problem 
in the future.
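
To illustrate the kind of fix proposed above, here is a minimal sketch. It is 
not the actual ContainerStatusPBImpl code: the class StatusRecord is 
hypothetical, and a String/StringBuilder pair stands in for the protobuf 
message and its builder. It only mirrors the general pattern of a lazily 
merged local field, a shared builder, and a viaProto flag, with every public 
entry point marked synchronized so that a reader calling toString() cannot 
observe the builder while another thread is in the middle of a setter.

```java
/**
 * Hypothetical stand-in for a PB-backed record (an assumption for
 * illustration, not Hadoop code). A String plays the role of the built
 * proto and a StringBuilder plays the role of the proto builder.
 */
final class StatusRecord {
    private String proto = "";       // last built snapshot
    private StringBuilder builder;   // mutable builder, created on demand
    private boolean viaProto = true; // true when 'proto' is up to date
    private String containerId;      // locally cached field, merged lazily

    // Ensures a builder exists and marks the snapshot as stale.
    private void maybeInitBuilder() {
        if (viaProto || builder == null) {
            builder = new StringBuilder(proto);
        }
        viaProto = false;
    }

    // Copies the locally cached field into the builder and rebuilds 'proto'.
    private void mergeLocalToProto() {
        if (viaProto) {
            maybeInitBuilder();
        }
        if (containerId != null) {
            builder.setLength(0);
            builder.append(containerId);
        }
        proto = builder.toString();
        viaProto = true;
    }

    // All public entry points take the same lock (this), so the
    // builder/viaProto/containerId state is never seen half-updated.
    public synchronized String getProto() {
        mergeLocalToProto();
        return proto;
    }

    public synchronized void setContainerId(String id) {
        maybeInitBuilder();
        containerId = id;
    }

    @Override
    public synchronized String toString() {
        return getProto();
    }
}
```

Without the synchronized keywords, a thread in getProto() could read the 
builder or the cached field while another thread is part-way through 
setContainerId(), which is the general shape of the race suspected here. The 
exact interleaving behind the NPE in the trace above is not established by 
this sketch.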



--
This message was sent by Atlassian JIRA
(v6.2#6252)
