Mit Desai created YARN-2387:
-------------------------------
Summary: Resource Manager crashes with NPE due to lack of
synchronization
Key: YARN-2387
URL: https://issues.apache.org/jira/browse/YARN-2387
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai
We recently came across a 0.23 RM crashing with an NPE. Here is the stacktrace
for it.
{noformat}
2014-08-06 05:56:52,165 [ResourceManager Event Processor] FATAL
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in
handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
at
org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToBuilder(ContainerStatusPBImpl.java:61)
at
org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToProto(ContainerStatusPBImpl.java:68)
at
org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:53)
at
org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:34)
at
org.apache.hadoop.yarn.api.records.ProtoBase.toString(ProtoBase.java:55)
at java.lang.String.valueOf(String.java:2854)
at java.lang.StringBuilder.append(StringBuilder.java:128)
at
org.apache.hadoop.yarn.api.records.impl.pb.ContainerPBImpl.toString(ContainerPBImpl.java:353)
at java.lang.String.valueOf(String.java:2854)
at java.lang.StringBuilder.append(StringBuilder.java:128)
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1405)
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:790)
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:602)
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:688)
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:82)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:339)
at java.lang.Thread.run(Thread.java:722)
2014-08-06 05:56:52,166 [ResourceManager Event Processor] INFO
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{noformat}
On investigating a on the issue we found that the ContainerStatusPBImpl has
methods that are called by different threads and are not synchronized. Even the
2.X code looks alike.
We need to make these methods synchronized so that we do not encounter this
problem in future.
--
This message was sent by Atlassian JIRA
(v6.2#6252)