[
https://issues.apache.org/jira/browse/YARN-6695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821292#comment-16821292
]
Hudson commented on YARN-6695:
------------------------------
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16439 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/16439/])
YARN-6695. Fixed NPE in publishing appFinished events to ATSv2.
(eyang: rev df76cdc8959c51b71704ab5c38335f745a6f35d8)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServiceV2.md
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TimelineServiceV2Publisher.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisherForV2.java
> Race condition in RM for publishing container events vs appFinished events
> causes NPE
> --------------------------------------------------------------------------------------
>
> Key: YARN-6695
> URL: https://issues.apache.org/jira/browse/YARN-6695
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Rohith Sharma K S
> Assignee: Prabhu Joseph
> Priority: Critical
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-6695-002.patch, YARN-6695.001.patch
>
>
> When the RM publishes container events, i.e. when
> *yarn.rm.system-metrics-publisher.emit-container-events* is enabled, there is a
> race condition between processing those events and the appFinished event.
> The appFinished event removes the appId from the collector list, which causes
> an NPE when a container event for that application is processed afterwards.
> In the trace below, the appId is removed from the collectors first, and the
> corresponding events are processed only after that.
> {noformat}
> 2017-06-06 19:28:48,896 INFO capacity.ParentQueue
> (ParentQueue.java:removeApplication(472)) - Application removed - appId:
> application_1496758895643_0005 user: root leaf-queue of parent: root
> #applications: 0
> 2017-06-06 19:28:48,921 INFO collector.TimelineCollectorManager
> (TimelineCollectorManager.java:remove(190)) - The collector service for
> application_1496758895643_0005 was removed
> 2017-06-06 19:28:48,922 ERROR metrics.TimelineServiceV2Publisher
> (TimelineServiceV2Publisher.java:putEntity(451)) - Error when publishing
> entity TimelineEntity[type='YARN_CONTAINER',
> id='container_e01_1496758895643_0005_01_000002']
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:448)
> at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.access$100(TimelineServiceV2Publisher.java:72)
> at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:480)
> at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:469)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:201)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:127)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
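
The ordering problem described in the issue can be illustrated with a minimal, self-contained sketch. This is not the committed patch: the class CollectorRaceSketch, the Collector type, and both putEntity variants are hypothetical stand-ins, and the null guard is only one plausible way to avoid the NPE. It assumes that collectors are kept in a map keyed by appId and that a late container event looks up a collector that the appFinished handling has already removed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CollectorRaceSketch {

    /** Hypothetical stand-in for a per-application timeline collector. */
    static class Collector {
        final List<String> entities = new ArrayList<>();

        void putEntity(String entity) {
            entities.add(entity);
        }
    }

    /** appId -> collector; entries are removed when the application finishes. */
    static final Map<String, Collector> collectors = new ConcurrentHashMap<>();

    /** Unguarded lookup: throws NullPointerException if the collector is gone. */
    static void putEntityUnguarded(String appId, String entity) {
        collectors.get(appId).putEntity(entity);
    }

    /**
     * Guarded lookup (assumed mitigation, not the actual fix): returns false
     * and drops the entity when the collector has already been removed.
     */
    static boolean putEntityGuarded(String appId, String entity) {
        Collector collector = collectors.get(appId);
        if (collector == null) {
            return false;
        }
        collector.putEntity(entity);
        return true;
    }

    public static void main(String[] args) {
        String appId = "application_1496758895643_0005";
        String entity = "container_e01_1496758895643_0005_01_000002";

        collectors.put(appId, new Collector());
        // The appFinished handling wins the race and removes the collector.
        collectors.remove(appId);

        // The late container event now finds no collector.
        try {
            putEntityUnguarded(appId, entity);
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException");
        }
        System.out.println("guarded published: " + putEntityGuarded(appId, entity));
    }
}
```

A guard like this trades the exception for a silently dropped container event; whether that is acceptable, or whether removal of the collector should instead be delayed until pending events are drained, is a design decision this sketch does not settle.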
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)