[
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571732#comment-14571732
]
Zhijie Shen commented on YARN-3044:
-----------------------------------
[~Naganarasimha], thanks for updating the patch. It looks good to me so far,
but I want to hold the patch for the following issues.
1. After YARN-3276 is committed, this patch will conflict on
{{return l2.compareTo(l1);}}.
2. We're reworking YARN-1462. It won't affect this patch, but it involves a
commit revert. Let's wait until YARN-1462 is done.
3. It's not caused by this patch, but I found a race condition when publishing
the app finish event:
{code}
15/06/03 14:59:56 INFO rmapp.RMAppImpl: application_1433367826630_0002 State
change from FINISHING to FINISHED
15/06/03 14:59:56 INFO capacity.LeafQueue: completedContainer
container=Container: [ContainerId: container_1433367826630_0002_01_000001,
NodeId: localhost:9105, NodeHttpAddress: localhost:8042, Resource:
<memory:2048, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken,
service: 127.0.0.1:9105 }, ] queue=default: capacity=1.0, absoluteCapacity=1.0,
usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0,
numApps=1, numContainers=0 cluster=<memory:8192, vCores:8>
15/06/03 14:59:56 INFO resourcemanager.RMAuditLogger: USER=zshen
OPERATION=Application Finished - Succeeded TARGET=RMAppManager
RESULT=SUCCESS APPID=application_1433367826630_0002
15/06/03 14:59:56 INFO capacity.ParentQueue: completedContainer queue=root
usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0>
cluster=<memory:8192, vCores:8>
15/06/03 14:59:56 ERROR metrics.TimelineServiceV2Publisher: Error when
publishing entity TimelineEntity[type='YARN_APPLICATION',
id='application_1433367826630_0002']
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:273)
    at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.publishApplicationFinishedEvent(TimelineServiceV2Publisher.java:133)
    at org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(AbstractTimelineServicePublisher.java:70)
    at org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(AbstractTimelineServicePublisher.java:35)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:176)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
    at java.lang.Thread.run(Thread.java:745)
15/06/03 14:59:56 INFO amlauncher.AMLauncher: Cleaning master
appattempt_1433367826630_0002_000001
{code}
I think the problem is that we stop the timeline collector immediately after
calling appFinished. appFinished is an async call, so the publishing operation
runs later on another thread, possibly after the collector has already been
stopped. One option is to call stopTimelineCollector in the publisher, after
the finish event has been published. Can you take care of it? The current call
sequence is:
{code}
app.rmContext.getSystemMetricsPublisher()
.appFinished(app, finalState, app.finishTime);
app.stopTimelineCollector();
{code}
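To illustrate the ordering I have in mind, here is a minimal, self-contained sketch. It is not the real RM code: CollectorLifecycleSketch, Collector, and publishThenStop are hypothetical names, a single-thread executor stands in for the AsyncDispatcher, and an IllegalStateException stands in for the NPE seen in the log. The point is only that the stop is issued on the dispatcher thread after the publish, so it cannot overtake it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the publisher/collector pair; not the real YARN classes. */
class CollectorLifecycleSketch {

  /** Stand-in for the per-application timeline collector. */
  static final class Collector {
    private volatile boolean stopped;

    void putEntity(String entity) {
      // Simulates the failure mode: publishing after the collector is gone.
      if (stopped) {
        throw new IllegalStateException("collector already stopped: " + entity);
      }
    }

    void stop() {
      stopped = true;
    }
  }

  /**
   * Publishes the finish event and stops the collector in one task on the
   * dispatcher thread, so the stop can never run before the publish.
   * Returns true if the publish succeeded and the collector was stopped afterwards.
   */
  static boolean publishThenStop() {
    // Assumption: a single-thread executor models the async dispatcher.
    ExecutorService dispatcher = Executors.newSingleThreadExecutor();
    Collector collector = new Collector();
    AtomicBoolean published = new AtomicBoolean(false);

    dispatcher.execute(() -> {
      try {
        collector.putEntity("application_finished");
        published.set(true);
      } finally {
        // Stop only after the publish attempt, on the same thread.
        collector.stop();
      }
    });

    dispatcher.shutdown();
    try {
      dispatcher.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return published.get() && collector.stopped;
  }

  public static void main(String[] args) {
    System.out.println("published before stop: " + publishThenStop());
  }
}
```

In the patch this would roughly mean dropping the direct app.stopTimelineCollector() call after appFinished and having the publisher trigger the stop once the finish event is handled. How the publisher reaches the collector is left open here.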
> [Event producers] Implement RM writing app lifecycle events to ATS
> ------------------------------------------------------------------
>
> Key: YARN-3044
> URL: https://issues.apache.org/jira/browse/YARN-3044
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Sangjin Lee
> Assignee: Naganarasimha G R
> Attachments: YARN-3044-YARN-2928.004.patch,
> YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch,
> YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch,
> YARN-3044-YARN-2928.009.patch, YARN-3044-YARN-2928.010.patch,
> YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch,
> YARN-3044.20150416-1.patch
>
>
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.