[ https://issues.apache.org/jira/browse/YARN-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038294#comment-15038294 ]
Jason Lowe commented on YARN-4392:
----------------------------------
+1 for the latest patch, if we go with re-sending events upon recovery.
I think re-sending events is "safer", assuming the redundant events are
handled properly. That way, if we missed an event, we will fill that gap upon
recovery. There is a concern about the extra load this generates on the RM and
ATS during recovery. Note that if we don't re-send, we will probably miss ATS
events upon recovery in some scenarios, since both ATS event posting and state
store updates are async. There's a race where we could update the state store
and crash before the ATS event is sent.
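
To illustrate the race and why re-sending fills the gap, here is a rough
Python sketch. It is not the RM or ATS code: every name in it
(TimelineEvent, IdempotentTimelineStore, resend_on_recovery) is hypothetical,
the timestamps are made-up values, and it assumes the receiving side keeps
the first copy of an event and ignores redundant ones.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TimelineEvent:
    app_id: str
    event_type: str  # e.g. "APP_CREATED"
    timestamp: int   # ms since epoch, taken from the persisted app state


@dataclass
class IdempotentTimelineStore:
    """Hypothetical stand-in for the ATS side; keeps the first copy of each event."""
    events: dict = field(default_factory=dict)

    def put(self, event: TimelineEvent) -> bool:
        key = (event.app_id, event.event_type)
        if key in self.events:
            return False  # redundant re-send: ignore, keep the original timestamp
        self.events[key] = event
        return True


def resend_on_recovery(state_store: dict, timeline: IdempotentTimelineStore) -> int:
    """Re-send a created event for every recovered app, using the stored start time.

    Returns the number of events that were actually missing (gaps filled).
    """
    filled = 0
    for app_id, start_time in state_store.items():
        if timeline.put(TimelineEvent(app_id, "APP_CREATED", start_time)):
            filled += 1
    return filled


# State store holds both apps; start times are illustrative values.
state_store = {"app_1": 1000, "app_2": 2000}
timeline = IdempotentTimelineStore()

# app_1: the async timeline post completed before the crash.
timeline.put(TimelineEvent("app_1", "APP_CREATED", 1000))
# app_2: state store was updated, but the crash happened before the post.

# On recovery, re-send for all apps; only the missing event is stored.
filled = resend_on_recovery(state_store, timeline)
```

In this sketch only app_2's event is a gap, so re-sending costs one redundant
put for app_1, which is the extra recovery load mentioned above. Re-sending
with the persisted start time rather than the recovery time is what keeps the
created time from resetting.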
> ApplicationCreatedEvent event time resets after RM restart/failover
> -------------------------------------------------------------------
>
> Key: YARN-4392
> URL: https://issues.apache.org/jira/browse/YARN-4392
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.8.0
> Reporter: Xuan Gong
> Assignee: Naganarasimha G R
> Priority: Critical
> Attachments: YARN-4392-2015-11-24.patch, YARN-4392.1.patch,
> YARN-4392.2.patch
>
>
> {code}
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437453994768 is ahead of started time 1440308399674
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437454008244 is ahead of started time 1440308399676
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444305171 is ahead of started time 1440308399653
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444293115 is ahead of started time 1440308399647
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444379645 is ahead of started time 1440308399656
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444361234 is ahead of started time 1440308399655
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444342029 is ahead of started time 1440308399654
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444323447 is ahead of started time 1440308399654
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444430006 is ahead of started time 1440308399660
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444415698 is ahead of started time 1440308399659
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444419060 is ahead of started time 1440308399658
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444393931 is ahead of started time 1440308399657
> {code}
> In the ATS logs, we periodically see a large number of 'stale alerts'
> messages.