[ 
https://issues.apache.org/jira/browse/YARN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3127:
------------------------------------
    Attachment: YARN-3127.20150624-1.patch
                AppTransition.png

Hi [~xgong],
I have modified the patch to handle the scenario you mentioned. It avoids 
duplicate publishing on a best-effort basis: events are published before the 
application is saved to the state store, so a failover that happens after 
publishing but before saving may still result in an event being published 
more than once.
Based on the state transition diagram, all transitions to a final state go 
through the final_saving state except for:
1. New -> Finished      (on RECOVER event)
2. New -> Failed        (on RECOVER event)
3. New -> Killed        (on KILL, RECOVER events)
4. Killing -> Finished  (on ATTEMPT_FINISHED event)
5. Running -> Finished  (on ATTEMPT_FINISHED event)

The first two need no handling, as the state would already have been 
published to ATS before recovery.
For the third, when the application is killed from the New state, we need to 
publish explicitly.
The last two transitions also need to be handled explicitly, since they do 
not go through the final_saving state.
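
To make the cases above concrete, here is a minimal sketch of the decision 
they describe. It is not taken from the patch: the class, the simplified 
enums and the needsExplicitPublish helper are hypothetical names used only 
for illustration, and treating New -> Killed on RECOVER as already published 
is an assumption.

```java
public class FinalTransitionPublishSketch {
    // Simplified stand-ins for the application states and events named above.
    enum State { NEW, RUNNING, KILLING, FINAL_SAVING, FINISHED, FAILED, KILLED }
    enum Event { RECOVER, KILL, ATTEMPT_FINISHED }

    /**
     * Decides whether a transition that reaches a final state WITHOUT passing
     * through FINAL_SAVING needs an explicit publish to the timeline server.
     */
    static boolean needsExplicitPublish(State from, State to, Event event) {
        // Recovered applications: the final state was published before the
        // restart, so publishing again would only create duplicates.
        // (Assumption: this also applies to NEW -> KILLED on RECOVER.)
        if (event == Event.RECOVER) {
            return false;
        }
        // Application killed straight from NEW: nothing has been published yet.
        if (from == State.NEW && to == State.KILLED && event == Event.KILL) {
            return true;
        }
        // KILLING/RUNNING -> FINISHED on ATTEMPT_FINISHED skips FINAL_SAVING.
        if ((from == State.KILLING || from == State.RUNNING)
                && to == State.FINISHED
                && event == Event.ATTEMPT_FINISHED) {
            return true;
        }
        // Everything else is assumed to be covered by the FINAL_SAVING path.
        return false;
    }

    public static void main(String[] args) {
        System.out.println("NEW -> FINISHED on RECOVER: "
                + needsExplicitPublish(State.NEW, State.FINISHED, Event.RECOVER));
        System.out.println("NEW -> KILLED on KILL: "
                + needsExplicitPublish(State.NEW, State.KILLED, Event.KILL));
        System.out.println("RUNNING -> FINISHED on ATTEMPT_FINISHED: "
                + needsExplicitPublish(State.RUNNING, State.FINISHED,
                        Event.ATTEMPT_FINISHED));
    }
}
```

The sketch covers only the transitions that bypass final_saving. It does not 
model the publish-before-save ordering, so the duplicate events possible on 
failover are outside its scope.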
Please review...

> Avoid timeline events during RM recovery or restart
> ---------------------------------------------------
>
>                 Key: YARN-3127
>                 URL: https://issues.apache.org/jira/browse/YARN-3127
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, timelineserver
>    Affects Versions: 2.6.0
>         Environment: RM HA with ATS
>            Reporter: Bibin A Chundatt
>            Assignee: Naganarasimha G R
>            Priority: Critical
>         Attachments: AppTransition.png, YARN-3127.20150213-1.patch, 
> YARN-3127.20150329-1.patch, YARN-3127.20150624-1.patch
>
>
> 1. Start RM with HA and ATS configured and run some YARN applications
> 2. Once the applications have finished successfully, start the timeline server
> 3. Now fail over HA from active to standby
> 4. Access the timeline server URL <IP>:<PORT>/applicationhistory
> // Note: earlier, an exception was thrown when this URL was accessed.
> Incomplete information is shown in the ATS web UI, i.e. attempt container and 
> other information is not displayed.
> Also, even if the timeline server is started together with RM, on RM 
> restart/recovery the ATS events for applications already existing in ATS are 
> resent, which is not required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
