Rohith commented on YARN-3410:

bq. what's the use case of using rmadmin removing a state while RM is running?
Practically, rmadmin does not need to remove the RM state store while the RM is 
running. I was thinking of the case where an exception happens during recovery, 
as in YARN-2340: the RM never exits. It keeps switching to standby and trying 
to become active. In this case, the admin can format the state store without 
stopping the RM.

bq. it's better that RM can log all errors of applications recovering before 
die. With this, admin can know which application states caused RM die.
I think it will be hard to tell which application caused the problem in case of 
RuntimeExceptions. The admin needs to backtrack through the exception in the 
logs to identify it.
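
To make the idea concrete, here is a minimal sketch of how recovery could record 
failures per application, so that even a RuntimeException stays associated with 
the application that raised it. This is not the actual RM recovery code: the 
class RecoveryErrorReport, the interface AppRecoverer, and the application ids 
are all hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecoveryErrorReport {

    /** Hypothetical stand-in for whatever recovers a single stored application. */
    interface AppRecoverer {
        void recover(String appId) throws Exception;
    }

    /**
     * Tries to recover every application and records each failure under its
     * application id instead of stopping at the first one.
     * Insertion order is preserved so the report follows recovery order.
     */
    static Map<String, Exception> recoverAll(List<String> appIds, AppRecoverer recoverer) {
        Map<String, Exception> failures = new LinkedHashMap<>();
        for (String appId : appIds) {
            try {
                recoverer.recover(appId);
            } catch (Exception e) { // RuntimeException is caught here as well
                failures.put(appId, e);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Made-up application ids; the second one simulates a bad stored state.
        List<String> appIds = Arrays.asList("app_1", "app_2", "app_3");
        Map<String, Exception> failures = recoverAll(appIds, appId -> {
            if (appId.equals("app_2")) {
                throw new IllegalStateException("attempt not in final state");
            }
        });
        // Report every bad application together, before the caller decides to fail.
        failures.forEach((appId, e) ->
            System.err.println("Failed to recover " + appId + ": " + e));
        System.err.println(failures.size() + " application(s) failed to recover");
    }
}
```

With a per-application report like this, the admin would not need to backtrack 
through stack traces to find which application records are in a bad state.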

> YARN admin should be able to remove individual application records from 
> RMStateStore
> ------------------------------------------------------------------------------------
>                 Key: YARN-3410
>                 URL: https://issues.apache.org/jira/browse/YARN-3410
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager, yarn
>            Reporter: Wangda Tan
>            Assignee: Rohith
>            Priority: Critical
>         Attachments: 0001-YARN-3410-v1.patch
> When the RM state store enters an unexpected state (one example is YARN-2340, 
> where an attempt is not in a final state but the app has already completed), 
> the RM can never come up unless the RMStateStore is formatted.
> I think we should support removing individual application records from 
> RMStateStore, so that the RM admin is not forced to choose between waiting for 
> a fix and formatting the state store.
> In addition, the RM should be able to report all fatal errors (those that will 
> shut down the RM) during app recovery; this can save the admin some time when 
> removing apps in a bad state.
