[ https://issues.apache.org/jira/browse/YARN-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305129#comment-15305129 ]
Jun Gong commented on YARN-5178:
--------------------------------
Thanks [~tuyuri] for reporting the issue. Could you please upload the logs from
both RMs, if possible? It seems to be caused by the RMApp being in ACCEPTED
state when the RM failover happened, before any RMAppAttempt had been saved.
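
Until the cause is confirmed, a client-side wait with a deadline would at least
avoid looping forever on an application that never leaves ACCEPTED. The
following is only a minimal sketch of that idea, not the YarnClientImpl logic:
wait_for_kill and get_state are hypothetical names, and get_state is assumed to
be any callable that returns the application state as a string (for example, a
wrapper around whatever client the cluster tooling already uses). The state
names follow the YarnApplicationState enum.

```python
import time
from typing import Callable

# Terminal application states, named as in YARN's YarnApplicationState enum.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def wait_for_kill(
    get_state: Callable[[], str],
    timeout_s: float = 60.0,
    poll_interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_state() until the application reaches a terminal state.

    Returns True if a terminal state was observed, or False if timeout_s
    elapsed first. get_state is a hypothetical callable supplied by the
    caller; sleep and clock are injectable so the loop can be tested
    without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if get_state() in TERMINAL_STATES:
            return True
        if clock() >= deadline:
            # Give up instead of waiting forever on a stuck application.
            return False
        sleep(poll_interval_s)
```

A caller could treat a False return as the signal to collect both RM logs and
report the application ID, rather than retrying the kill indefinitely.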
> yarn application never can be killed when failover resource manager
> -------------------------------------------------------------------
>
> Key: YARN-5178
> URL: https://issues.apache.org/jira/browse/YARN-5178
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: tu nguyen khac
> Priority: Minor
>
> Dear all,
> The problem I detected is the following:
> In my cluster environment (16 nodes, 2 ResourceManagers, HA),
> an application was submitted to the first ResourceManager (RM1) and that RM1
> machine suddenly hung. The application failed over to RM2, but it never
> started running:
> Name: cpaBidEcom
> Application Type: SPARK
> Application Tags:
> State: ACCEPTED
> FinalStatus: UNDEFINED
> Started: 28-May-2016 01:46:13
> Elapsed: 7hrs, 35mins, 32sec
> Tracking URL: UNASSIGNED
> After that, our developer tried to kill this application with the command:
> yarn application -kill app_
> The command kept printing this output and never finished:
> 16/05/28 09:24:48 INFO impl.YarnClientImpl: Waiting for application
> application_1464374175189_0016 to be killed.
> 16/05/28 09:24:50 INFO impl.YarnClientImpl: Waiting for application
> application_1464374175189_0016 to be killed.
> 16/05/28 09:24:52 INFO impl.YarnClientImpl: Waiting for application
> application_1464374175189_0016 to be killed.
> 16/05/28 09:24:54 INFO impl.YarnClientImpl: Waiting for application
> application_1464374175189_0016 to be killed.
> 16/05/28 09:24:56 INFO impl.YarnClientImpl: Waiting for application
> application_1464374175189_0016 to be killed.
> 16/05/28 09:24:58 INFO impl.YarnClientImpl: Waiting for application
> application_1464374175189_0016 to be killed.
> 16/05/28 09:25:00 INFO impl.YarnClientImpl: Waiting for application
> application_1464374175189_0016 to be killed.
> 16/05/28 09:25:02 INFO impl.YarnClientImpl: Waiting for application
> application_1464374175189_0016 to be killed.
> 16/05/28 09:25:04 INFO impl.YarnClientImpl: Waiting for application
> application_1464374175189_0016 to be killed.
> 16/05/28 09:25:06 INFO impl.YarnClientImpl: Waiting for application
> application_1464374175189_0016 to be killed.
> 16/05/28 09:25:08 INFO impl.YarnClientImpl: Waiting for application
> application_1464374175189_0016 to be killed.
> 16/05/28 09:25:10 INFO impl.YarnClientImpl: Waiting for application
> application_1464374175189_0016 to be killed.
> 16/05/28 09:25:12 INFO impl.YarnClientImpl: Waiting for application
> application_1464374175189_0016 to be killed.
> 16/05/28 09:25:14 INFO impl.YarnClientImpl: Waiting for application
> application_1464374175189_0016 to be killed.
> 16/05/28 09:25:16 INFO impl.YarnClientImpl: Waiting for application
> application_1464374175189_0016 to be killed.
> I think it is probably a bug. It is hard to reproduce, but please review it
> for me.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)