[
https://issues.apache.org/jira/browse/YARN-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
rohithsharma updated YARN-929:
------------------------------
Description:
Configuration:
yarn.resourcemanager.am.max-retries = 3
Scenario:
The NodeManager is killed forcefully, i.e. using kill -9 NM_PID.
After node expiry, the RM kills all the containers running on this NodeManager.
However, the MRAppMaster JVM is still running.
The RM spawns a 2nd-attempt MRAppMaster because the AM retry count is
configured as 3. At this point, 2 MRAppMasters are running in parallel for the
same job ID.
The problem with running 2 MRAppMasters is that the 1st-attempt AppMaster
deletes the job information from HDFS, which causes a FileNotFoundException in
the 2nd-attempt MRAppMaster.
was:
Configuration:
yarn.resourcemanager.am.max-retries = 3
Scenario:
The NodeManager is killed forcefully, i.e. using kill -9 NM_PID.
After node expiry, the RM kills all the containers running on this NodeManager.
However, the MRAppMaster JVM is still running.
The RM spawns a 2nd-attempt MRAppMaster because the AM retry count is
configured as 3.
The problem with running 2 MRAppMasters is that the 1st-attempt AppMaster
deletes the job information from HDFS, which causes a FileNotFoundException in
the 2nd-attempt MRAppMaster.
> 2 MRAppMaster spawned for same Job Id
> -------------------------------------
>
> Key: YARN-929
> URL: https://issues.apache.org/jira/browse/YARN-929
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.0.5-alpha
> Reporter: rohithsharma
>
> Configuration:
> yarn.resourcemanager.am.max-retries = 3
> Scenario:
> The NodeManager is killed forcefully, i.e. using kill -9 NM_PID.
> After node expiry, the RM kills all the containers running on this
> NodeManager.
> However, the MRAppMaster JVM is still running.
> The RM spawns a 2nd-attempt MRAppMaster because the AM retry count is
> configured as 3.
> At this point, 2 MRAppMasters are running in parallel for the same job ID.
> The problem with running 2 MRAppMasters is that the 1st-attempt AppMaster
> deletes the job information from HDFS, which causes a FileNotFoundException
> in the 2nd-attempt MRAppMaster.
>
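
The failure described above can be illustrated with a small local simulation.
This is only a sketch and not MRAppMaster or YARN code: it uses a temporary
local directory in place of HDFS, and the names run_attempt, cleanup_attempt,
simulate, job_0001 and job.xml are hypothetical, chosen for illustration. It
shows the ordering problem only: the still-running first attempt removes the
shared job directory, and the second attempt then fails to find the job files.

```python
import shutil
import tempfile
from pathlib import Path


def run_attempt(staging_dir: Path) -> str:
    """Read the job file as a (hypothetical) AM attempt would at startup."""
    return (staging_dir / "job.xml").read_text()


def cleanup_attempt(staging_dir: Path) -> None:
    """Delete the shared job directory, as the first attempt does."""
    shutil.rmtree(staging_dir)


def simulate() -> str:
    # Assumption: a local temp directory stands in for the job's HDFS location.
    root = Path(tempfile.mkdtemp())
    staging = root / "job_0001"
    staging.mkdir()
    (staging / "job.xml").write_text("<configuration/>")

    # Attempt 1 survived the NodeManager kill and deletes the job information.
    cleanup_attempt(staging)

    # Attempt 2, spawned by the RM, then looks for the same job information.
    try:
        run_attempt(staging)
        return "ok"
    except FileNotFoundError:
        return "FileNotFoundError"
    finally:
        shutil.rmtree(root, ignore_errors=True)


if __name__ == "__main__":
    print(simulate())
```

In this sketch both attempts share one directory keyed only by the job ID,
which is why the cleanup by one attempt is visible to the other.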