[ https://issues.apache.org/jira/browse/YARN-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Lowe resolved YARN-1847.
------------------------------
Resolution: Invalid
> YARN application always exits with FAILED state
> -----------------------------------------------
>
> Key: YARN-1847
> URL: https://issues.apache.org/jira/browse/YARN-1847
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.3.0
> Reporter: Oleg Zhurakousky
> Priority: Critical
>
> The _RMAppAttemptImpl_ creates an instance of _ExpiredTransition_, which
> always sets the _finalAttemptState_ to FAILED.
> {code}
> private static final ExpiredTransition EXPIRED_TRANSITION =
>     new ExpiredTransition();
> // ...
> public ExpiredTransition() {
>   // The final attempt state is fixed to FAILED when the transition is built
>   super(RMAppAttemptState.FAILED);
> }
> {code}
> So when my container finishes successfully, regardless of the event that
> reports it (CONTAINER_FINISHED in my case), _RMAppAttemptImpl.transition(..)_
> switches on the _finalAttemptState_ and transitions to FAILED no matter what.
> Here are the related logs for more info:
> {code}
> 21:06:01,615 INFO AsyncDispatcher event handler container.Container:878 - Container container_1395104684413_0001_01_000001 transitioned from RUNNING to EXITED_WITH_SUCCESS
> 21:06:01,615 INFO AsyncDispatcher event handler launcher.ContainerLaunch:341 - Cleaning up container container_1395104684413_0001_01_000001
> 21:06:01,644 INFO DeletionService #0 nodemanager.DefaultContainerExecutor:369 - Deleting absolute path : /Users/oleg/HADOOP_DEV/yarn-tutorial/target/oz.hadoop.StandAloneWithMiniYarnCluster/oz.hadoop.StandAloneWithMiniYarnCluster-localDir-nm-0_0/usercache/oleg/appcache/application_1395104684413_0001/container_1395104684413_0001_01_000001
> 21:06:01,646 INFO AsyncDispatcher event handler nodemanager.NMAuditLogger:89 - USER=oleg OPERATION=Container Finished - Succeeded TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1395104684413_0001 CONTAINERID=container_1395104684413_0001_01_000001
> 21:06:01,649 INFO AsyncDispatcher event handler container.Container:878 - Container container_1395104684413_0001_01_000001 transitioned from EXITED_WITH_SUCCESS to DONE
> 21:06:01,649 INFO AsyncDispatcher event handler application.Application:339 - Removing container_1395104684413_0001_01_000001 from application application_1395104684413_0001
> 21:06:01,649 INFO AsyncDispatcher event handler monitor.ContainersMonitorImpl:159 - ResourceCalculatorPlugin is unavailable on this system. org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is disabled.
> 21:06:01,649 INFO AsyncDispatcher event handler containermanager.AuxServices:175 - Got event CONTAINER_STOP for appId application_1395104684413_0001
> 21:06:02,143 INFO Node Status Updater nodemanager.NodeStatusUpdaterImpl:374 - Removed completed container container_1395104684413_0001_01_000001
> 21:06:02,146 INFO ResourceManager Event Processor rmcontainer.RMContainerImpl:220 - container_1395104684413_0001_01_000001 Container Transitioned from ACQUIRED to COMPLETED
> 21:06:02,146 INFO ResourceManager Event Processor fica.FiCaSchedulerApp:91 - Completed container: container_1395104684413_0001_01_000001 in state: COMPLETED event:FINISHED
> 21:06:02,146 INFO ResourceManager Event Processor resourcemanager.RMAuditLogger:98 - USER=oleg OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1395104684413_0001 CONTAINERID=container_1395104684413_0001_01_000001
> 21:06:02,146 INFO ResourceManager Event Processor fica.FiCaSchedulerNode:164 - Released container container_1395104684413_0001_01_000001 of capacity <memory:1024, vCores:1> on host 192.168.19.1:50787, which currently has 0 containers, <memory:0, vCores:0> used and <memory:4096, vCores:8> available, release resources=true
> 21:06:02,146 INFO ResourceManager Event Processor fifo.FifoScheduler:790 - Application appattempt_1395104684413_0001_000001 released container container_1395104684413_0001_01_000001 on node: host: 192.168.19.1:50787 #containers=0 available=4096 used=0 with event: FINISHED
> 21:06:02,146 INFO AsyncDispatcher event handler attempt.RMAppAttemptImpl:960 - Updating application attempt appattempt_1395104684413_0001_000001 with final state: FAILED
> {code}
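
To make the pattern described in the report concrete, here is a minimal, hypothetical Java sketch. It models a transition whose final state is fixed in its constructor, as in the quoted ExpiredTransition snippet, plus a transition(..) method that switches on that field. AttemptState, FinalStateTransition and TransitionDemo are illustrative names assumed for this sketch; they are not Hadoop's RMAppAttemptState or RMAppAttemptImpl code, and the real state machine has more states and side effects.

```java
// A minimal, hypothetical model of the pattern described in the report.
// AttemptState, FinalStateTransition and TransitionDemo are illustrative
// names only; they are NOT Hadoop's RMAppAttemptState/RMAppAttemptImpl code.

enum AttemptState { FINISHED, FAILED, KILLED }

class FinalStateTransition {
    // The target state is chosen once, when the transition object is built.
    private final AttemptState finalAttemptState;

    FinalStateTransition(AttemptState finalAttemptState) {
        this.finalAttemptState = finalAttemptState;
    }

    AttemptState finalState() {
        return finalAttemptState;
    }

    // The event name only appears in the message; the outcome is decided
    // solely by the field set in the constructor.
    String transition(String event) {
        switch (finalAttemptState) {
            case FINISHED:
                return "attempt finished after " + event;
            case KILLED:
                return "attempt killed after " + event;
            case FAILED:
            default:
                return "attempt failed after " + event;
        }
    }
}

// Simplified stand-in for the quoted ExpiredTransition: FAILED is hard-wired.
class ExpiredTransition extends FinalStateTransition {
    ExpiredTransition() {
        super(AttemptState.FAILED);
    }
}

class TransitionDemo {
    public static void main(String[] args) {
        FinalStateTransition expired = new ExpiredTransition();
        // Whatever event name is passed in, this transition reports FAILED.
        System.out.println(expired.finalState());                       // prints FAILED
        System.out.println(expired.transition("CONTAINER_FINISHED"));  // prints attempt failed after CONTAINER_FINISHED
    }
}
```

The sketch only shows what happens once this particular transition object runs. Which transition a given event (such as CONTAINER_FINISHED) is routed to in a given attempt state is decided by the state machine's event-to-transition mapping, which is not modelled here.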
--
This message was sent by Atlassian JIRA
(v6.2#6252)