[ https://issues.apache.org/jira/browse/YARN-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe resolved YARN-1847.
------------------------------

    Resolution: Invalid

> YARN application always exits with FAILED state
> -----------------------------------------------
>
>                 Key: YARN-1847
>                 URL: https://issues.apache.org/jira/browse/YARN-1847
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.3.0
>            Reporter: Oleg Zhurakousky
>            Priority: Critical
>
> _RMAppAttemptImpl_ creates an instance of ExpiredTransition, which always 
> sets _finalAttemptState_ to FAILED:
> {code}
> private static final ExpiredTransition EXPIRED_TRANSITION =
>       new ExpiredTransition();
> . . .
>     public ExpiredTransition() {
>       super(RMAppAttemptState.FAILED);
>     }
> {code}
> So when my container finishes successfully, whatever the reported state 
> (CONTAINER_FINISHED in my case), _RMAppAttemptImpl.transition(..)_ switches 
> on _finalAttemptState_ and transitions to FAILED no matter what.
> Here are the related logs for more info:
> {code}
> 21:06:01,615  INFO AsyncDispatcher event handler container.Container:878 - 
> Container container_1395104684413_0001_01_000001 transitioned from RUNNING to 
> EXITED_WITH_SUCCESS
> 21:06:01,615  INFO AsyncDispatcher event handler launcher.ContainerLaunch:341 
> - Cleaning up container container_1395104684413_0001_01_000001
> 21:06:01,644  INFO DeletionService #0 
> nodemanager.DefaultContainerExecutor:369 - Deleting absolute path : 
> /Users/oleg/HADOOP_DEV/yarn-tutorial/target/oz.hadoop.StandAloneWithMiniYarnCluster/oz.hadoop.StandAloneWithMiniYarnCluster-localDir-nm-0_0/usercache/oleg/appcache/application_1395104684413_0001/container_1395104684413_0001_01_000001
> 21:06:01,646  INFO AsyncDispatcher event handler nodemanager.NMAuditLogger:89 
> - USER=oleg     OPERATION=Container Finished - Succeeded        
> TARGET=ContainerImpl    RESULT=SUCCESS  APPID=application_1395104684413_0001  
>   CONTAINERID=container_1395104684413_0001_01_000001
> 21:06:01,649  INFO AsyncDispatcher event handler container.Container:878 - 
> Container container_1395104684413_0001_01_000001 transitioned from 
> EXITED_WITH_SUCCESS to DONE
> 21:06:01,649  INFO AsyncDispatcher event handler application.Application:339 
> - Removing container_1395104684413_0001_01_000001 from application 
> application_1395104684413_0001
> 21:06:01,649  INFO AsyncDispatcher event handler 
> monitor.ContainersMonitorImpl:159 - ResourceCalculatorPlugin is unavailable 
> on this system. 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
>  is disabled.
> 21:06:01,649  INFO AsyncDispatcher event handler 
> containermanager.AuxServices:175 - Got event CONTAINER_STOP for appId 
> application_1395104684413_0001
> 21:06:02,143  INFO Node Status Updater nodemanager.NodeStatusUpdaterImpl:374 
> - Removed completed container container_1395104684413_0001_01_000001
> 21:06:02,146  INFO ResourceManager Event Processor 
> rmcontainer.RMContainerImpl:220 - container_1395104684413_0001_01_000001 
> Container Transitioned from ACQUIRED to COMPLETED
> 21:06:02,146  INFO ResourceManager Event Processor fica.FiCaSchedulerApp:91 - 
> Completed container: container_1395104684413_0001_01_000001 in state: 
> COMPLETED event:FINISHED
> 21:06:02,146  INFO ResourceManager Event Processor 
> resourcemanager.RMAuditLogger:98 - USER=oleg       OPERATION=AM Released 
> Container TARGET=SchedulerApp     RESULT=SUCCESS  
> APPID=application_1395104684413_0001    
> CONTAINERID=container_1395104684413_0001_01_000001
> 21:06:02,146  INFO ResourceManager Event Processor fica.FiCaSchedulerNode:164 
> - Released container container_1395104684413_0001_01_000001 of capacity 
> <memory:1024, vCores:1> on host 192.168.19.1:50787, which currently has 0 
> containers, <memory:0, vCores:0> used and <memory:4096, vCores:8> available, 
> release resources=true
> 21:06:02,146  INFO ResourceManager Event Processor fifo.FifoScheduler:790 - 
> Application appattempt_1395104684413_0001_000001 released container 
> container_1395104684413_0001_01_000001 on node: host: 192.168.19.1:50787 
> #containers=0 available=4096 used=0 with event: FINISHED
> 21:06:02,146  INFO AsyncDispatcher event handler attempt.RMAppAttemptImpl:960 
> - Updating application attempt appattempt_1395104684413_0001_000001 with 
> final state: FAILED
> {code}
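
For illustration only, the pattern the report describes can be reduced to the sketch below. It is a simplified, hypothetical model, not the Hadoop source: the class FixedFinalStateSketch, the enums AttemptState and AttemptEvent, and the base class FinalTransition are assumptions made for this example. It shows only that a transition object built with a fixed final state returns that state for any event handed to it. It does not show which events the real state machine routes to ExpiredTransition, so it should not be read as confirming the reported behavior, and the issue itself was resolved as Invalid.

```java
public class FixedFinalStateSketch {

    // Hypothetical, reduced state and event sets (not the real YARN enums).
    enum AttemptState { RUNNING, FINISHED, FAILED }
    enum AttemptEvent { CONTAINER_FINISHED, EXPIRE }

    // Hypothetical stand-in for a transition whose outcome is fixed at construction.
    static class FinalTransition {
        private final AttemptState finalAttemptState;

        FinalTransition(AttemptState finalAttemptState) {
            this.finalAttemptState = finalAttemptState;
        }

        AttemptState transition(AttemptEvent event) {
            // The event is never consulted: the result was decided by the constructor.
            return finalAttemptState;
        }
    }

    // Mirrors the constructor quoted in the report: always FAILED.
    static class ExpiredTransition extends FinalTransition {
        ExpiredTransition() {
            super(AttemptState.FAILED);
        }
    }

    static final FinalTransition EXPIRED_TRANSITION = new ExpiredTransition();

    public static void main(String[] args) {
        // Every event yields FAILED when it is handled by this one transition object.
        for (AttemptEvent event : AttemptEvent.values()) {
            System.out.println(event + " -> " + EXPIRED_TRANSITION.transition(event));
        }
    }
}
```

In this model the outcome depends entirely on which transition object handles an event, so the useful question for the real code is which events are actually wired to ExpiredTransition.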



--
This message was sent by Atlassian JIRA
(v6.2#6252)
