[ 
https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704935#comment-13704935
 ] 

Omkar Vinit Joshi commented on YARN-541:
----------------------------------------

[~write2kishore] I just took a look at the NM logs and I can see that 
container "container_1366096597608_0001_01_000006" was allocated by the RM and 
that the AM made a start container request for it on the NM. I think there is 
some problem on the AM side. Can you take another look at your AM code? It 
looks like something is getting missed there. If it is still occurring, can 
you log each start container request that the AM makes to the NM?

{code}
2013-04-16 03:29:57,681 INFO  [IPC Server handler 4 on 34660] 
containermanager.ContainerManagerImpl 
(ContainerManagerImpl.java:startContainer(402)) - Start request for 
container_1366096597608_0001_01_000006 by user dsadm
2013-04-16 03:29:57,684 INFO  [IPC Server handler 4 on 34660] 
nodemanager.NMAuditLogger (NMAuditLogger.java:logSuccess(89)) - USER=dsadm      
  IP=127.0.1.1    OPERATION=Start Container Request       
TARGET=ContainerManageImpl      RESULT=SUCCESS  
APPID=application_1366096597608_0001    
CONTAINERID=container_1366096597608_0001_01_000006
2013-04-16 03:29:57,687 INFO  [AsyncDispatcher event handler] 
application.Application (ApplicationImpl.java:transition(255)) - Adding 
container_1366096597608_0001_01_000006 to application 
application_1366096597608_0001
2013-04-16 03:29:57,689 INFO  [AsyncDispatcher event handler] 
container.Container (ContainerImpl.java:handle(835)) - Container 
container_1366096597608_0001_01_000006 transitioned from NEW to LOCALIZED
2013-04-16 03:29:57,952 INFO  [AsyncDispatcher event handler] 
container.Container (ContainerImpl.java:handle(835)) - Container 
container_1366096597608_0001_01_000006 transitioned from LOCALIZED to RUNNING
2013-04-16 03:29:58,475 INFO  [Node Status Updater] 
nodemanager.NodeStatusUpdaterImpl 
(NodeStatusUpdaterImpl.java:getNodeStatus(249)) - Sending out status for 
container: container_id {, app_attempt_id {, application_id {, id: 1, 
cluster_timestamp: 1366096597608, }, attemptId: 1, }, id: 1, }, state: 
C_RUNNING, diagnostics: "", exit_status: -1000, 
2013-04-16 03:29:58,478 INFO  [Node Status Updater] 
nodemanager.NodeStatusUpdaterImpl 
(NodeStatusUpdaterImpl.java:getNodeStatus(249)) - Sending out status for 
container: container_id {, app_attempt_id {, application_id {, id: 1, 
cluster_timestamp: 1366096597608, }, attemptId: 1, }, id: 5, }, state: 
C_RUNNING, diagnostics: "", exit_status: -1000, 
2013-04-16 03:29:58,481 INFO  [Node Status Updater] 
nodemanager.NodeStatusUpdaterImpl 
(NodeStatusUpdaterImpl.java:getNodeStatus(249)) - Sending out status for 
container: container_id {, app_attempt_id {, application_id {, id: 1, 
cluster_timestamp: 1366096597608, }, attemptId: 1, }, id: 6, }, state: 
C_RUNNING, diagnostics: "", exit_status: -1000, 
2013-04-16 03:29:58,489 INFO  [ContainersLauncher #2] 
nodemanager.DefaultContainerExecutor 
(DefaultContainerExecutor.java:launchContainer(113)) - launchContainer: [bash, 
/tmp/nm-local-dir/usercache/dsadm/appcache/application_1366096597608_0001/container_1366096597608_0001_01_000006/default_container_executor.sh]
2013-04-16 03:29:58,638 INFO  [ContainersLauncher #1] launcher.ContainerLaunch 
(ContainerLaunch.java:call(282)) - Container 
container_1366096597608_0001_01_000005 succeeded 
2013-04-16 03:29:58,639 INFO  [ContainersLauncher #2] launcher.ContainerLaunch 
(ContainerLaunch.java:call(282)) - Container 
container_1366096597608_0001_01_000006 succeeded 
2013-04-16 03:29:58,643 INFO  [AsyncDispatcher event handler] 
container.Container (ContainerImpl.java:handle(835)) - Container 
container_1366096597608_0001_01_000005 transitioned from RUNNING to 
EXITED_WITH_SUCCESS
2013-04-16 03:29:58,644 INFO  [AsyncDispatcher event handler] 
container.Container (ContainerImpl.java:handle(835)) - Container 
container_1366096597608_0001_01_000006 transitioned from RUNNING to 
EXITED_WITH_SUCCESS
2013-04-16 03:29:58,644 INFO  [AsyncDispatcher event handler] 
launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(300)) - 
Cleaning up container container_1366096597608_0001_01_000005
2013-04-16 03:29:58,693 INFO  [AsyncDispatcher event handler] 
launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(300)) - 
Cleaning up container container_1366096597608_0001_01_000006
{code}
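
To follow a single container through an NM log like the excerpt above, a small 
script can pull out its state transitions. This is only a sketch, not part of 
the original comment: the helper name container_transitions and the regular 
expression are assumptions based on the log lines quoted here, and the sample 
text is copied from those lines.

```python
import re

# Assumed pattern, derived from the NM lines quoted above, e.g.
# "Container container_..._000006 transitioned from NEW to LOCALIZED".
TRANSITION = re.compile(
    r"Container (container_\w+) transitioned from (\w+) to (\w+)"
)


def container_transitions(log_text, container_id):
    """Return the (from_state, to_state) pairs logged for one container, in order."""
    # Mail archives hard-wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [
        (src, dst)
        for cid, src, dst in TRANSITION.findall(flat)
        if cid == container_id
    ]


# Sample lines copied from the excerpt above, including the mail line wrapping.
sample = """
2013-04-16 03:29:57,689 INFO  [AsyncDispatcher event handler]
container.Container (ContainerImpl.java:handle(835)) - Container
container_1366096597608_0001_01_000006 transitioned from NEW to LOCALIZED
2013-04-16 03:29:57,952 INFO  [AsyncDispatcher event handler]
container.Container (ContainerImpl.java:handle(835)) - Container
container_1366096597608_0001_01_000006 transitioned from LOCALIZED to RUNNING
2013-04-16 03:29:58,643 INFO  [AsyncDispatcher event handler]
container.Container (ContainerImpl.java:handle(835)) - Container
container_1366096597608_0001_01_000005 transitioned from RUNNING to
EXITED_WITH_SUCCESS
2013-04-16 03:29:58,644 INFO  [AsyncDispatcher event handler]
container.Container (ContainerImpl.java:handle(835)) - Container
container_1366096597608_0001_01_000006 transitioned from RUNNING to
EXITED_WITH_SUCCESS
"""

if __name__ == "__main__":
    for src, dst in container_transitions(
        sample, "container_1366096597608_0001_01_000006"
    ):
        print(src, "->", dst)
```

Run against the sample, this prints the three transitions for container 
000006 (NEW to LOCALIZED, LOCALIZED to RUNNING, RUNNING to 
EXITED_WITH_SUCCESS) and skips the line for container 000005. The same 
function could be pointed at the full NM log to check every container the AM 
expected to receive.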
                
> getAllocatedContainers() is not returning all the allocated containers
> ----------------------------------------------------------------------
>
>                 Key: YARN-541
>                 URL: https://issues.apache.org/jira/browse/YARN-541
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.3-alpha
>         Environment: Redhat Linux 64-bit
>            Reporter: Krishna Kishore Bonagiri
>         Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, 
> yarn-dsadm-resourcemanager-isredeng.out
>
>
> I am running an application that was written for, and works well with, 
> hadoop-2.0.0-alpha, but when I run the same application against 2.0.3-alpha, 
> the getAllocatedContainers() method called on AMResponse sometimes does not 
> return all the allocated containers. For example, I request 10 containers and 
> this method sometimes gives me only 9, yet the Resource Manager log shows that 
> the 10th container was also allocated. It happens only occasionally, at 
> random, and works fine at all other times. If I send one more request to the 
> RM for the remaining container after the first request failed to return it 
> (and before releasing the containers already acquired), the RM does allocate 
> that container. I am running only one application at a time, but thousands of 
> them one after another.
> My main worry is that, even though the RM's log says that all 10 requested 
> containers were allocated, the getAllocatedContainers() method did not return 
> all of them; it returned only 9. I never saw this kind of issue in the 
> previous version, i.e. hadoop-2.0.0-alpha.
> Thanks,
> Kishore
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
