[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704935#comment-13704935 ]
Omkar Vinit Joshi commented on YARN-541:
----------------------------------------

[~write2kishore] I just took a look at the NM logs, and I can see that container "container_1366096597608_0001_01_000006" was allocated by the RM and that the AM made a start container request for it on the NM, so I suspect the problem is on the AM side. Can you take another look at your AM code? It looks like something is being missed there. If the problem still occurs, can you print logs at the point where the AM makes a start container request to the NM?

{code}
2013-04-16 03:29:57,681 INFO [IPC Server handler 4 on 34660] containermanager.ContainerManagerImpl (ContainerManagerImpl.java:startContainer(402)) - Start request for container_1366096597608_0001_01_000006 by user dsadm
2013-04-16 03:29:57,684 INFO [IPC Server handler 4 on 34660] nodemanager.NMAuditLogger (NMAuditLogger.java:logSuccess(89)) - USER=dsadm IP=127.0.1.1 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1366096597608_0001 CONTAINERID=container_1366096597608_0001_01_000006
2013-04-16 03:29:57,687 INFO [AsyncDispatcher event handler] application.Application (ApplicationImpl.java:transition(255)) - Adding container_1366096597608_0001_01_000006 to application application_1366096597608_0001
2013-04-16 03:29:57,689 INFO [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(835)) - Container container_1366096597608_0001_01_000006 transitioned from NEW to LOCALIZED
2013-04-16 03:29:57,952 INFO [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(835)) - Container container_1366096597608_0001_01_000006 transitioned from LOCALIZED to RUNNING
2013-04-16 03:29:58,475 INFO [Node Status Updater] nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:getNodeStatus(249)) - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1366096597608, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-04-16 03:29:58,478 INFO [Node Status Updater] nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:getNodeStatus(249)) - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1366096597608, }, attemptId: 1, }, id: 5, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-04-16 03:29:58,481 INFO [Node Status Updater] nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:getNodeStatus(249)) - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1366096597608, }, attemptId: 1, }, id: 6, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-04-16 03:29:58,489 INFO [ContainersLauncher #2] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(113)) - launchContainer: [bash, /tmp/nm-local-dir/usercache/dsadm/appcache/application_1366096597608_0001/container_1366096597608_0001_01_000006/default_container_executor.sh]
2013-04-16 03:29:58,638 INFO [ContainersLauncher #1] launcher.ContainerLaunch (ContainerLaunch.java:call(282)) - Container container_1366096597608_0001_01_000005 succeeded
2013-04-16 03:29:58,639 INFO [ContainersLauncher #2] launcher.ContainerLaunch (ContainerLaunch.java:call(282)) - Container container_1366096597608_0001_01_000006 succeeded
2013-04-16 03:29:58,643 INFO [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(835)) - Container container_1366096597608_0001_01_000005 transitioned from RUNNING to EXITED_WITH_SUCCESS
2013-04-16 03:29:58,644 INFO [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(835)) - Container container_1366096597608_0001_01_000006 transitioned from RUNNING to EXITED_WITH_SUCCESS
2013-04-16 03:29:58,644 INFO [AsyncDispatcher event handler] launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(300)) - Cleaning up container container_1366096597608_0001_01_000005
2013-04-16 03:29:58,693 INFO [AsyncDispatcher event handler] launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(300)) - Cleaning up container container_1366096597608_0001_01_000006
{code}

> getAllocatedContainers() is not returning all the allocated containers
> ----------------------------------------------------------------------
>
>                 Key: YARN-541
>                 URL: https://issues.apache.org/jira/browse/YARN-541
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.3-alpha
>         Environment: Redhat Linux 64-bit
>            Reporter: Krishna Kishore Bonagiri
>         Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out,
>                      yarn-dsadm-resourcemanager-isredeng.out
>
>
> I am running an application that was written for, and works well with,
> hadoop-2.0.0-alpha. When I run the same application against 2.0.3-alpha, the
> getAllocatedContainers() method called on AMResponse sometimes does not return
> all of the allocated containers. For example, I request 10 containers and
> this method sometimes gives me only 9, yet when I look at the Resource Manager
> log, the 10th container has also been allocated. It happens only
> occasionally, at random, and works fine all other times. If I send one more
> request to the RM for the remaining container after it failed to give it to
> me the first time (and before releasing the containers already acquired), the
> RM is able to allocate that container. I am running only one application at
> a time, but 1000s of them one after another.
> My main worry is that, even though the RM's log says that all 10 requested
> containers are allocated, the getAllocatedContainers() method does not
> return all of them to me; surprisingly, it returned only 9. I never saw this
> kind of issue in the previous version, i.e. hadoop-2.0.0-alpha.
> Thanks,
> Kishore
>

--
This message is automatically generated by JIRA.