[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704935#comment-13704935 ]
Omkar Vinit Joshi commented on YARN-541:
----------------------------------------
[~write2kishore] I just took a look at the NM logs and I can see that container
"container_1366096597608_0001_01_000006" was allocated by the RM and that the AM
made a start container request for it on the NM. I think there is some problem in
the AM logs. Can you take a look at your AM code again? It looks like something is
getting missed there. If it is still occurring, can you print the logs
when the AM makes a start container request to the NM?
{code}
2013-04-16 03:29:57,681 INFO [IPC Server handler 4 on 34660]
containermanager.ContainerManagerImpl
(ContainerManagerImpl.java:startContainer(402)) - Start request for
container_1366096597608_0001_01_000006 by user dsadm
2013-04-16 03:29:57,684 INFO [IPC Server handler 4 on 34660]
nodemanager.NMAuditLogger (NMAuditLogger.java:logSuccess(89)) - USER=dsadm
IP=127.0.1.1 OPERATION=Start Container Request
TARGET=ContainerManageImpl RESULT=SUCCESS
APPID=application_1366096597608_0001
CONTAINERID=container_1366096597608_0001_01_000006
2013-04-16 03:29:57,687 INFO [AsyncDispatcher event handler]
application.Application (ApplicationImpl.java:transition(255)) - Adding
container_1366096597608_0001_01_000006 to application
application_1366096597608_0001
2013-04-16 03:29:57,689 INFO [AsyncDispatcher event handler]
container.Container (ContainerImpl.java:handle(835)) - Container
container_1366096597608_0001_01_000006 transitioned from NEW to LOCALIZED
2013-04-16 03:29:57,952 INFO [AsyncDispatcher event handler]
container.Container (ContainerImpl.java:handle(835)) - Container
container_1366096597608_0001_01_000006 transitioned from LOCALIZED to RUNNING
2013-04-16 03:29:58,475 INFO [Node Status Updater]
nodemanager.NodeStatusUpdaterImpl
(NodeStatusUpdaterImpl.java:getNodeStatus(249)) - Sending out status for
container: container_id {, app_attempt_id {, application_id {, id: 1,
cluster_timestamp: 1366096597608, }, attemptId: 1, }, id: 1, }, state:
C_RUNNING, diagnostics: "", exit_status: -1000,
2013-04-16 03:29:58,478 INFO [Node Status Updater]
nodemanager.NodeStatusUpdaterImpl
(NodeStatusUpdaterImpl.java:getNodeStatus(249)) - Sending out status for
container: container_id {, app_attempt_id {, application_id {, id: 1,
cluster_timestamp: 1366096597608, }, attemptId: 1, }, id: 5, }, state:
C_RUNNING, diagnostics: "", exit_status: -1000,
2013-04-16 03:29:58,481 INFO [Node Status Updater]
nodemanager.NodeStatusUpdaterImpl
(NodeStatusUpdaterImpl.java:getNodeStatus(249)) - Sending out status for
container: container_id {, app_attempt_id {, application_id {, id: 1,
cluster_timestamp: 1366096597608, }, attemptId: 1, }, id: 6, }, state:
C_RUNNING, diagnostics: "", exit_status: -1000,
2013-04-16 03:29:58,489 INFO [ContainersLauncher #2]
nodemanager.DefaultContainerExecutor
(DefaultContainerExecutor.java:launchContainer(113)) - launchContainer: [bash,
/tmp/nm-local-dir/usercache/dsadm/appcache/application_1366096597608_0001/container_1366096597608_0001_01_000006/default_container_executor.sh]
2013-04-16 03:29:58,638 INFO [ContainersLauncher #1] launcher.ContainerLaunch
(ContainerLaunch.java:call(282)) - Container
container_1366096597608_0001_01_000005 succeeded
2013-04-16 03:29:58,639 INFO [ContainersLauncher #2] launcher.ContainerLaunch
(ContainerLaunch.java:call(282)) - Container
container_1366096597608_0001_01_000006 succeeded
2013-04-16 03:29:58,643 INFO [AsyncDispatcher event handler]
container.Container (ContainerImpl.java:handle(835)) - Container
container_1366096597608_0001_01_000005 transitioned from RUNNING to
EXITED_WITH_SUCCESS
2013-04-16 03:29:58,644 INFO [AsyncDispatcher event handler]
container.Container (ContainerImpl.java:handle(835)) - Container
container_1366096597608_0001_01_000006 transitioned from RUNNING to
EXITED_WITH_SUCCESS
2013-04-16 03:29:58,644 INFO [AsyncDispatcher event handler]
launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(300)) -
Cleaning up container container_1366096597608_0001_01_000005
2013-04-16 03:29:58,693 INFO [AsyncDispatcher event handler]
launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(300)) -
Cleaning up container container_1366096597608_0001_01_000006
{code}
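To make the check above repeatable, here is a minimal sketch that pulls the state transitions for one container out of NM log text like the excerpt above. The class name ContainerTrace and the method transitions() are hypothetical helpers, not part of Hadoop; the regular expression assumes only the "Container <id> transitioned from X to Y" wording seen in these lines, and tolerates the line wrapping introduced by the mail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Hadoop class: traces one container through NM log text.
public class ContainerTrace {
    // Assumed wording, taken from the excerpt:
    // "Container <container id> transitioned from <STATE> to <STATE>".
    // \s+ also matches line breaks, so wrapped log lines still match.
    private static final Pattern TRANSITION = Pattern.compile(
        "Container\\s+(container_\\w+)\\s+transitioned\\s+from\\s+(\\w+)\\s+to\\s+(\\w+)");

    /** Returns "FROM->TO" for every transition of the given container, in log order. */
    public static List<String> transitions(String log, String containerId) {
        List<String> result = new ArrayList<>();
        Matcher m = TRANSITION.matcher(log);
        while (m.find()) {
            if (m.group(1).equals(containerId)) {
                result.add(m.group(2) + "->" + m.group(3));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A few lines copied from the NM log excerpt, including the wrapped ones.
        String log = String.join("\n",
            "container.Container (ContainerImpl.java:handle(835)) - Container",
            "container_1366096597608_0001_01_000006 transitioned from NEW to LOCALIZED",
            "container.Container (ContainerImpl.java:handle(835)) - Container",
            "container_1366096597608_0001_01_000006 transitioned from LOCALIZED to RUNNING",
            "container.Container (ContainerImpl.java:handle(835)) - Container",
            "container_1366096597608_0001_01_000006 transitioned from RUNNING to",
            "EXITED_WITH_SUCCESS");
        System.out.println(transitions(log, "container_1366096597608_0001_01_000006"));
    }
}
```

If the list for a container id is empty, the NM never logged a transition for it, which would point back at the AM not issuing the start request.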
> getAllocatedContainers() is not returning all the allocated containers
> ----------------------------------------------------------------------
>
> Key: YARN-541
> URL: https://issues.apache.org/jira/browse/YARN-541
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.0.3-alpha
> Environment: Redhat Linux 64-bit
> Reporter: Krishna Kishore Bonagiri
> Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out,
> yarn-dsadm-resourcemanager-isredeng.out
>
>
> I am running an application that was written for, and works well with,
> hadoop-2.0.0-alpha, but when I run the same application against 2.0.3-alpha,
> the getAllocatedContainers() method called on AMResponse sometimes does not
> return all the allocated containers. For example, I request 10 containers and
> this method sometimes gives me only 9, yet when I look at the Resource
> Manager log, the 10th container has also been allocated. It happens only
> intermittently and works fine at all other times. If I send one more
> request to the RM for the remaining container after it failed to give it the
> first time (and before releasing the already acquired ones), it can allocate
> that container. I am running only one application at a time, but 1000s of
> them one after another.
> My main worry is that, even though the RM's log says that all 10 requested
> containers are allocated, the getAllocatedContainers() method does not
> return all of them; it returned only 9. I never saw this
> kind of issue in the previous version, i.e. hadoop-2.0.0-alpha.
> Thanks,
> Kishore
>
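The description above notes that a further request to the RM returned the remaining container. A minimal sketch of an AM-side loop that tolerates this follows, assuming the allocation can arrive spread over more than one response. AllocationCollector and collect() are hypothetical names, and the Supplier stands in for one allocate call followed by getAllocatedContainers(); no real YARN API is used here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper, not a Hadoop class: gathers containers across several responses.
public class AllocationCollector {
    /**
     * Calls allocateCall repeatedly and accumulates the returned container ids
     * until at least `wanted` have been collected or `maxPolls` calls were made.
     * allocateCall is a stand-in for one allocate round trip to the RM.
     */
    public static List<String> collect(Supplier<List<String>> allocateCall,
                                       int wanted, int maxPolls) {
        List<String> collected = new ArrayList<>();
        for (int poll = 0; poll < maxPolls && collected.size() < wanted; poll++) {
            collected.addAll(allocateCall.get());
        }
        return collected;
    }

    public static void main(String[] args) {
        // Simulated responses: 9 containers first, the 10th on the next call.
        Iterator<List<String>> responses = Arrays.asList(
            Arrays.asList("c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9"),
            Arrays.asList("c10")).iterator();
        List<String> all = collect(responses::next, 10, 5);
        System.out.println(all.size() + " containers collected");
    }
}
```

Logging the size of each individual response inside such a loop would also show whether the 10th container ever reaches the AM.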