[jira] [Created] (YARN-1439) Taking around 800 to 900 ms to connect from AM to RM
Krishna Kishore Bonagiri created YARN-1439:
----------------------------------------------

             Summary: Taking around 800 to 900 ms to connect from AM to RM
                 Key: YARN-1439
                 URL: https://issues.apache.org/jira/browse/YARN-1439
             Project: Hadoop YARN
          Issue Type: Bug
          Components: applications
    Affects Versions: 2.2.0
            Reporter: Krishna Kishore Bonagiri

Hi,

The start() call in the following code, which connects the Application Master to the Resource Manager, is taking between 800 and 900 ms. I tried with both managed and unmanaged applications.

  AMRMClientAsync.CallbackHandler allocListener = new RMCallbackHandler();
  amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener);
  amRMClient.init(conf);
  amRMClient.start();

Vinod Kumar asked me to raise a bug on this; for more info see:
http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201310.mbox/%3ccahg+sbpd+uvzvbjodd1lupg1neu2dlw51wukeabycsuia9z...@mail.gmail.com%3E

--
This message was sent by Atlassian JIRA
(v6.1#6144)
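One way to narrow down where the 800-900 ms goes is to time each step of the AM-to-RM handshake separately. This is a hedged sketch: timedMillis is a hypothetical helper, not a YARN API, and the commented-out lines stand in for the AMRMClientAsync calls from the report (the real calls need a Hadoop classpath and a running RM):

```java
import java.util.concurrent.TimeUnit;

public class StartupTiming {

    // Runs one step and returns the wall-clock time it took, in milliseconds.
    static long timedMillis(Runnable step) {
        long begin = System.nanoTime();
        step.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
    }

    public static void main(String[] args) {
        // In the real AM, the steps being measured would be:
        //   amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener);
        //   amRMClient.init(conf);
        //   amRMClient.start();
        // Each gets its own timing so the slow step can be identified.
        long initMs = timedMillis(() -> { /* amRMClient.init(conf) would go here */ });
        long startMs = timedMillis(() -> { /* amRMClient.start() would go here */ });
        System.out.println("init: " + initMs + " ms, start: " + startMs + " ms");
    }
}
```

Splitting the measurement this way would show whether the cost is in client construction, configuration, or the actual RPC connection made by start().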
[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708234#comment-13708234 ]

Krishna Kishore Bonagiri commented on YARN-541:
-----------------------------------------------

Hi Hudson & Hitesh,

Does this mean the issue is fixed? Or do you still want me to reproduce the issue and send you the logs? I am sorry for the delay; I have been busy with other things at work.

Thanks,
Kishore

> getAllocatedContainers() is not returning all the allocated containers
> ----------------------------------------------------------------------
>
>                 Key: YARN-541
>                 URL: https://issues.apache.org/jira/browse/YARN-541
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.3-alpha
>         Environment: Redhat Linux 64-bit
>            Reporter: Krishna Kishore Bonagiri
>            Assignee: Bikas Saha
>            Priority: Blocker
>             Fix For: 2.1.0-beta
>
>         Attachments: AppMaster.stdout, YARN-541.1.patch, yarn-dsadm-nodemanager-isredeng.out, yarn-dsadm-resourcemanager-isredeng.out
>
> (Quoted issue description trimmed; see the YARN-541 creation message below.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706646#comment-13706646 ]

Krishna Kishore Bonagiri commented on YARN-541:
-----------------------------------------------

Hitesh, how can I do that?

>            Assignee: Omkar Vinit Joshi
>
> (Quoted issue description trimmed; see the YARN-541 creation message below.)
[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706589#comment-13706589 ]

Krishna Kishore Bonagiri commented on YARN-541:
-----------------------------------------------

I shall try to get you the logs you need today, or as soon as possible, and reopen it.

On Fri, Jul 12, 2013 at 5:49 AM, Omkar Vinit Joshi (JIRA)

> (Quoted issue description trimmed; see the YARN-541 creation message below.)
[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648300#comment-13648300 ]

Krishna Kishore Bonagiri commented on YARN-541:
-----------------------------------------------

Hi Hitesh,

I am very curious to know whether you could reproduce and resolve this issue.

Thanks,
Kishore

On Tue, Apr 16, 2013 at 1:35 PM, Krishna Kishore Bonagiri <

> (Quoted issue description trimmed; see the YARN-541 creation message below.)
[jira] [Commented] (YARN-501) Application Master getting killed randomly reporting excess usage of memory
[ https://issues.apache.org/jira/browse/YARN-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633829#comment-13633829 ]

Krishna Kishore Bonagiri commented on YARN-501:
-----------------------------------------------

No problem, Hitesh, I can do that. Can you please tell me how to set that log level?

Thanks,
Kishore

> (Quoted issue description and process-tree dump trimmed; see the YARN-501 creation message below.)
[jira] [Commented] (YARN-168) No way to turn off virtual memory limits without turning off physical memory limits
[ https://issues.apache.org/jira/browse/YARN-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632868#comment-13632868 ]

Krishna Kishore Bonagiri commented on YARN-168:
-----------------------------------------------

How can I get this fix if I want it now? When will the next release be? I mean, the one having this fix!

> No way to turn off virtual memory limits without turning off physical memory limits
> -----------------------------------------------------------------------------------
>
>                 Key: YARN-168
>                 URL: https://issues.apache.org/jira/browse/YARN-168
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0
>            Reporter: Harsh J
>
> Asked and reported by a user (Krishna) on the ML:
> {quote}
> This is possible to do, but you've hit a bug with the current YARN
> implementation. Ideally you should be able to configure the vmem-pmem
> ratio (or an equivalent config) to be -1, to indicate disabling of
> virtual memory checks completely (and there are indeed checks for this),
> but it seems like we are enforcing the ratio to be at least 1.0 (and
> hence negatives are disallowed).
>
> You can't work around this by setting the NM's offered resource.mb to -1
> either, as you'll lose out on controlling maximum allocations.
>
> Please file a YARN bug on JIRA. The code at fault lies under
> ContainersMonitorImpl#init(…).
>
> On Thu, Oct 18, 2012 at 4:00 PM, Krishna Kishore Bonagiri wrote:
> > Hi,
> >   Is there a way we can ask the YARN RM not to kill a container when it
> > uses more virtual memory than the maximum it may use as per the
> > specification in the configuration file yarn-site.xml? We can't always
> > estimate the amount of virtual memory needed for our application running
> > on a container, but we don't want it killed in case it exceeds the
> > maximum limit.
> >   Please suggest how we can get around this issue.
> >
> > Thanks,
> > Kishore
> {quote}
>
> Basically, we're doing:
> {code}
> // Virtual memory configuration
> float vmemRatio = conf.getFloat(
>     YarnConfiguration.NM_VMEM_PMEM_RATIO,
>     YarnConfiguration.DEFAULT_NM_VMEM_PMEM_RATIO);
> Preconditions.checkArgument(vmemRatio > 0.99f,
>     YarnConfiguration.NM_VMEM_PMEM_RATIO +
>     " should be at least 1.0");
> this.maxVmemAllottedForContainers =
>     (long) (vmemRatio * maxPmemAllottedForContainers);
> {code}
>
> For virtual memory monitoring to be disabled, maxVmemAllottedForContainers
> has to be -1. For that to be -1, given the above buggy computation, vmemRatio
> must be -1 or maxPmemAllottedForContainers must be -1.
> If vmemRatio were -1, we fail the precondition check and exit.
> If maxPmemAllottedForContainers were -1, we would also end up disabling
> physical memory monitoring.
> Or perhaps that makes sense - disabling both physical and virtual memory
> monitoring - but that way your NM becomes infinite in its resource grants, I think.
> We need a way to selectively disable kills done via virtual memory
> monitoring, which is the base request here.
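The computation quoted above can be reproduced without Hadoop on the classpath. This is a sketch with the configuration lookup inlined (2.1f is the stock default for yarn.nodemanager.vmem-pmem-ratio; the 8 GB figure is purely illustrative), showing why a -1 "disable" sentinel can never reach the ratio multiplication:

```java
public class VmemRatioCheck {

    // Stand-in for YarnConfiguration.DEFAULT_NM_VMEM_PMEM_RATIO (2.1 in stock YARN).
    static final float DEFAULT_VMEM_PMEM_RATIO = 2.1f;

    // Mirrors the Preconditions.checkArgument(...) logic from
    // ContainersMonitorImpl#init quoted in the issue: any ratio below ~1.0,
    // including -1, is rejected before the multiplication is ever reached.
    static long maxVmemAllotted(float vmemRatio, long maxPmemBytes) {
        if (!(vmemRatio > 0.99f)) {
            throw new IllegalArgumentException(
                "yarn.nodemanager.vmem-pmem-ratio should be at least 1.0");
        }
        return (long) (vmemRatio * maxPmemBytes);
    }

    public static void main(String[] args) {
        long pmem = 8L * 1024 * 1024 * 1024; // an 8 GB NM, purely illustrative
        System.out.println(maxVmemAllotted(DEFAULT_VMEM_PMEM_RATIO, pmem));
        try {
            maxVmemAllotted(-1f, pmem); // the "disable vmem checks" attempt
        } catch (IllegalArgumentException e) {
            // -1 never survives the precondition, which is the bug being reported
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

So the only sentinel the code can actually act on is maxPmemAllottedForContainers == -1, which disables both checks at once, exactly the coupling the issue objects to.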
[jira] [Reopened] (YARN-501) Application Master getting killed randomly reporting excess usage of memory
[ https://issues.apache.org/jira/browse/YARN-501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krishna Kishore Bonagiri reopened YARN-501:
-------------------------------------------

Added information dated 24th March and 15th April; please see above. As mentioned, it is not really an issue of the container using more memory than was allocated for it. The failure happens randomly when running the same application, either mine or the Distributed Shell example, with just a date command.

Thanks,
Kishore

> (Quoted issue description and process-tree dump trimmed; see the YARN-501 creation message below.)
[jira] [Updated] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krishna Kishore Bonagiri updated YARN-541:
------------------------------------------

    Attachment: yarn-dsadm-resourcemanager-isredeng.out
                yarn-dsadm-nodemanager-isredeng.out
                AppMaster.stdout

Hi Hitesh,

I am attaching the logs for the AM, RM, and NM. I have an application being run in a loop which requires 5 containers. The 8th run failed with this getAllocatedContainers() issue: the Application Master couldn't get all 5 containers it required, and the getAllocatedContainers() method returned only 4. The RM's log says that the 5th container was also allocated, through the message:

2013-04-16 03:32:54,701 INFO [ResourceManager Event Processor] rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(220)) - container_1366096597608_0008_01_06 Container Transitioned from NEW to ALLOCATED

In the RM's log, you can see this kind of message for the remaining 4 containers as well, i.e. container_1366096597608_0008_01_02 to container_1366096597608_0008_01_05. Also, as I said before, this issue is seen randomly.

Thanks,
Kishore

> (Quoted issue description trimmed; see the YARN-541 creation message below.)
[jira] [Commented] (YARN-501) Application Master getting killed randomly reporting excess usage of memory
[ https://issues.apache.org/jira/browse/YARN-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631798#comment-13631798 ]

Krishna Kishore Bonagiri commented on YARN-501:
-----------------------------------------------

What I have observed today is that this error comes at regular intervals of about 50 minutes. At those particular times, I see the following kind of messages in the node manager's log, so I think the node manager being busy with some other task, such as this monitoring, is causing the virtual-memory error for the AM's container:

2013-04-12 15:51:02,048 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(346)) - Starting resource-monitoring for container_1365688251527_6643_01_03
2013-04-12 15:51:02,048 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(346)) - Starting resource-monitoring for container_1365688251527_6642_01_04
2013-04-12 15:51:02,049 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(346)) - Starting resource-monitoring for container_1365688251527_6641_01_05
2013-04-12 15:51:02,049 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(346)) - Starting resource-monitoring for container_1365688251527_6640_01_06
2013-04-12 15:51:02,049 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for container_1365688251527_6524_01_01
2013-04-12 15:51:02,049 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for container_1365688251527_6525_01_02
2013-04-12 15:51:02,049 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for container_1365688251527_6525_01_03
2013-04-12 15:51:02,049 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for container_1365688251527_6525_01_04

On Sun, Mar 24, 2013 at 3:54 PM, Krishna Kishore Bonagiri <

> (Quoted issue description and process-tree dump trimmed; see the YARN-501 creation message below.)
[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631564#comment-13631564 ]

Krishna Kishore Bonagiri commented on YARN-541:
-----------------------------------------------

Hi Hitesh,

Thanks for the reply. I have actually changed my code to send new requests for the remainder of the required containers, so I am not seeing this error at the moment. I shall revert my changes, try to reproduce the error, and send you the logs in a day or two.

Thanks,
Kishore

> (Quoted issue description trimmed; see the YARN-541 creation message below.)
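The workaround described in the comment above, asking again for whatever is still missing before releasing anything, can be sketched independently of the AMRMClient API. Here allocateOnce is a hypothetical stand-in for one allocate() round trip that may grant fewer containers than requested:

```java
import java.util.Random;
import java.util.function.IntUnaryOperator;

public class ReRequestLoop {

    // Keeps re-requesting the shortfall until all containers have arrived.
    // allocateOnce.applyAsInt(n) stands in for one allocate() round trip
    // that may return anywhere from 0 to n newly granted containers.
    static int acquireAll(int needed, IntUnaryOperator allocateOnce) {
        int acquired = 0;
        while (acquired < needed) {
            int granted = allocateOnce.applyAsInt(needed - acquired);
            acquired += granted; // never release what was already granted
        }
        return acquired;
    }

    public static void main(String[] args) {
        // Simulate an RM that sometimes grants fewer containers than asked for.
        Random rng = new Random(42);
        int total = acquireAll(10, ask -> rng.nextInt(ask + 1));
        System.out.println("acquired " + total + " of 10"); // always ends at 10
    }
}
```

A real AM would bound this loop with a timeout or retry limit rather than waiting forever, but the shape matches the workaround: only the shortfall is re-requested, and nothing already granted is released.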
[jira] [Created] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
Krishna Kishore Bonagiri created YARN-541:
------------------------------------------

             Summary: getAllocatedContainers() is not returning all the allocated containers
                 Key: YARN-541
                 URL: https://issues.apache.org/jira/browse/YARN-541
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 2.0.3-alpha
         Environment: Redhat Linux 64-bit
            Reporter: Krishna Kishore Bonagiri

I am running an application that was written for, and working well with, hadoop-2.0.0-alpha, but when I run the same application against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse sometimes does not return all the allocated containers. For example, I request 10 containers and this method sometimes gives me only 9, even though the Resource Manager's log shows that the 10th container was also allocated. It happens only occasionally and randomly, and works fine all other times. If I send one more request to the RM for the remaining container after it failed to arrive the first time (and before releasing the already acquired ones), the RM does allocate that container. I am running only one application at a time, but thousands of them one after another.

My main worry is that even though the RM's log says all 10 requested containers are allocated, getAllocatedContainers() is not returning all of them to me; surprisingly, it returned only 9. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha.

Thanks,
Kishore
[jira] [Commented] (YARN-501) Application Master getting killed randomly reporting excess usage of memory
[ https://issues.apache.org/jira/browse/YARN-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612057#comment-13612057 ]

Krishna Kishore Bonagiri commented on YARN-501:
-----------------------------------------------

Hi Ravi,

The problem I am reporting is: why does it give this kind of error, very randomly, when it runs fine all other times with the same memory allocations? I see this error once in 500 runs. Also, as I said, it is only the date command that I am running, with the Distributed Shell example jar that comes with hadoop-2.0.3-alpha, so I would expect it to behave the same way every time.

Thanks,
Kishore

> (Quoted issue description and process-tree dump trimmed; see the YARN-501 creation message below.)
[jira] [Created] (YARN-501) Application Master getting killed randomly reporting excess usage of memory
Krishna Kishore Bonagiri created YARN-501:
---------------------------------------------

             Summary: Application Master getting killed randomly reporting excess usage of memory
                 Key: YARN-501
                 URL: https://issues.apache.org/jira/browse/YARN-501
             Project: Hadoop YARN
          Issue Type: Bug
          Components: applications/distributed-shell, nodemanager
    Affects Versions: 2.0.3-alpha
            Reporter: Krishna Kishore Bonagiri

I am running a date command using the Distributed Shell example in a loop of 500 times. It ran successfully all the times except one time where it gave the following error.

2013-03-22 04:33:25,280 INFO [main] distributedshell.Client (Client.java:monitorApplication(605)) - Got application report from ASM for, appId=222, clientToken=null, appDiagnostics=Application application_1363938200742_0222 failed 1 times due to AM Container for appattempt_1363938200742_0222_01 exited with exitCode: 143 due to: Container [pid=21141,containerID=container_1363938200742_0222_01_01] is running beyond virtual memory limits. Current usage: 47.3 Mb of 128 Mb physical memory used; 611.6 Mb of 268.8 Mb virtual memory used. Killing container.
Dump of the process-tree for container_1363938200742_0222_01_01 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 21147 21141 21141 21141 (java) 244 12 532643840 11802 /home_/dsadm/yarn/jdk//bin/java -Xmx128m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 10 --num_containers 2 --priority 0 --shell_command date
|- 21141 8433 21141 21141 (bash) 0 0 108642304 298 /bin/bash -c /home_/dsadm/yarn/jdk//bin/java -Xmx128m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 10 --num_containers 2 --priority 0 --shell_command date 1>/tmp/logs/application_1363938200742_0222/container_1363938200742_0222_01_01/AppMaster.stdout 2>/tmp/logs/application_1363938200742_0222/container_1363938200742_0222_01_01/AppMaster.stderr
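A common mitigation for this class of failure (a sketch only, assuming the yarn-site.xml property names of the 2.0.x line; verify against your yarn-default.xml) is to raise the virtual-to-physical ratio, or to disable the virtual-memory check entirely:

```xml
<!-- yarn-site.xml: sketch of possible workarounds, not a recommendation -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>5</value> <!-- default is 2.1; raise it if AMs die on vmem alone -->
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value> <!-- skips the virtual-memory check altogether -->
</property>
```

Disabling the check hides rather than explains the random spike, so it is only a stopgap while the root cause is investigated.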
[jira] [Created] (YARN-492) Too many open files error to launch a container
Krishna Kishore Bonagiri created YARN-492:
---------------------------------------------

             Summary: Too many open files error to launch a container
                 Key: YARN-492
                 URL: https://issues.apache.org/jira/browse/YARN-492
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 2.0.0-alpha
         Environment: RedHat Linux
            Reporter: Krishna Kishore Bonagiri

I am running a date command with YARN's distributed shell example in a loop of 1000 times, this way:

yarn jar /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar org.apache.hadoop.yarn.applications.distributedshell.Client --jar /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar --shell_command date --num_containers 2

Around the 730th run or so, I get an error in the node manager's log saying that it failed to launch a container because there are "Too many open files". When I observe through the lsof command, I find that one file of this kind is left open for each run of the Application Master, and the count keeps growing as I run the loop:

node1:44871->node1:50010

Thanks,
Kishore
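The descriptor growth described above can be confirmed from outside the NodeManager by sampling /proc/<pid>/fd between application runs. A minimal Linux-only sketch of the counting technique (it demonstrates against the current process with a stand-in open file; when reproducing the report you would point it at the NodeManager's pid and grep lsof output for the node1:50010 connections instead):

```python
# Sketch: watch a process's open-descriptor count grow across runs (Linux-only).
# open_fd_count is an illustrative helper, not part of YARN.
import os

def open_fd_count(pid):
    """Number of file descriptors currently open in the given process."""
    return len(os.listdir(f"/proc/{pid}/fd"))

if __name__ == "__main__":
    pid = os.getpid()          # substitute the NodeManager's pid when reproducing
    before = open_fd_count(pid)
    leak = open(os.devnull)    # stand-in for one lingering node1:50010 socket
    print(open_fd_count(pid) - before)  # 1 -- one run, one leaked descriptor
    leak.close()
```

Run in a loop alongside the 1000 distributed-shell submissions, a steadily increasing delta (rather than a flat count) is the signature of the leak the report describes.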