[jira] [Created] (YARN-1439) Taking around 800 to 900 ms to connect from AM to RM

2013-11-22 Thread Krishna Kishore Bonagiri (JIRA)
Krishna Kishore Bonagiri created YARN-1439:
--

 Summary: Taking around 800 to 900 ms to connect from AM to RM
 Key: YARN-1439
 URL: https://issues.apache.org/jira/browse/YARN-1439
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications
Affects Versions: 2.2.0
Reporter: Krishna Kishore Bonagiri


Hi, 
  The start() call in the following code, which connects the Application 
Master to the Resource Manager, is taking between 800 and 900 ms; I tried 
with both managed and unmanaged applications.

AMRMClientAsync.CallbackHandler allocListener = new RMCallbackHandler();
amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener);
amRMClient.init(conf);
amRMClient.start();
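
For reference, a minimal sketch (assumed setup, not code from the original 
report) of how the delay around start() can be timed:

import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

// Assumes RMCallbackHandler and conf are defined as in the snippet above.
AMRMClientAsync.CallbackHandler allocListener = new RMCallbackHandler();
AMRMClientAsync<?> amRMClient =
    AMRMClientAsync.createAMRMClientAsync(1000, allocListener);
amRMClient.init(conf);
long before = System.currentTimeMillis();
amRMClient.start();  // the reported 800 to 900 ms is spent here
System.out.println("start() took "
    + (System.currentTimeMillis() - before) + " ms");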

Vinod Kumar asked me to raise a bug for this; for more info, see:

http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201310.mbox/%3ccahg+sbpd+uvzvbjodd1lupg1neu2dlw51wukeabycsuia9z...@mail.gmail.com%3E





[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers

2013-07-14 Thread Krishna Kishore Bonagiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708234#comment-13708234
 ] 

Krishna Kishore Bonagiri commented on YARN-541:
---

Hi Hudson & Hitesh,

  Does this mean this issue is fixed? Do you still expect me to reproduce 
the issue and give you the logs? I am sorry for the delay; I have been busy 
with other things at work.

Thanks,
Kishore





> getAllocatedContainers() is not returning all the allocated containers
> --
>
> Key: YARN-541
> URL: https://issues.apache.org/jira/browse/YARN-541
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.3-alpha
> Environment: Redhat Linux 64-bit
>Reporter: Krishna Kishore Bonagiri
>Assignee: Bikas Saha
>Priority: Blocker
> Fix For: 2.1.0-beta
>
> Attachments: AppMaster.stdout, YARN-541.1.patch, 
> yarn-dsadm-nodemanager-isredeng.out, yarn-dsadm-resourcemanager-isredeng.out
>
>
> I am running an application that was written for, and working well with, 
> hadoop-2.0.0-alpha, but when I run the same application against 2.0.3-alpha, 
> the getAllocatedContainers() method called on AMResponse sometimes does not 
> return all of the allocated containers. For example, I request 10 containers 
> and this method sometimes gives me only 9, even though the Resource 
> Manager's log shows that the 10th container was also allocated. It happens 
> only occasionally and works fine all other times. If I send one more request 
> to the RM for the remaining container after it failed to give them the first 
> time (and before releasing the already acquired ones), it does allocate that 
> container. I am running only one application at a time, but thousands of 
> them one after another.
> My main worry is that even though the RM's log says all 10 requested 
> containers are allocated, the getAllocatedContainers() method is not 
> returning all of them; surprisingly, it returned only 9. I never saw this 
> kind of issue in the previous version, i.e. hadoop-2.0.0-alpha.
> Thanks,
> Kishore
>  
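
The re-request workaround described above might look roughly like this (a 
hedged sketch against the Hadoop 2.x AMRMClient API, not the reporter's 
actual code; amRMClient, requested, received, resource, and priority are 
assumed to be set up elsewhere):

import java.util.List;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

// One allocate() heartbeat: count what came back, then re-ask for the rest.
// amRMClient is assumed to be a started AMRMClient<ContainerRequest>.
AllocateResponse response = amRMClient.allocate(0.0f);
List<Container> allocated = response.getAllocatedContainers();
received += allocated.size();
for (int i = received; i < requested; i++) {
  // Re-request each container the RM says it allocated but never returned.
  amRMClient.addContainerRequest(
      new ContainerRequest(resource, null, null, priority));
}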



[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers

2013-07-11 Thread Krishna Kishore Bonagiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706646#comment-13706646
 ] 

Krishna Kishore Bonagiri commented on YARN-541:
---

Hitesh,
  How can I do that?





> getAllocatedContainers() is not returning all the allocated containers
> --
>
> Key: YARN-541
> URL: https://issues.apache.org/jira/browse/YARN-541
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.3-alpha
> Environment: Redhat Linux 64-bit
>Reporter: Krishna Kishore Bonagiri
>Assignee: Omkar Vinit Joshi
> Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, 
> yarn-dsadm-resourcemanager-isredeng.out
>
>
> I am running an application that was written for, and working well with, 
> hadoop-2.0.0-alpha, but when I run the same application against 2.0.3-alpha, 
> the getAllocatedContainers() method called on AMResponse sometimes does not 
> return all of the allocated containers. For example, I request 10 containers 
> and this method sometimes gives me only 9, even though the Resource 
> Manager's log shows that the 10th container was also allocated. It happens 
> only occasionally and works fine all other times. If I send one more request 
> to the RM for the remaining container after it failed to give them the first 
> time (and before releasing the already acquired ones), it does allocate that 
> container. I am running only one application at a time, but thousands of 
> them one after another.
> My main worry is that even though the RM's log says all 10 requested 
> containers are allocated, the getAllocatedContainers() method is not 
> returning all of them; surprisingly, it returned only 9. I never saw this 
> kind of issue in the previous version, i.e. hadoop-2.0.0-alpha.
> Thanks,
> Kishore
>  



[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers

2013-07-11 Thread Krishna Kishore Bonagiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706589#comment-13706589
 ] 

Krishna Kishore Bonagiri commented on YARN-541:
---

I shall try to get you the logs you need today, or as soon as possible, and
reopen it.





> getAllocatedContainers() is not returning all the allocated containers
> --
>
> Key: YARN-541
> URL: https://issues.apache.org/jira/browse/YARN-541
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.3-alpha
> Environment: Redhat Linux 64-bit
>Reporter: Krishna Kishore Bonagiri
>Assignee: Omkar Vinit Joshi
> Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, 
> yarn-dsadm-resourcemanager-isredeng.out
>
>
> I am running an application that was written for, and working well with, 
> hadoop-2.0.0-alpha, but when I run the same application against 2.0.3-alpha, 
> the getAllocatedContainers() method called on AMResponse sometimes does not 
> return all of the allocated containers. For example, I request 10 containers 
> and this method sometimes gives me only 9, even though the Resource 
> Manager's log shows that the 10th container was also allocated. It happens 
> only occasionally and works fine all other times. If I send one more request 
> to the RM for the remaining container after it failed to give them the first 
> time (and before releasing the already acquired ones), it does allocate that 
> container. I am running only one application at a time, but thousands of 
> them one after another.
> My main worry is that even though the RM's log says all 10 requested 
> containers are allocated, the getAllocatedContainers() method is not 
> returning all of them; surprisingly, it returned only 9. I never saw this 
> kind of issue in the previous version, i.e. hadoop-2.0.0-alpha.
> Thanks,
> Kishore
>  



[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers

2013-05-03 Thread Krishna Kishore Bonagiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648300#comment-13648300
 ] 

Krishna Kishore Bonagiri commented on YARN-541:
---

Hi Hitesh,

  I am very curious to know if you could reproduce and resolve this issue.

Thanks,
Kishore





> getAllocatedContainers() is not returning all the allocated containers
> --
>
> Key: YARN-541
> URL: https://issues.apache.org/jira/browse/YARN-541
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.3-alpha
> Environment: Redhat Linux 64-bit
>Reporter: Krishna Kishore Bonagiri
> Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, 
> yarn-dsadm-resourcemanager-isredeng.out
>
>
> I am running an application that was written for, and working well with, 
> hadoop-2.0.0-alpha, but when I run the same application against 2.0.3-alpha, 
> the getAllocatedContainers() method called on AMResponse sometimes does not 
> return all of the allocated containers. For example, I request 10 containers 
> and this method sometimes gives me only 9, even though the Resource 
> Manager's log shows that the 10th container was also allocated. It happens 
> only occasionally and works fine all other times. If I send one more request 
> to the RM for the remaining container after it failed to give them the first 
> time (and before releasing the already acquired ones), it does allocate that 
> container. I am running only one application at a time, but thousands of 
> them one after another.
> My main worry is that even though the RM's log says all 10 requested 
> containers are allocated, the getAllocatedContainers() method is not 
> returning all of them; surprisingly, it returned only 9. I never saw this 
> kind of issue in the previous version, i.e. hadoop-2.0.0-alpha.
> Thanks,
> Kishore
>  



[jira] [Commented] (YARN-501) Application Master getting killed randomly reporting excess usage of memory

2013-04-16 Thread Krishna Kishore Bonagiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633829#comment-13633829
 ] 

Krishna Kishore Bonagiri commented on YARN-501:
---

No problem, Hitesh, I can do that. Can you please tell me how to set that
log level?

Thanks,
Kishore





> Application Master getting killed randomly reporting excess usage of memory
> ---
>
> Key: YARN-501
> URL: https://issues.apache.org/jira/browse/YARN-501
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell, nodemanager
>Affects Versions: 2.0.3-alpha
>Reporter: Krishna Kishore Bonagiri
>
> I am running a date command using the Distributed Shell example in a loop of 
> 500 times. It ran successfully every time except once, when it gave the 
> following error.
> 2013-03-22 04:33:25,280 INFO  [main] distributedshell.Client 
> (Client.java:monitorApplication(605)) - Got application report from ASM for, 
> appId=222, clientToken=null, appDiagnostics=Application 
> application_1363938200742_0222 failed 1 times due to AM Container for 
> appattempt_1363938200742_0222_01 exited with  exitCode: 143 due to: 
> Container [pid=21141,containerID=container_1363938200742_0222_01_01] is 
> running beyond virtual memory limits. Current usage: 47.3 Mb of 128 Mb 
> physical memory used; 611.6 Mb of 268.8 Mb virtual memory used. Killing 
> container.
> Dump of the process-tree for container_1363938200742_0222_01_01 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 21147 21141 21141 21141 (java) 244 12 532643840 11802 
> /home_/dsadm/yarn/jdk//bin/java -Xmx128m 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster 
> --container_memory 10 --num_containers 2 --priority 0 --shell_command date
> |- 21141 8433 21141 21141 (bash) 0 0 108642304 298 /bin/bash -c 
> /home_/dsadm/yarn/jdk//bin/java -Xmx128m 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster 
> --container_memory 10 --num_containers 2 --priority 0 --shell_command date 
> 1>/tmp/logs/application_1363938200742_0222/container_1363938200742_0222_01_01/AppMaster.stdout
>  
> 2>/tmp/logs/application_1363938200742_0222/container_1363938200742_0222_01_01/AppMaster.stderr
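
A note on the numbers above: the 268.8 Mb virtual limit is the 128 Mb 
physical allocation multiplied by the default vmem-pmem ratio of 2.1 
(yarn.nodemanager.vmem-pmem-ratio): 128 x 2.1 = 268.8. The container is 
killed for its 611.6 Mb virtual footprint even though its physical usage 
(47.3 Mb) is well within the limit.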



[jira] [Commented] (YARN-168) No way to turn off virtual memory limits without turning off physical memory limits

2013-04-16 Thread Krishna Kishore Bonagiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632868#comment-13632868
 ] 

Krishna Kishore Bonagiri commented on YARN-168:
---

How can I get this fix if I want it now? When will the next release be? I 
mean, the one containing this fix.

> No way to turn off virtual memory limits without turning off physical memory 
> limits
> ---
>
> Key: YARN-168
> URL: https://issues.apache.org/jira/browse/YARN-168
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Harsh J
>
> Asked and reported by a user (Krishna) on ML:
> {quote}
> This is possible to do, but you've hit a bug with the current YARN
> implementation. Ideally you should be able to configure the vmem-pmem
> ratio (or an equivalent config) to be -1, to indicate disabling of
> virtual memory checks completely (and there are indeed checks for this),
> but it seems like we are enforcing the ratio to be at least 1.0 (and
> hence negatives are disallowed).
> You can't work around this by setting the NM's offered resource.mb to -1
> either, as you'll lose out on controlling maximum allocations.
> Please file a YARN bug on JIRA. The code at fault lies under
> ContainersMonitorImpl#init(…).
> On Thu, Oct 18, 2012 at 4:00 PM, Krishna Kishore Bonagiri
>  wrote:
> > Hi,
> >
> >   Is there a way we can ask the YARN RM for not killing a container when it
> > uses excess virtual memory than the maximum it can use as per the
> > specification in the configuration file yarn-site.xml? We can't always
> > estimate the amount of virtual memory needed for our application running on
> > a container, but we don't want to get it killed in a case it exceeds the
> > maximum limit.
> >
> >   Please suggest as to how can we come across this issue.
> >
> > Thanks,
> > Kishore
> {quote}
> Basically, we're doing:
> {code}
> // Virtual memory configuration
> float vmemRatio = conf.getFloat(
>     YarnConfiguration.NM_VMEM_PMEM_RATIO,
>     YarnConfiguration.DEFAULT_NM_VMEM_PMEM_RATIO);
> Preconditions.checkArgument(vmemRatio > 0.99f,
>     YarnConfiguration.NM_VMEM_PMEM_RATIO +
>     " should be at least 1.0");
> this.maxVmemAllottedForContainers =
>     (long) (vmemRatio * maxPmemAllottedForContainers);
> {code}
> For virtual memory monitoring to be disabled, maxVmemAllottedForContainers 
> has to be -1. For that to be -1, given the above buggy computation, vmemRatio 
> must be -1 or maxPmemAllottedForContainers must be -1.
> If vmemRatio were -1, we fail the precondition check and exit.
> If maxPmemAllottedForContainers were -1, we would also end up disabling 
> physical memory monitoring.
> Or perhaps that makes sense - to disable both physical and virtual memory 
> monitoring, but that way your NM becomes infinite in resource grants, I think.
> We need a way to selectively disable kills done via virtual memory 
> monitoring, which is the base request here.
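
A hypothetical sketch of a relaxed check (the -1 sentinel semantics here are 
an assumption drawn from the description above, not the actual fix):

float vmemRatio = conf.getFloat(
    YarnConfiguration.NM_VMEM_PMEM_RATIO,
    YarnConfiguration.DEFAULT_NM_VMEM_PMEM_RATIO);
// Allow -1 as an explicit "disable virtual memory checks" sentinel,
// while still rejecting other ratios below 1.0.
Preconditions.checkArgument(vmemRatio == -1.0f || vmemRatio > 0.99f,
    YarnConfiguration.NM_VMEM_PMEM_RATIO
        + " should be at least 1.0, or -1 to disable vmem checks");
this.maxVmemAllottedForContainers = (vmemRatio == -1.0f)
    ? -1
    : (long) (vmemRatio * maxPmemAllottedForContainers);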



[jira] [Reopened] (YARN-501) Application Master getting killed randomly reporting excess usage of memory

2013-04-16 Thread Krishna Kishore Bonagiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kishore Bonagiri reopened YARN-501:
---


Added information dated 24th March and 15th April; please see above. As 
mentioned, this is not really an issue of the container using more memory 
than was allocated. It happens randomly when running the same application 
repeatedly, whether my own application or the Distributed Shell example with 
just a date command.

Thanks,
Kishore

> Application Master getting killed randomly reporting excess usage of memory
> ---
>
> Key: YARN-501
> URL: https://issues.apache.org/jira/browse/YARN-501
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell, nodemanager
>Affects Versions: 2.0.3-alpha
>Reporter: Krishna Kishore Bonagiri
>
> I am running a date command using the Distributed Shell example in a loop of 
> 500 times. It ran successfully every time except once, when it gave the 
> following error.
> 2013-03-22 04:33:25,280 INFO  [main] distributedshell.Client 
> (Client.java:monitorApplication(605)) - Got application report from ASM for, 
> appId=222, clientToken=null, appDiagnostics=Application 
> application_1363938200742_0222 failed 1 times due to AM Container for 
> appattempt_1363938200742_0222_01 exited with  exitCode: 143 due to: 
> Container [pid=21141,containerID=container_1363938200742_0222_01_01] is 
> running beyond virtual memory limits. Current usage: 47.3 Mb of 128 Mb 
> physical memory used; 611.6 Mb of 268.8 Mb virtual memory used. Killing 
> container.
> Dump of the process-tree for container_1363938200742_0222_01_01 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 21147 21141 21141 21141 (java) 244 12 532643840 11802 
> /home_/dsadm/yarn/jdk//bin/java -Xmx128m 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster 
> --container_memory 10 --num_containers 2 --priority 0 --shell_command date
> |- 21141 8433 21141 21141 (bash) 0 0 108642304 298 /bin/bash -c 
> /home_/dsadm/yarn/jdk//bin/java -Xmx128m 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster 
> --container_memory 10 --num_containers 2 --priority 0 --shell_command date 
> 1>/tmp/logs/application_1363938200742_0222/container_1363938200742_0222_01_01/AppMaster.stdout
>  
> 2>/tmp/logs/application_1363938200742_0222/container_1363938200742_0222_01_01/AppMaster.stderr



[jira] [Updated] (YARN-541) getAllocatedContainers() is not returning all the allocated containers

2013-04-16 Thread Krishna Kishore Bonagiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kishore Bonagiri updated YARN-541:
--

Attachment: yarn-dsadm-resourcemanager-isredeng.out
yarn-dsadm-nodemanager-isredeng.out
AppMaster.stdout

Hi Hitesh,

  I am attaching the logs for the AM, RM, and NM. I have an application 
being run in a loop, which requires 5 containers. The 8th run failed with 
this getAllocatedContainers() issue: the Application Master couldn't get all 
the 5 containers it required, as getAllocatedContainers() returned only 4. 
The RM's log says that the 5th container was also allocated, through the 
message:

2013-04-16 03:32:54,701 INFO  [ResourceManager Event Processor]
rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(220)) -
container_1366096597608_0008_01_06 Container Transitioned from NEW to
ALLOCATED

In the RM's log, you can see this kind of message for the remaining 4 
containers also, i.e. container_1366096597608_0008_01_02 to 
container_1366096597608_0008_01_05.

Also, as I said before, this issue occurs only randomly.

Thanks,
Kishore






> getAllocatedContainers() is not returning all the allocated containers
> --
>
> Key: YARN-541
> URL: https://issues.apache.org/jira/browse/YARN-541
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.3-alpha
> Environment: Redhat Linux 64-bit
>Reporter: Krishna Kishore Bonagiri
> Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, 
> yarn-dsadm-resourcemanager-isredeng.out
>
>
> I am running an application that was written for, and working well with, 
> hadoop-2.0.0-alpha, but when I run the same application against 2.0.3-alpha, 
> the getAllocatedContainers() method called on AMResponse sometimes does not 
> return all of the allocated containers. For example, I request 10 containers 
> and this method sometimes gives me only 9, even though the Resource 
> Manager's log shows that the 10th container was also allocated. It happens 
> only occasionally and works fine all other times. If I send one more request 
> to the RM for the remaining container after it failed to give them the first 
> time (and before releasing the already acquired ones), it does allocate that 
> container. I am running only one application at a time, but thousands of 
> them one after another.
> My main worry is that even though the RM's log says all 10 requested 
> containers are allocated, the getAllocatedContainers() method is not 
> returning all of them; surprisingly, it returned only 9. I never saw this 
> kind of issue in the previous version, i.e. hadoop-2.0.0-alpha.
> Thanks,
> Kishore
>  



[jira] [Commented] (YARN-501) Application Master getting killed randomly reporting excess usage of memory

2013-04-15 Thread Krishna Kishore Bonagiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631798#comment-13631798
 ] 

Krishna Kishore Bonagiri commented on YARN-501:
---

What I have observed today is that this error comes at fairly regular 
intervals of about 50 minutes. At those particular times, I see the 
following kind of messages in the node manager's log. So, I think the node 
manager being busy with some other task, like this monitoring, is what 
causes the virtual memory error for the AM's container.

2013-04-12 15:51:02,048 INFO  [Container Monitor]
monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(346)) -
Starting resource-monitoring for container_1365688251527_6643_01_03
2013-04-12 15:51:02,048 INFO  [Container Monitor]
monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(346)) -
Starting resource-monitoring for container_1365688251527_6642_01_04
2013-04-12 15:51:02,049 INFO  [Container Monitor]
monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(346)) -
Starting resource-monitoring for container_1365688251527_6641_01_05
2013-04-12 15:51:02,049 INFO  [Container Monitor]
monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(346)) -
Starting resource-monitoring for container_1365688251527_6640_01_06
2013-04-12 15:51:02,049 INFO  [Container Monitor]
monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(356)) -
Stopping resource-monitoring for container_1365688251527_6524_01_01
2013-04-12 15:51:02,049 INFO  [Container Monitor]
monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(356)) -
Stopping resource-monitoring for container_1365688251527_6525_01_02
2013-04-12 15:51:02,049 INFO  [Container Monitor]
monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(356)) -
Stopping resource-monitoring for container_1365688251527_6525_01_03
2013-04-12 15:51:02,049 INFO  [Container Monitor]
monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(356)) -
Stopping resource-monitoring for container_1365688251527_6525_01_04






> Application Master getting killed randomly reporting excess usage of memory
> ---
>
> Key: YARN-501
> URL: https://issues.apache.org/jira/browse/YARN-501
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell, nodemanager
>Affects Versions: 2.0.3-alpha
>Reporter: Krishna Kishore Bonagiri
>
> I am running a date command using the Distributed Shell example in a loop of 
> 500 times. It ran successfully every time except once, when it gave the 
> following error.
> 2013-03-22 04:33:25,280 INFO  [main] distributedshell.Client 
> (Client.java:monitorApplication(605)) - Got application report from ASM for, 
> appId=222, clientToken=null, appDiagnostics=Application 
> application_1363938200742_0222 failed 1 times due to AM Container for 
> appattempt_1363938200742_0222_01 exited with  exitCode: 143 due to: 
> Container [pid=21141,containerID=container_1363938200742_0222_01_01] is 
> running beyond virtual memory limits. Current usage: 47.3 Mb of 128 Mb 
> physical memory used; 611.6 Mb of 268.8 Mb virtual memory used. Killing 
> container.
> Dump of the process-tree for container_1363938200742_0222_01_01 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 21147 21141 21141 21141 (java) 244 12 532643840 11802 
> /home_/dsadm/yarn/jdk//bin/java -Xmx128m 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster 
> --container_memory 10 --num_containers 2 --priority 0 --shell_command date
> |- 21141 8433 21141 21141 (bash) 0 0 108642304 298 /bin/bash -c 
> /home_/dsadm/yarn/jdk//bin/java -Xmx128m 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster 
> --container_memory 10 --num_containers 2 --priority 0 --shell_command date 
> 1>/tmp/logs/application_1363938200742_0222/container_1363938200742_0222_01_01/AppMaster.stdout
>  
> 2>/tmp/logs/application_1363938200742_0222/container_1363938200742_0222_01_01/AppMaster.stderr



[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers

2013-04-15 Thread Krishna Kishore Bonagiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631564#comment-13631564
 ] 

Krishna Kishore Bonagiri commented on YARN-541:
---

Hi Hitesh,

 Thanks for the reply. I have actually changed my code to send new requests 
for the remainder of the required containers, so I am not seeing this error 
now. I shall revert my changes, try to reproduce the error, and send you the 
logs in 1 or 2 days.

Thanks,
Kishore





> getAllocatedContainers() is not returning all the allocated containers
> --
>
> Key: YARN-541
> URL: https://issues.apache.org/jira/browse/YARN-541
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.3-alpha
> Environment: Redhat Linux 64-bit
>Reporter: Krishna Kishore Bonagiri
>
> I am running an application that was written for, and working well with, 
> hadoop-2.0.0-alpha, but when I run the same application against 2.0.3-alpha, 
> the getAllocatedContainers() method called on AMResponse sometimes does not 
> return all of the allocated containers. For example, I request 10 containers 
> and this method sometimes gives me only 9, even though the Resource 
> Manager's log shows that the 10th container was also allocated. It happens 
> only occasionally and works fine all other times. If I send one more request 
> to the RM for the remaining container after it failed to give them the first 
> time (and before releasing the already acquired ones), it does allocate that 
> container. I am running only one application at a time, but thousands of 
> them one after another.
> My main worry is that even though the RM's log says all 10 requested 
> containers are allocated, the getAllocatedContainers() method is not 
> returning all of them; surprisingly, it returned only 9. I never saw this 
> kind of issue in the previous version, i.e. hadoop-2.0.0-alpha.
> Thanks,
> Kishore
>  



[jira] [Created] (YARN-541) getAllocatedContainers() is not returning all the allocated containers

2013-04-04 Thread Krishna Kishore Bonagiri (JIRA)
Krishna Kishore Bonagiri created YARN-541:
-

 Summary: getAllocatedContainers() is not returning all the 
allocated containers
 Key: YARN-541
 URL: https://issues.apache.org/jira/browse/YARN-541
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
 Environment: Redhat Linux 64-bit
Reporter: Krishna Kishore Bonagiri


I am running an application that was written for, and working well with, 
hadoop-2.0.0-alpha, but when I run the same application against 2.0.3-alpha, 
the getAllocatedContainers() method called on AMResponse sometimes does not 
return all of the allocated containers. For example, I request 10 containers 
and this method sometimes gives me only 9, even though the Resource 
Manager's log shows that the 10th container was also allocated. It happens 
only occasionally and works fine all other times. If I send one more request 
to the RM for the remaining container after it failed to give them the first 
time (and before releasing the already acquired ones), it does allocate that 
container. I am running only one application at a time, but thousands of 
them one after another.

My main worry is that even though the RM's log says all 10 requested 
containers are allocated, the getAllocatedContainers() method is not 
returning all of them; surprisingly, it returned only 9. I never saw this 
kind of issue in the previous version, i.e. hadoop-2.0.0-alpha.

Thanks,
Kishore

 



[jira] [Commented] (YARN-501) Application Master getting killed randomly reporting excess usage of memory

2013-03-24 Thread Krishna Kishore Bonagiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612057#comment-13612057
 ] 

Krishna Kishore Bonagiri commented on YARN-501:
---

Hi Ravi,

  The problem I am reporting is: why should it give this kind of error so 
randomly, when it runs fine all other times with the same memory 
allocations? I see this error once in 500 runs. Also, as I said, it is only 
the date command that I am running, with the Distributed Shell example jar 
that comes with hadoop-2.0.3-alpha, so I would expect it to behave the same 
way every time.

Thanks,
Kishore






> Application Master getting killed randomly reporting excess usage of memory
> ---
>
> Key: YARN-501
> URL: https://issues.apache.org/jira/browse/YARN-501
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell, nodemanager
>Affects Versions: 2.0.3-alpha
>Reporter: Krishna Kishore Bonagiri
>
> I am running a date command using the Distributed Shell example in a loop of 
> 500 times. It ran successfully every time except once, when it gave the 
> following error.
> 2013-03-22 04:33:25,280 INFO  [main] distributedshell.Client 
> (Client.java:monitorApplication(605)) - Got application report from ASM for, 
> appId=222, clientToken=null, appDiagnostics=Application 
> application_1363938200742_0222 failed 1 times due to AM Container for 
> appattempt_1363938200742_0222_01 exited with  exitCode: 143 due to: 
> Container [pid=21141,containerID=container_1363938200742_0222_01_01] is 
> running beyond virtual memory limits. Current usage: 47.3 Mb of 128 Mb 
> physical memory used; 611.6 Mb of 268.8 Mb virtual memory used. Killing 
> container.
> Dump of the process-tree for container_1363938200742_0222_01_01 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 21147 21141 21141 21141 (java) 244 12 532643840 11802 
> /home_/dsadm/yarn/jdk//bin/java -Xmx128m 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster 
> --container_memory 10 --num_containers 2 --priority 0 --shell_command date
> |- 21141 8433 21141 21141 (bash) 0 0 108642304 298 /bin/bash -c 
> /home_/dsadm/yarn/jdk//bin/java -Xmx128m 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster 
> --container_memory 10 --num_containers 2 --priority 0 --shell_command date 
> 1>/tmp/logs/application_1363938200742_0222/container_1363938200742_0222_01_01/AppMaster.stdout
>  
> 2>/tmp/logs/application_1363938200742_0222/container_1363938200742_0222_01_01/AppMaster.stderr



[jira] [Created] (YARN-501) Application Master getting killed randomly reporting excess usage of memory

2013-03-22 Thread Krishna Kishore Bonagiri (JIRA)
Krishna Kishore Bonagiri created YARN-501:
-

 Summary: Application Master getting killed randomly reporting 
excess usage of memory
 Key: YARN-501
 URL: https://issues.apache.org/jira/browse/YARN-501
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell, nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Krishna Kishore Bonagiri


I am running a date command using the Distributed Shell example in a loop of 
500 times. It ran successfully every time except once, when it gave the 
following error.

2013-03-22 04:33:25,280 INFO  [main] distributedshell.Client 
(Client.java:monitorApplication(605)) - Got application report from ASM for, 
appId=222, clientToken=null, appDiagnostics=Application 
application_1363938200742_0222 failed 1 times due to AM Container for 
appattempt_1363938200742_0222_01 exited with  exitCode: 143 due to: 
Container [pid=21141,containerID=container_1363938200742_0222_01_01] is 
running beyond virtual memory limits. Current usage: 47.3 Mb of 128 Mb physical 
memory used; 611.6 Mb of 268.8 Mb virtual memory used. Killing container.
Dump of the process-tree for container_1363938200742_0222_01_01 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 21147 21141 21141 21141 (java) 244 12 532643840 11802 
/home_/dsadm/yarn/jdk//bin/java -Xmx128m 
org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster 
--container_memory 10 --num_containers 2 --priority 0 --shell_command date
|- 21141 8433 21141 21141 (bash) 0 0 108642304 298 /bin/bash -c 
/home_/dsadm/yarn/jdk//bin/java -Xmx128m 
org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster 
--container_memory 10 --num_containers 2 --priority 0 --shell_command date 
1>/tmp/logs/application_1363938200742_0222/container_1363938200742_0222_01_01/AppMaster.stdout
 
2>/tmp/logs/application_1363938200742_0222/container_1363938200742_0222_01_01/AppMaster.stderr



[jira] [Created] (YARN-492) Too many open files error to launch a container

2013-03-20 Thread Krishna Kishore Bonagiri (JIRA)
Krishna Kishore Bonagiri created YARN-492:
-

 Summary: Too many open files error to launch a container
 Key: YARN-492
 URL: https://issues.apache.org/jira/browse/YARN-492
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
 Environment: RedHat Linux
Reporter: Krishna Kishore Bonagiri


I am running a date command with YARN's distributed shell example in a loop of 
1000 times in this way:

yarn jar 
/home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
 org.apache.hadoop.yarn.applications.distributedshell.Client --jar 
/home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
 --shell_command date --num_containers 2


Around the 730th run or so, I get an error in the node manager's log saying 
that it failed to launch a container because there are "Too many open 
files". When I observe through the lsof command, I find that one open 
descriptor of the following kind is left behind for each run of the 
Application Master, and the count keeps growing as I run the loop:

node1:44871->node1:50010
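
For context: port 50010 is the default DataNode data-transfer port 
(dfs.datanode.address), which suggests these are HDFS client connections 
left open by each Application Master run.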

Thanks,
Kishore
