Hi Vinod,

 Yes, I am running on Linux.

 I searched /var/log/messages for an entry confirming that the OOM killer
killed my daemons, but could not find one. According to the following link,
if this were a memory issue I should see a message even if OOM is disabled,
but I don't see any.

http://www.redhat.com/archives/taroon-list/2007-August/msg00006.html
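
For reference, a minimal sketch of the kind of log search described above,
in Python. The search patterns are assumptions based on commonly seen kernel
messages; the exact wording varies between kernel versions, and on some
distributions the kernel log is not written to /var/log/messages at all.

```python
import re

# Assumed patterns: typical OOM-killer wording, matched case-insensitively.
OOM_PATTERN = re.compile(r"out of memory|killed process|oom-killer", re.IGNORECASE)

def find_oom_lines(lines):
    """Return the lines that look like OOM-killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]

def scan_log(path="/var/log/messages"):
    """Scan a syslog file (usually needs root to read) for OOM-killer lines."""
    with open(path, errors="replace") as fh:
        return find_oom_lines(fh)
```

Calling scan_log() and printing each returned line shows any matches; an
empty result only means none of the assumed patterns appeared in that file.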

  Also, is memory consumption higher on a two-node cluster than on a
single-node one? I see this problem only when I give "*" as the node name.

  One other thing I suspected was the limit on the number of user processes.
I increased it from 1024 to 31000, but that didn't help either.
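
A small sketch for checking which process limit is actually in effect, in
Python. It reports the limit for the process that runs it, so it only
reflects what the daemons see if it is run as the same user from the same
kind of session; as far as I know, limits are inherited when a process
starts, so daemons started before the change keep the old value until they
are restarted.

```python
import resource

def nproc_limits():
    """Return the (soft, hard) limit on user processes for this process."""
    return resource.getrlimit(resource.RLIMIT_NPROC)

def describe(limit):
    # RLIM_INFINITY means no limit is enforced.
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)

soft, hard = nproc_limits()
print("max user processes: soft=%s hard=%s" % (describe(soft), describe(hard)))
```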

Thanks,
Kishore


On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli <
vino...@hortonworks.com> wrote:

> Yes, that is what I suspect. That is why I asked if everything is on a
> single node. If you are running Linux, the Linux OOM killer may be shooting
> things down. When that happens, you will see something like "killed process"
> in the system's syslog.
>
> Thanks,
> +Vinod
>
> On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri <
> write2kish...@gmail.com> wrote:
>
> Vinod,
>
>   One more thing I observed is that my client, which submits Application
> Masters one after another continuously, also gets killed sometimes. So it is
> always one of the Java processes that gets killed. Does that indicate
> excessive memory usage by them, or something similar, that is causing them
> to die? If so, how can we resolve this kind of issue?
>
> Thanks,
> Kishore
>
>
> On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri <
> write2kish...@gmail.com> wrote:
>
>> No, I am running on a 2-node cluster.
>>
>>
>> On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli <
>> vino...@hortonworks.com> wrote:
>>
>>> Is all of this on a single node?
>>>
>>>  Thanks,
>>> +Vinod
>>>
>>> On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri <
>>> write2kish...@gmail.com> wrote:
>>>
>>> Hi,
>>>   I am running a small application on YARN (2.2.0) in a loop of 500
>>> iterations, and while doing so one of the daemons (node manager, resource
>>> manager, or data node) gets killed (I mean it disappears) at a random
>>> point. I see no information in the corresponding log files. How can I
>>> find out why this is happening?
>>>
>>>  One more observation: this happens only when I use "*" for the node
>>> name in the container requests. When I use a specific node name,
>>> everything is fine.
>>>
>>> Thanks,
>>> Kishore
>>>
>>>
>>>
>>> CONFIDENTIALITY NOTICE
>>> NOTICE: This message is intended for the use of the individual or entity
>>> to which it is addressed and may contain information that is confidential,
>>> privileged and exempt from disclosure under applicable law. If the reader
>>> of this message is not the intended recipient, you are hereby notified that
>>> any printing, copying, dissemination, distribution, disclosure or
>>> forwarding of this communication is strictly prohibited. If you have
>>> received this communication in error, please contact the sender immediately
>>> and delete it from your system. Thank You.
>>
>>
>>
>
