Hi Jeff,

I ran the resource manager in the foreground, without nohup, and here are the last messages it printed before it was killed. It only says "Killed" and doesn't say why!
13/12/17 03:14:54 INFO capacity.CapacityScheduler: Application appattempt_1387266015651_0258_000001 released container container_1387266015651_0258_01_000003 on node: host: isredeng:36576 #containers=2 available=7936 used=256 with event: FINISHED
13/12/17 03:14:54 INFO rmcontainer.RMContainerImpl: container_1387266015651_0258_01_000005 Container Transitioned from ACQUIRED to RUNNING
Killed

Thanks,
Kishore

On Mon, Dec 16, 2013 at 11:10 PM, Jeff Stuckman <stuck...@umd.edu> wrote:

> What if you open the daemons in a "screen" session rather than running
> them in the background -- for example, run "yarn resourcemanager". Then you
> can see exactly when they terminate, and hopefully why.
>
> From: Krishna Kishore Bonagiri
> Sent: Monday, December 16, 2013 6:20 AM
> To: user@hadoop.apache.org
> Reply To: user@hadoop.apache.org
> Subject: Re: Yarn -- one of the daemons getting killed
>
> Hi Vinod,
>
> Yes, I am running on Linux.
>
> I was actually searching for a corresponding message in /var/log/messages
> to confirm that OOM killed my daemons, but could not find any corresponding
> messages there! According to the following link, it looks like if it is a
> memory issue, I should see a message even if OOM is disabled, but I don't
> see it.
>
> http://www.redhat.com/archives/taroon-list/2007-August/msg00006.html
>
> And, is memory consumption higher on a two-node cluster than on a
> single-node one? Also, I see this problem only when I give "*" as the node
> name.
>
> One other thing I suspected was the allowed number of user processes.
> I increased that to 31000 from 1024, but that also didn't help.
>
> Thanks,
> Kishore
>
> On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli <
> vino...@hortonworks.com> wrote:
>
>> Yes, that is what I suspect. That is why I asked if everything is on a
>> single node. If you are running Linux, the Linux OOM killer may be shooting
>> things down.
>> When it happens, you will see something like "killed process"
>> in the system's syslog.
>>
>> Thanks,
>> +Vinod
>>
>> On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri <
>> write2kish...@gmail.com> wrote:
>>
>> Vinod,
>>
>> One more thing I observed is that my Client, which submits Application
>> Master one after another continuously, also gets killed sometimes. So, it is
>> always one of the Java processes that is getting killed. Does it indicate
>> some excessive memory usage by them, or something like that, that is causing
>> them to die? If so, how can we resolve this kind of issue?
>>
>> Thanks,
>> Kishore
>>
>> On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri <
>> write2kish...@gmail.com> wrote:
>>
>>> No, I am running on a 2-node cluster.
>>>
>>> On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli <
>>> vino...@hortonworks.com> wrote:
>>>
>>>> Is all of this on a single node?
>>>>
>>>> Thanks,
>>>> +Vinod
>>>>
>>>> On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri <
>>>> write2kish...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>> I am running a small application on YARN (2.2.0) in a loop of 500
>>>> times, and while doing so one of the daemons (node manager, resource
>>>> manager, or data node) is getting killed (I mean disappearing) at a random
>>>> point. I see no information in the corresponding log files. How can I find
>>>> out why this is happening?
>>>>
>>>> And, one more observation is that this is happening only when I am
>>>> using "*" for the node name in the container requests; when I used a
>>>> specific node name, everything was fine.
>>>>
>>>> Thanks,
>>>> Kishore
>>>>
>>>> CONFIDENTIALITY NOTICE
>>>> NOTICE: This message is intended for the use of the individual or
>>>> entity to which it is addressed and may contain information that is
>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>> If the reader of this message is not the intended recipient, you are hereby
>>>> notified that any printing, copying, dissemination, distribution,
>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>> you have received this communication in error, please contact the sender
>>>> immediately and delete it from your system. Thank You.
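
Following Vinod's suggestion in this thread to look for a "killed process" message in the system log, here is a minimal Python sketch of that check. It is only an illustration under stated assumptions: the function names `find_oom_lines` and `scan_log` are hypothetical, the default path `/var/log/messages` is the one Kishore mentions (other distributions may log kernel messages elsewhere, for example `/var/log/syslog`), and the search patterns are a guess at typical OOM-killer wording, which varies between kernel versions.

```python
import re
from typing import Iterable, List

# Assumed patterns: common wording of Linux OOM-killer log entries.
# The exact text differs between kernel versions, so this is deliberately loose.
OOM_PATTERN = re.compile(r"out of memory|killed process|oom-killer", re.IGNORECASE)


def find_oom_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that look like OOM-killer messages."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def scan_log(path: str = "/var/log/messages") -> List[str]:
    """Scan a log file (hypothetical default path) for OOM-killer messages.

    Reading system logs usually requires root privileges.
    """
    with open(path, errors="replace") as handle:
        return find_oom_lines(handle)
```

Calling `scan_log()` as root and printing the result would list any matching lines. An empty result does not rule out the OOM killer, since the log may have been rotated or kernel messages may be written to a different file.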