I find that this sometimes does not answer the question, but when it does not, the answer will be in either slurmctld.log on the controller or slurmd.log on the node.
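
The checks discussed in this thread can be sketched as a few shell helpers. This is only a rough sketch: the function names (node_reason, show_node_reason, show_log_paths) are made up for illustration, and it assumes the 'Reason=' field starts its own line in the 'scontrol show node' output and that the log locations are set through SlurmctldLogFile and SlurmdLogFile in slurm.conf.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not part of Slurm.

# Read 'scontrol show node' output on stdin and print the text of the
# "Reason=" field. Assumes the field starts its own (indented) line.
node_reason() {
    sed -n 's/^[[:space:]]*Reason=//p'
}

# Ask the controller for one node's record and print its Reason, if any.
# Usage: show_node_reason <nodename>
show_node_reason() {
    scontrol show node "$1" | node_reason
}

# Print the configured log file locations, so you know which
# slurmctld.log and slurmd.log to read when Reason= is not enough.
show_log_paths() {
    scontrol show config | grep -E '^(SlurmctldLogFile|SlurmdLogFile)'
}
```

For example, 'show_node_reason <nodename>' prints only the reason text for the affected node, and 'show_log_paths' shows where to look next if that text does not explain the problem.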
--
 ____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
|| \\UTGERS      |---------------------*O*---------------------
||_// Biomedical | Ryan Novosielski - Senior Technologist
|| \\ and Health | [email protected] - 973/972.0922 (2x0922)
||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
     `'
________________________________________
From: Mike Johnson [[email protected]]
Sent: Friday, February 20, 2015 7:22 AM
To: slurm-dev
Subject: [slurm-dev] Re: Node going in down state again and again

Suprita,

You should start by running 'scontrol show node <nodename>', where <nodename> is the name of the affected node. Look for a line starting with 'Reason='; it will give you a starting point if it doesn't just tell you outright.

Mike

On 20 February 2015 at 11:42, <[email protected]> wrote:

Hi,

Can someone help me work out why my compute node keeps going into the down state? I have tried everything I can think of: I restarted slurmd and also checked the network configuration. I am attaching the slurm.conf file. Please help.

Regards,
Suprita
