Hello Hardik,

Are you sure that this is Slurm-related? I think it might actually be something else.
If you want to be sure, check the Slurm logs and see whether the shutdown is actually initiated by Slurm. I believe that something else is initiating the shutdown.

Cheers,

On 14 April 2017 at 15:48, Hardik Kothari <[email protected]> wrote:

> Dear all,
>
> I am handling a small cluster in our institute.
>
> I have noticed that when a job uses more resources than are available, the
> system receives a SIGTERM and then the nodes go into a down state.
>
> Apr 14 02:01:17 node19 journal: Runtime journal is using 8.0M (max 3.1G,
> leaving 4.0G of free 31.3G, current limit 3.1G).
> Apr 14 02:01:17 node19 journal: Runtime journal is using 8.0M (max 3.1G,
> leaving 4.0G of free 31.3G, current limit 3.1G).
> Apr 14 02:01:17 node19 systemd-journald: Received SIGTERM
>
> I have to put nodes abc in the idle state each time a user crosses this
> limit.
> Is there a way to handle this problem directly within Slurm that would
> keep a node from going into the down state?
>
> Thanks,
> Hardik
>

-- 
Andrea Del Monaco
Internal Engineer
ClusterVision BV
www.clustervision.com
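
P.S. As a rough illustration of the log check suggested above, here is a minimal Python sketch. It assumes syslog-style lines in the same layout as the quoted excerpt ("Mon DD HH:MM:SS host source: message"). The names `parse_line` and `sources_near_sigterm`, the regular expression, and the 60-second window are all illustrative assumptions, not part of Slurm or systemd. The sketch lists which sources logged on the same host around each SIGTERM line. Depending on how the site is configured, the Slurm daemons (slurmd, slurmctld) may write to their own log files, so those lines would need to be included in the input before drawing any conclusion.

```python
import re
from datetime import datetime

# Assumed layout, taken from the quoted excerpt:
# "Apr 14 02:01:17 node19 journal: some message"
LINE_RE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) (\S+) ([^:]+): (.*)$")


def parse_line(line, year=2017):
    """Return (timestamp, host, source, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    stamp, host, source, message = m.groups()
    # Syslog timestamps carry no year, so one has to be supplied.
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return when, host, source, message


def sources_near_sigterm(lines, window_seconds=60):
    """Map each (host, time) of a SIGTERM line to the sorted sources logging nearby."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    result = {}
    for when, host, _source, message in entries:
        if "SIGTERM" not in message:
            continue
        nearby = {
            src
            for (t, h, src, _msg) in entries
            if h == host and abs((t - when).total_seconds()) <= window_seconds
        }
        result[(host, when)] = sorted(nearby)
    return result


# Sample input: the distinct lines from the quoted excerpt.
SAMPLE = [
    "Apr 14 02:01:17 node19 journal: Runtime journal is using 8.0M "
    "(max 3.1G, leaving 4.0G of free 31.3G, current limit 3.1G).",
    "Apr 14 02:01:17 node19 systemd-journald: Received SIGTERM",
]

if __name__ == "__main__":
    for (host, when), sources in sources_near_sigterm(SAMPLE).items():
        involved = any("slurm" in s.lower() for s in sources)
        print(host, when, sources, "slurm daemon nearby:", involved)
```

If no source containing "slurm" shows up near the SIGTERM, that is consistent with something else initiating the shutdown, but it does not prove it: the excerpt is short, and the relevant Slurm messages may simply live in a different file.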
