Re: [slurm-users] Nagios or Other Monitoring Plugins

2018-01-18 Thread Marcin Stolarek
We're using icinga2, storing accounting data in InfluxDB for Grafana dashboards. In terms of monitoring I prefer end-user functionality, so apart from services we also have a plugin that submits a job to the cluster (to idle nodes, with a deadline of a few minutes). The job simply creates files on shared
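
A minimal sketch of the kind of end-to-end check described above, assuming the plugin shells out to `sbatch`. The function name, the job name, the marker file name and the shared directory are hypothetical and not taken from the original message; restricting the job to idle nodes is not shown.

```python
import datetime
import shlex


def build_canary_command(shared_dir, deadline_minutes=5, now=None):
    """Build an sbatch command line for a short monitoring job.

    Hypothetical helper: the job only touches a marker file in
    shared_dir, and Slurm drops it if it cannot finish before the
    deadline.
    """
    now = now or datetime.datetime.now()
    deadline = now + datetime.timedelta(minutes=deadline_minutes)
    return [
        "sbatch",
        "--job-name=monitoring-canary",  # assumed name, pick your own
        # Absolute timestamp in the YYYY-MM-DDTHH:MM:SS form.
        "--deadline=" + deadline.strftime("%Y-%m-%dT%H:%M:%S"),
        "--time=1",  # one-minute time limit
        "--output=/dev/null",
        # $SLURM_JOB_ID is expanded inside the job, not here.
        "--wrap=touch " + shlex.quote(shared_dir) + "/canary-$SLURM_JOB_ID",
    ]
```

The monitoring plugin would pass this list to `subprocess.run` and later check that the marker file appeared on the shared filesystem; how long to wait before raising an alert is a site-specific choice.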

Re: [slurm-users] Nagios or Other Monitoring Plugins

2018-01-18 Thread Ryan Novosielski
> On Jan 18, 2018, at 4:34 PM, Lachlan Musicman wrote: > > On 19 January 2018 at 07:29, Ryan Novosielski wrote: > Hi all, > > Looked back at the mailing list to see if there was a question about this > already. There was some mention of /using/ Nagios, but no real mention of > specifics. What

Re: [slurm-users] Nagios or Other Monitoring Plugins

2018-01-18 Thread Michael Gutteridge
We're moving to Prometheus for lots of our monitoring functions. We've got Nagios and Ganglia in place, but Prometheus and Grafana make a really nice combo for monitoring and alerting. There's even an exporter for Slurm (https://github.com/vpenso/prometheus-slurm-exporter) that includes node data

Re: [slurm-users] Nagios or Other Monitoring Plugins

2018-01-18 Thread Lachlan Musicman
On 19 January 2018 at 07:29, Ryan Novosielski wrote: > Hi all, > > Looked back at the mailing list to see if there was a question about this > already. There was some mention of /using/ Nagios, but no real mention of > specifics. What do people monitor with Nagios? We monitor, so far, > slurmctld

[slurm-users] Nagios or Other Monitoring Plugins

2018-01-18 Thread Ryan Novosielski
Hi all, Looked back at the mailing list to see if there was a question about this already. There was some mention of /using/ Nagios, but no real mention of specifics. What do people monitor with Nagios? We monitor, so far, slurmctld, slurmdbd, and MySQL, but there are probably some others. Migh
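
For the slurmctld check mentioned above, a Nagios-style plugin could be sketched roughly as follows. This assumes that `scontrol ping` prints one line per controller containing the word UP or DOWN; the exact wording may vary between Slurm versions, and the function names are illustrative only.

```python
import re
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_ping(output):
    """Map `scontrol ping` text to a (status, message) pair.

    Assumption: each controller line contains UP or DOWN.
    """
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    up = sum(1 for ln in lines if re.search(r"\bUP\b", ln))
    down = sum(1 for ln in lines if re.search(r"\bDOWN\b", ln))
    if up == 0 and down == 0:
        return UNKNOWN, "UNKNOWN - could not parse scontrol ping output"
    if up == 0:
        return CRITICAL, "CRITICAL - no slurmctld is responding"
    if down > 0:
        return WARNING, "WARNING - %d slurmctld DOWN, %d UP" % (down, up)
    return OK, "OK - %d slurmctld UP" % up


def run_check(timeout=10):
    """Run `scontrol ping` and classify the result."""
    try:
        proc = subprocess.run(
            ["scontrol", "ping"],
            capture_output=True, text=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return UNKNOWN, "UNKNOWN - scontrol failed: %s" % exc
    return classify_ping(proc.stdout)
```

A plugin wrapper would print the message from `run_check()` and exit with the returned status code, which is what Nagios and Icinga expect from a check command.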

Re: [slurm-users] howto limit the cpu resource for each user

2018-01-18 Thread Colas Rivière
Hello Arielle, I don't have a full answer, but here is a start: Yes, you first need at least "AccountingStorageEnforce=associations,limits" (and qos if you want to use it) so that the limits you set are enforced (see https://slurm.schedmd.com/resource_limits.html). Then you can set limits fo

[slurm-users] howto limit the cpu resource for each user

2018-01-18 Thread Arielle Willm
Hi, Slurm is installed in a minimal configuration for a cluster of 3000 cores/170 nodes. We have 4 partitions, one for each type of node; each partition is available to all users. We want to prevent each user from taking more than 1000 cores, running in up to 50 jobs, across the whole cluster, and I'

Re: [slurm-users] Slurm and available libraries

2018-01-18 Thread Elisabetta Falivene
So EasyBuild + Lmod seems the best solution. I'll try. :) Thank you all! betta 2018-01-17 17:53 GMT+01:00 Christopher Samuel : > On 18/01/18 03:50, Patrick Goetz wrote: > > Can anyone shed some light on the situation? I'm very surprised that >> a module script isn't just an explicit command that

Re: [slurm-users] More cores than on one node

2018-01-18 Thread Loris Bennett
>Nadav Toledo writes: > >> Nadav Toledo writes: >> >> Hey everyone, >> >> We've just setup a slurm cluster with few nodes each has 16 cores. >> Is it possible to submit a job for 17cores or more? >> If not, is there a workaround? >> >> Thanks in advance, Nadav >> >> >> It should be possible. H