[email protected] (Bjørn-Helge Mevik) writes:

> We think we've discovered an error in slurm 2.2.1. When we do a
> scontrol reconfig (or kill -hup the slurmctld), the usage->grp_used_cpus
> field of qos'es and associations gets set to its previous value + the
> current actual usage. (If we restart slurm, it gets set to the current
> actual usage.)
[...]

> We can reproduce it on an un-patched slurm 2.2.1 like this (this is for
> QoS limits, but we see the same behaviour for account limits):

A small correction to myself: we have seen this only for _QOSes_, not
for _accounts_. I guess I got a little confused when looking at the logs
(with the current debug level, we get about 1 GB per day :-).

If I were to guess what happens, I'd suggest the problem is that
_restore_job_dependencies() calls assoc_mgr_clear_used_info(), which
clears the association limits but does not clear the QOS limits.

-- 
Bjørn-Helge Mevik, dr. scient,
Research Computing Services, University of Oslo
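
To illustrate the guess above, here is a small stand-alone model of the
suspected double counting. It is only a sketch and not slurm code: the
structs and the functions clear_used_info(), readd_running_jobs() and
reconfig() are made-up stand-ins, and it assumes that a reconfig first
clears usage and then adds the usage of the running jobs back in.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the usage records; not the real slurm structs. */
struct usage {
    uint32_t grp_used_cpus;
};

struct job {
    uint32_t cpus;          /* CPUs the running job holds */
    struct usage *assoc;    /* usage record of the job's association */
    struct usage *qos;      /* usage record of the job's QOS */
};

/* Models the suspected behaviour: only the association usage is reset. */
static void clear_used_info(struct usage *assoc, struct usage *qos)
{
    assoc->grp_used_cpus = 0;
    (void)qos;              /* suspected bug: QOS usage is left as it was */
}

/* Assumed second step: add the usage of every running job back in. */
static void readd_running_jobs(const struct job *jobs, int njobs)
{
    for (int i = 0; i < njobs; i++) {
        jobs[i].assoc->grp_used_cpus += jobs[i].cpus;
        jobs[i].qos->grp_used_cpus += jobs[i].cpus;
    }
}

/* Model of what a reconfig would do under the assumptions above. */
static void reconfig(struct usage *assoc, struct usage *qos,
                     const struct job *jobs, int njobs)
{
    clear_used_info(assoc, qos);
    readd_running_jobs(jobs, njobs);
}
```

With one running 8-CPU job and both counters at 8, a call to reconfig()
in this model leaves the association counter at 8 but moves the QOS
counter to 16 (previous value + current actual usage), which is the
pattern we see for QOSes.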
