Bjørn-Helge Mevik writes:

> We think we've discovered an error in slurm 2.2.1.  When we do a
> scontrol reconfig (or kill -hup the slurmctld), the usage->grp_used_cpus
> field of qos'es and associations gets set to its previous value + the
> current actual usage.  (If we restart slurm, it gets set to the current
> actual usage.)

[...]

> We can reproduce it on an un-patched slurm 2.2.1 like this (this is for
> QoS limits, but we see the same behaviour for account limits):

A small correction to myself: we have seen this only for _QOSes_, not
for _accounts_.  I guess I got a little confused when looking at the
logs (with the current debug level, we get about 1 GB per day. :-)

If I were to guess what happens, I'd suggest the problem is that
_restore_job_dependencies() calls assoc_mgr_clear_used_info() which
clears association limits, but it doesn't clear the qos limits.
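
To make that guess concrete, here is a toy model in Python.  It is not
Slurm code: the Usage class, the reconfigure() function and the
clear_qos flag are all made up for illustration, and the only thing
taken from Slurm is the field name grp_used_cpus.  It assumes that a
reconfig clears the association counter and then adds the CPUs of every
running job back to both counters.

```python
# Toy model of the suspected double counting; this is NOT Slurm code.
from dataclasses import dataclass


@dataclass
class Usage:
    grp_used_cpus: int = 0


def reconfigure(assoc, qos, running_job_cpus, clear_qos):
    """Mimic a reconfig: clear usage, then re-add every running job."""
    assoc.grp_used_cpus = 0          # association usage is cleared
    if clear_qos:
        qos.grp_used_cpus = 0        # the step suspected to be missing
    for cpus in running_job_cpus:
        assoc.grp_used_cpus += cpus
        qos.grp_used_cpus += cpus


jobs = [4, 8]                        # CPUs used by the running jobs
assoc, qos = Usage(12), Usage(12)    # counters before the reconfig
reconfigure(assoc, qos, jobs, clear_qos=False)
print(assoc.grp_used_cpus, qos.grp_used_cpus)
```

With clear_qos=False the QoS counter ends up at its previous value plus
the current usage, which matches what we see; with clear_qos=True both
counters end up at the current usage.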

-- 
Bjørn-Helge Mevik, dr. scient,
Research Computing Services, University of Oslo
