Hello,

We upgraded to 15.08.1 yesterday.  I now see these messages frequently in
/var/log/slurm/slurmctld.log:

[2015-10-01T08:09:35.461] error: _handle_qos_tres_run_secs: job 2556644: QOS 
override TRES cpu grp_used_tres_run_secs underflow, tried to remove 19200 
seconds when only 0 remained.
[2015-10-01T08:09:35.461] error: _handle_qos_tres_run_secs: job 2556644: QOS 
override TRES mem grp_used_tres_run_secs underflow, tried to remove 76800000 
seconds when only 0 remained.
[2015-10-01T08:09:35.461] error: _handle_qos_tres_run_secs: job 2690887: QOS 
sahl TRES cpu grp_used_tres_run_secs underflow, tried to remove 1200 seconds 
when only 0 remained.
[2015-10-01T08:09:35.461] error: _handle_qos_tres_run_secs: job 2691478: QOS 
trilling TRES cpu grp_used_tres_run_secs underflow, tried to remove 1200 
seconds when only 0 remained.
[2015-10-01T08:09:35.461] error: _handle_qos_tres_run_secs: job 2691478: QOS 
trilling TRES mem grp_used_tres_run_secs underflow, tried to remove 48000000 
seconds when only 0 remained.
[2015-10-01T08:09:35.461] error: _handle_qos_tres_run_secs: job 2691478: QOS 
trilling TRES node grp_used_tres_run_secs underflow, tried to remove 300 
seconds when only 0 remained.

To me this sounds harmless, like there is some race condition in the tracking
of the TRES cpu/mem seconds in use.  I thought I’d mention it anyway, as no
one likes errors in their logs! :)

We were using GrpCPURunMins to limit resource use per account; that seems to
be handled by GrpTRESRunMins now.
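
For what it's worth, here's roughly how we mapped the old limit onto the new
TRES form with sacctmgr (the account name and value below are just
placeholders):

  # Old pre-TRES form (account/value are placeholders):
  sacctmgr modify account myaccount set GrpCPURunMins=100000

  # Equivalent TRES form in 15.08:
  sacctmgr modify account myaccount set GrpTRESRunMins=cpu=100000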

Let me know if you need more info!

Chris
 
--
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167



