I looked at the code more closely this morning.  I found that a
reconfigure clears the "group" usage counts (grp_used_cpus,
grp_used_nodes, grp_used_cpu_run_secs, grp_used_jobs, and
grp_used_submit_jobs) for a designated QOS, but does nothing to clear the
"user" counts.  In fact, submit_jobs is the only QOS "user" count that the
reconfigure adjusts at all; none of the others (maxcpus, maxjobs,
maxnodes) are touched.  I verified this by instrumenting the logic that
increments and decrements the
counts.  I have therefore written a simple new function, 
_clear_qos_job_submit_info, which is invoked from the 
_clear_used_qos_info function in the module src/common/assoc_mgr.c. 
This new function clears the submit_jobs count for each "user" found in 
the QOS usage->user_limit_list.  This is isolated logic, so I feel very 
confident it will not introduce any regressions.  Attached is a copy of 
the patch for 2.3.0.
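
For anyone reading without the attachment, the idea is roughly the sketch below. It is not the patch itself: the struct layouts and the plain linked list are simplified, hypothetical stand-ins rather than the real SLURM types and list API. Only the two function names (without their leading underscores) and the field names submit_jobs, grp_used_submit_jobs, and user_limit_list mirror the description above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one per-user record in a QOS usage list. */
typedef struct used_limits {
    uint32_t uid;
    uint32_t submit_jobs;       /* the per-user count to be cleared */
    uint32_t jobs;              /* other per-user counts are left untouched */
    struct used_limits *next;   /* assumption: simple singly linked list */
} used_limits_t;

/* Hypothetical stand-in for the QOS usage record. */
typedef struct {
    uint32_t grp_used_submit_jobs;   /* one of the "group" counts */
    used_limits_t *user_limit_list;  /* per-user records */
} qos_usage_t;

/* Zero the submit_jobs count of every user record in the list. */
static void clear_qos_job_submit_info(qos_usage_t *usage)
{
    if (!usage)
        return;
    for (used_limits_t *u = usage->user_limit_list; u; u = u->next)
        u->submit_jobs = 0;
}

/* Clear the group count, then the per-user submit counts. */
static void clear_used_qos_info(qos_usage_t *usage)
{
    if (!usage)
        return;
    usage->grp_used_submit_jobs = 0;
    clear_qos_job_submit_info(usage);
}
```

The per-user reset is kept in its own helper so that it touches nothing except submit_jobs, which is the isolation argument made above.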
Best Regards,
Bill


Attachment: assoc_mgr.c.patch