The reason field clearing problem was discovered and fixed last week. It will be in v2.2.4. Let me know if you want a patch now; it's just a couple of lines.
________________________________________
From: [email protected] [[email protected]] On Behalf Of Bjørn-Helge Mevik [[email protected]]
Sent: Wednesday, March 16, 2011 6:28 AM
To: [email protected]
Subject: Re: [slurm-dev] Possible bug: scontrol hold not possible when grpcpus is set

Since you asked: :-)

The fix drew my attention to another small problem: When the _admin_ user releases a job, the Reason is not cleared, and the priority is not set back to its normal value (it is set to 1). It does become runnable, though.
For instance:

login-0-0 696(1)$ bjob -j 1504
JOBID NAME    USER ACCOUNT PARTITI QOS    ST PRIORITY(PRIOR)   TIME TIME_LEFT CPU NOD MIN_MEM MIN_TMP NODELIST(REASON)
1504  1436.30 bhm  staff   lowpri  lowpri PD 0.0000000093( 40) 0:00 5:00      1   1   400     0       (Priority)

teflon 520(2)# scontrol hold 1504

login-0-0 697(1)$ bjob -j 1504
JOBID NAME    USER ACCOUNT PARTITI QOS    ST PRIORITY(PRIOR)   TIME TIME_LEFT CPU NOD MIN_MEM MIN_TMP NODELIST(REASON)
1504  1436.30 bhm  staff   lowpri  lowpri PD 0.0000000000( 0)  0:00 5:00      1   1   400     0       (JobHeldAdmin)

teflon 521(2)# scontrol release 1504

login-0-0 698(1)$ bjob -j 1504
JOBID NAME    USER ACCOUNT PARTITI QOS    ST PRIORITY(PRIOR)   TIME TIME_LEFT CPU NOD MIN_MEM MIN_TMP NODELIST(REASON)
1504  1436.30 bhm  staff   lowpri  lowpri PD 0.0000000002( 1)  0:00 5:00      1   1   400     0       (JobHeldAdmin)
login-0-0 699(1)$

The job is indeed runnable: A little later:

login-0-0 700(1)$ bjob -j 1504
JOBID NAME    USER ACCOUNT PARTITI QOS    ST PRIORITY(PRIOR)   TIME TIME_LEFT CPU NOD MIN_MEM MIN_TMP NODELIST(REASON)
1504  1436.30 bhm  staff   lowpri  lowpri R  0.0000000002( 1)  0:19 4:41      1   1   400     0       compute-0-9

The same happens with scontrol uhold, or when the admin user releases a job held by the job owner.

I've verified that this also happens with an unpatched 2.2.3, so the patch did not introduce the problem.

--
Bjørn-Helge Mevik, dr. scient,
Research Computing Services, University of Oslo
