The reason field clearing problem was discovered and fixed last week.
It will be in v2.2.4. Let me know if you want a patch now; it's just a couple 
of lines.
________________________________________
From: [email protected] [[email protected]] On Behalf 
Of Bjørn-Helge Mevik [[email protected]]
Sent: Wednesday, March 16, 2011 6:28 AM
To: [email protected]
Subject: Re: [slurm-dev] Possible bug: scontrol hold not possible when grpcpus 
is set
Since you asked: :-) The fix drew my attention to another small problem:
When the _admin_ user releases a job, the Reason is not cleared, and the
priority is not set back to its normal value (it is set to 1).  It does
become runnable, though.

For instance:

login-0-0 696(1)$ bjob -j 1504
JOBID NAME       USER   ACCOUNT PARTITI QOS    ST     PRIORITY(PRIOR)  TIME TIME_LEFT CPU NOD MIN_MEM MIN_TMP NODELIST(REASON)
 1504 1436.30    bhm    staff   lowpri  lowpri PD 0.0000000093(   40)  0:00      5:00   1   1     400       0 (Priority)

teflon 520(2)# scontrol hold 1504

login-0-0 697(1)$ bjob -j 1504
JOBID NAME       USER   ACCOUNT PARTITI QOS    ST     PRIORITY(PRIOR)  TIME TIME_LEFT CPU NOD MIN_MEM MIN_TMP NODELIST(REASON)
 1504 1436.30    bhm    staff   lowpri  lowpri PD 0.0000000000(    0)  0:00      5:00   1   1     400       0 (JobHeldAdmin)

teflon 521(2)# scontrol release 1504

login-0-0 698(1)$ bjob -j 1504
JOBID NAME       USER   ACCOUNT PARTITI QOS    ST     PRIORITY(PRIOR)  TIME TIME_LEFT CPU NOD MIN_MEM MIN_TMP NODELIST(REASON)
 1504 1436.30    bhm    staff   lowpri  lowpri PD 0.0000000002(    1)  0:00      5:00   1   1     400       0 (JobHeldAdmin)
login-0-0 699(1)$

The job is indeed runnable. A little later:

login-0-0 700(1)$ bjob -j 1504
JOBID NAME       USER   ACCOUNT PARTITI QOS    ST     PRIORITY(PRIOR)  TIME TIME_LEFT CPU NOD MIN_MEM MIN_TMP NODELIST(REASON)
 1504 1436.30    bhm    staff   lowpri  lowpri  R 0.0000000002(    1)  0:19      4:41   1   1     400       0 compute-0-9


The same happens with scontrol uhold, or when the admin user releases a
job held by the job owner.

I've verified that this also happens with an unpatched 2.2.3, so the
patch did not introduce the problem.
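
For anyone who wants to check other jobs for the same symptoms, here is a
minimal sketch in Python. It assumes that `scontrol show job <id>` prints
whitespace-separated Key=Value pairs including Priority, JobState and
Reason; the helper names and the SAMPLE text are made up for illustration
(SAMPLE is hand-written from the values shown above, not captured output).

```python
import subprocess


def parse_scontrol_job(text):
    """Collect whitespace-separated Key=Value tokens into a dict.

    Assumption: values of interest (Priority, JobState, Reason) contain
    no spaces, so a plain split() is good enough for this check.
    """
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def release_symptoms(fields):
    """Return a list of the symptoms described in this thread, if present."""
    symptoms = []
    priority = int(fields.get("Priority", "0"))
    reason = fields.get("Reason", "")
    # A held job has priority 0; a nonzero priority together with a
    # JobHeld* reason suggests the reason was not cleared on release.
    if priority != 0 and reason.startswith("JobHeld"):
        symptoms.append("Reason still %s after release" % reason)
    # Priority 1 on a pending job suggests it was not recomputed.
    if priority == 1 and fields.get("JobState") == "PENDING":
        symptoms.append("Priority is 1 instead of its normal value")
    return symptoms


def show_job(job_id):
    """Run `scontrol show job <id>` and return its output (needs SLURM)."""
    result = subprocess.run(
        ["scontrol", "show", "job", str(job_id)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hand-written illustrative input, NOT real scontrol output.
SAMPLE = """JobId=1504 Name=1436.30
   Priority=1 Account=staff QOS=lowpri
   JobState=PENDING Reason=JobHeldAdmin
"""

if __name__ == "__main__":
    for line in release_symptoms(parse_scontrol_job(SAMPLE)):
        print(line)
```

On a real cluster, release_symptoms(parse_scontrol_job(show_job(1504)))
would run the same check against live data.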


--
Bjørn-Helge Mevik, dr. scient,
Research Computing Services, University of Oslo

