Knowing the version of Slurm, and any slurmctld log entries for these jobs,
would be helpful.  You could have a script that periodically polls
slurmctld and slurmdbd to check for jobs that the database still lists as
active.  I am not aware of any issues like this in the current code, though
that doesn't mean there aren't any.
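A minimal sketch of such a polling check in Python follows. It assumes it runs as root or a Slurm administrator on a host with `squeue` and `sacct` available, and a Slurm release whose `sacct` accepts the flags used here (older releases, such as the one in this thread, may differ). It compares the job IDs slurmctld reports through `squeue` with the IDs slurmdbd still lists as pending or running through `sacct`, and returns the IDs only the database has. All function names are hypothetical.

```python
import subprocess


def _job_ids(cmd):
    """Run a Slurm client command and return its non-empty output lines as a set."""
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return {line.strip() for line in out.splitlines() if line.strip()}


def controller_job_ids():
    """Job IDs slurmctld reports via squeue (default state filter, no header)."""
    return _job_ids(["squeue", "-h", "-o", "%i"])


def database_active_job_ids():
    """Job IDs slurmdbd still records as pending or running (allocations only)."""
    # -a: all users, -n: no header, -X: allocations only, -P: parsable output.
    return _job_ids(["sacct", "-a", "-n", "-X", "-P", "-o", "JobID",
                     "-s", "PENDING,RUNNING"])


def _sort_key(job_id):
    # Array tasks look like "1234_5"; order on the numeric base ID first,
    # and put anything non-numeric first so sorting never fails.
    base = job_id.split("_", 1)[0]
    return (int(base) if base.isdigit() else -1, job_id)


def find_ghost_jobs(db_ids, ctld_ids):
    """IDs the database lists as active but slurmctld does not report."""
    return sorted(set(db_ids) - set(ctld_ids), key=_sort_key)
```

From cron you could call `find_ghost_jobs(database_active_job_ids(), controller_job_ids())` and mail the result. A job that ends between the two queries can show up once as a false positive, so it is safer to alert only on IDs that persist across several runs.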

Alejandro Lucero Palau <[email protected]> wrote:

>
>There have been some race conditions in the past that led to accounting
>errors like this. I do not know whether your case comes from that time or
>is something new.
>
>There may be race conditions that are hidden and hard to detect, so
>could we have some automatic action that detects this kind of problem?
>
>The quick solution would be to do that check with some local scripts.
>The proper one would be for slurmdbd and slurmctld to do the checking
>themselves.
>
>What do you think Danny?
>
>On 01/30/2013 05:30 PM, Lipari, Don wrote:
>> Matteo,
>>
>> I suspect something happened that prevented the state change for your
>> jobs below from being propagated to the database.  You are going to
>> have to modify them manually in mysql.  Change the state for these jobs
>> to 3 (JOB_COMPLETE).  I would also update the time_end field to
>> something greater than time_start and less than now, though this second
>> step may not be necessary for sacctmgr to remove the account.
>>
>> Don
>>
>>   
>>> From: Guglielmi Matteo [mailto:[email protected]] 
>>> Sent: Tuesday, January 29, 2013 2:48 PM
>>> To: slurm-dev
>>> Subject: [slurm-dev] Ghost JobID in Database?
>>>     
>>   
>>> I want to remove an unused account, but the sacctmgr command returns 6
>>> ghost JobIDs that are not running anywhere on the system (note the 6
>>> lower JobID values).
>>>
>>> How can I purge these ghost jobs?
>>>
>>> ###
>>>
>>> sacctmgr remove account guest
>>>
>>>  Error with request: Job(s) active, cancel job(s) before remove
>>>   JobID = 594676     C = superb     A = guestlp    U = bruijn    P = batch
>>>   JobID = 594680     C = superb     A = guestlp    U = bruijn    P = batch
>>>   JobID = 594686     C = superb     A = guestlp    U = bruijn    P = batch
>>>   JobID = 594699     C = superb     A = guestlp    U = bruijn    P = batch
>>>   JobID = 594703     C = superb     A = guestlp    U = bruijn    P = batch
>>>   JobID = 594707     C = superb     A = guestlp    U = bruijn    P = batch
>>>   JobID = 3563965    C = superb     A = guestlp    U = bruijn    P = batch
>>>     
>>
>>   
>
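Don's manual fix could be scripted roughly as below. This is a sketch that only builds a parameterized UPDATE for a MySQL DB-API driver; it does not connect to anything. It assumes slurmdbd keeps jobs in a per-cluster table named `<cluster>_job_table` (here `superb_job_table`) with columns `id_job`, `state`, `time_start` and `time_end`. That matches slurmdbd schemas I know of but may not match your release, so check with `DESCRIBE` and back up the database first. Setting `time_end = time_start + 1` is one arbitrary choice that satisfies Don's "greater than time_start, less than now" rule for jobs that started in the past, and `state < 3` limits the update to rows still pending, running or suspended (0-2 in Slurm's job_states enum). The function name is hypothetical.

```python
def build_ghost_job_fix(cluster, job_ids, complete_state=3):
    """Build (sql, params) that mark the given jobs JOB_COMPLETE (state 3).

    ASSUMPTION: the table is "<cluster>_job_table" with columns id_job,
    state, time_start and time_end. Verify against your own schema.
    """
    # The table name cannot be a bound parameter, so validate it strictly.
    if not cluster.replace("_", "").isalnum():
        raise ValueError(f"unexpected cluster name: {cluster!r}")
    if not job_ids:
        raise ValueError("no job IDs given")
    placeholders = ", ".join(["%s"] * len(job_ids))
    sql = (
        f"UPDATE `{cluster}_job_table` "
        "SET state = %s, time_end = time_start + 1 "
        f"WHERE id_job IN ({placeholders}) AND state < %s"
    )
    # int() rejects anything that is not a plain numeric job ID.
    params = [complete_state, *(int(j) for j in job_ids), complete_state]
    return sql, params
```

For example, `build_ghost_job_fix("superb", ["594676", "594680"])` targets two of the low IDs from Matteo's list. With a driver that uses `%s` placeholders, such as mysqlclient or PyMySQL, you would then run `cursor.execute(sql, params)` and commit.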
