It would help to know which version of Slurm you are running and to see any slurmctld log entries for those jobs. You could have a script that periodically polls slurmctld and slurmdbd to check for jobs that are erroneously still hanging around. I am not aware of any issues like this in the current code, though that doesn't mean there aren't any.
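As a rough idea of what such a polling script could look like, here is a minimal Python sketch. It assumes that `sacct -a -n -X -s RUNNING -o JobID` lists the jobs the database still considers running and that `squeue -h -o %i` lists every job slurmctld currently knows about; please check both against the man pages for your Slurm version. The function names are hypothetical, not part of Slurm.

```python
import subprocess


def parse_job_ids(text):
    """Return the set of job IDs found in text, one ID per non-blank line."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def find_ghost_jobs(db_running, ctld_known):
    """Return IDs the database calls running but slurmctld does not know."""
    return sorted(db_running - ctld_known)


def run(cmd):
    """Run a command and return its stdout; raises if the command fails."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def main():
    # Assumption: these flags behave as described in the lead-in above.
    db_running = parse_job_ids(
        run(["sacct", "-a", "-n", "-X", "-s", "RUNNING", "-o", "JobID"]))
    ctld_known = parse_job_ids(run(["squeue", "-h", "-o", "%i"]))
    for job_id in find_ghost_jobs(db_running, ctld_known):
        print(job_id)
```

Calling `main()` from cron would print the suspect job IDs. A job that finishes between the two commands can show up as a false positive, so it is probably wise to report an ID only if it appears in two consecutive runs.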
Alejandro Lucero Palau <[email protected]> wrote:

>There have been some race conditions in the past that led to accounting
>errors like this. I do not know if your case come from that time or it
>is something new.
>
>I wonder if there are race conditions hidden and hard to detect, but
>could we have some automatic action for detecting such a problem?
>
>The quick solution would be to have that check with some local scripts.
>The good one would be slurmdbd and slurmctld taking the effort.
>
>What do you think Danny?
>
>On 01/30/2013 05:30 PM, Lipari, Don wrote:
>> Matteo,
>>
>> I suspect something happened that prevented the state change for your
>> jobs below from being propagated to the database. You are going to
>> have to modify them manually in mysql. Change the state for these jobs
>> to 3 (JOB_COMPLETE). I would also update the time_end field to
>> something greater than time_start and less than now, though this second
>> step may not be necessary for sacctmgr to remove the account.
>>
>> Don
>>
>>> From: Guglielmi Matteo [mailto:[email protected]]
>>> Sent: Tuesday, January 29, 2013 2:48 PM
>>> To: slurm-dev
>>> Subject: [slurm-dev] Ghost JobID in Database?
>>>
>>> I want to remove an unused account but the sacctmgr command returns 6 ghost
>>> JobID which are not running anywhere on the system (note the 6 lower JobID
>>> values).
>>>
>>> How can I purge these ghost jobs?
>>>
>>> ###
>>>
>>> sacctmgr remove account guest
>>>
>>> Error with request: Job(s) active, cancel job(s) before remove
>>> JobID = 594676 C = superb A = guestlp U = bruijn P = batch
>>> JobID = 594680 C = superb A = guestlp U = bruijn P = batch
>>> JobID = 594686 C = superb A = guestlp U = bruijn P = batch
>>> JobID = 594699 C = superb A = guestlp U = bruijn P = batch
>>> JobID = 594703 C = superb A = guestlp U = bruijn P = batch
>>> JobID = 594707 C = superb A = guestlp U = bruijn P = batch
>>> JobID = 3563965 C = superb A = guestlp U = bruijn P = batch
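
For reference, the manual fix Don describes in the quoted message (set state to 3 and give time_end a value after time_start) could be prepared along these lines. This is only a sketch: it assumes the job records live in a table named `<cluster>_job_table` with columns `id_job`, `state`, `time_start` and `time_end`, and that an unfinished job has `time_end = 0`. Verify all of that against your own accounting database, and back it up, before running anything. The helper name is hypothetical, and it only builds the statement; it does not connect to the database.

```python
JOB_COMPLETE = 3  # state value given in Don's reply


def build_ghost_fix(cluster, job_ids):
    """Build a parameterized UPDATE that marks the given jobs as complete.

    Returns (sql, params) suitable for a DB-API cursor.execute() call with
    a MySQL driver that uses %s placeholders. Table and column names are
    assumptions; check them against your schema.
    """
    if not cluster.isidentifier():
        raise ValueError("unexpected cluster name: %r" % cluster)
    ids = sorted({int(j) for j in job_ids})
    if not ids:
        raise ValueError("no job IDs given")
    placeholders = ", ".join(["%s"] * len(ids))
    sql = (
        f"UPDATE {cluster}_job_table "
        "SET state = %s, time_end = time_start + 1 "
        f"WHERE time_end = 0 AND id_job IN ({placeholders})"
    )
    return sql, [JOB_COMPLETE, *ids]
```

For the thread above this would be called as `build_ghost_fix("superb", [594676, 594680, 594686, 594699, 594703, 594707])`. Restricting the update to rows with `time_end = 0` is a precaution so that records which already ended are left alone.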
