Dear Paul,

Carles Fenoy <[email protected]> writes:

> Re: [slurm-dev] Re: Orphaned Jobs
>
> Dear Paul,
>
> You will have to change the state of the job in MySQL. It is the only way I
> know to deal with this kind of job. Just change the end time and state of the
> "running" job. I don't know exactly which state you should set, but search for
> some job with state cancelled or node fail and set that number.

I have written a Perl script to identify and correct such problems,
which could be of some use to you. If you are interested, send me a
mail.

Regards

Loris

> Regards,
> Carles Fenoy
>
>
> On Wed, Jun 5, 2013 at 4:48 PM, Paul Edmon <[email protected]> wrote:
>
> Do you mean the node that hosts the slurmdb? Or the node that runs
> slurmctld? Or are you speaking of the nodes on which that job ran?
>
> -Paul Edmon-
>
>
> On 06/05/2013 10:45 AM, Sefa Arslan wrote:
> > If possible, rebooting the worker node is the fastest solution.
> >
> >
> > On 06/05/2013 05:10 PM, Paul Edmon wrote:
> >> I have a job which shows up in sacct as Running but does not show up on
> >> squeue or any other probe of the cluster jobs. I know this job is long
> >> dead but sacct is under the impression it is still running. I suspect
> >> that this is due to me having to rebuild my database while in
> >> production. However, I've done this before and hadn't seen this issue
> >> crop up. Is there a way to remove this job from sacct? scancel does not
> >> work on it.
> >>
> >> -Paul Edmon-

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email [email protected]
