If you could send me the script, that would be appreciated. Thanks.

-Paul Edmon-

On 06/06/2013 02:25 AM, Loris Bennett wrote:
> Dear Paul,
>
> Carles Fenoy <[email protected]> writes:
>
>> Re: [slurm-dev] Re: Orphaned Jobs
>>
>> Dear Paul,
>>
>> You will have to change the state of the job in MySQL. It is the only way I
>> know to deal with this kind of job. Just change the end time and state of
>> the "running" job. I don't know exactly which state you should set, but
>> search for some job with state cancelled or node fail and set that number.
>
> I have written a Perl script to identify and correct such problems,
> which could be of some use to you. If you are interested, send me a
> mail.
>
> Regards
>
> Loris
>
>> Regards,
>> Carles Fenoy
>>
>>
>> On Wed, Jun 5, 2013 at 4:48 PM, Paul Edmon <[email protected]> wrote:
>>
>> Do you mean the node that hosts the slurmdb? Or the node that runs
>> slurmctld? Or are you speaking of the nodes on which that job ran?
>>
>> -Paul Edmon-
>>
>> On 06/05/2013 10:45 AM, Sefa Arslan wrote:
>> > If possible, rebooting the worker node is the fastest solution.
>> >
>> > On 06/05/2013 05:10 PM, Paul Edmon wrote:
>> >> I have a job which shows up in sacct as Running but does not show up
>> >> in squeue or any other probe of the cluster jobs. I know this job is
>> >> long dead, but sacct is under the impression it is still running. I
>> >> suspect that this is due to me having to rebuild my database while in
>> >> production. However, I've done this before and hadn't seen this issue
>> >> crop up. Is there a way to remove this job from sacct? scancel does
>> >> not work on it.
>> >>
>> >> -Paul Edmon-
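
For reference, the database fix Carles describes (setting the end time and state of the stuck "running" job) might be sketched roughly as below. This is only an illustration under assumptions that are not stated in the thread: that the accounting table is named `<cluster>_job_table` with columns `id_job`, `state` and `time_end`, that a still-running record has `time_end = 0`, and that 4 is the numeric code for a cancelled job. The helper `build_orphan_fix` is hypothetical; it only builds the statement and does not touch the database. As Carles suggests, check the state number against a job that sacct already reports as cancelled, and back up the database first.

```python
import time

# ASSUMPTION: numeric code for a cancelled job; confirm it by looking at a job
# that sacct already shows as CANCELLED in your own database.
ASSUMED_CANCELLED_STATE = 4


def build_orphan_fix(cluster, job_id, state=ASSUMED_CANCELLED_STATE, end_time=None):
    """Build (sql, params) that would mark one orphaned job as finished.

    Hypothetical helper: table and column names are assumptions about the
    accounting schema, not something confirmed in this thread.
    """
    # The table name cannot be passed as a query parameter, so validate it.
    if not cluster.replace("_", "").isalnum():
        raise ValueError("unexpected cluster name: %r" % cluster)
    if end_time is None:
        end_time = int(time.time())  # Unix timestamp for "now"
    table = f"{cluster}_job_table"   # ASSUMED naming convention
    sql = (
        f"UPDATE {table} SET state = %s, time_end = %s "
        "WHERE id_job = %s AND time_end = 0"  # ASSUMED: 0 means not yet ended
    )
    return sql, (state, end_time, int(job_id))


if __name__ == "__main__":
    # Print the statement for review instead of running it.
    sql, params = build_orphan_fix("mycluster", 12345)
    print(sql)
    print(params)
```

The `time_end = 0` condition is there so that a mistyped job id cannot overwrite a record that has already finished; the returned pair can be handed to any MySQL client's parameterized execute call once it has been reviewed.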
