Dear Paul,

Carles Fenoy <[email protected]> writes:

> Re: [slurm-dev] Re: Orphaned Jobs 
>
> Dear Paul,
>
> You will have to change the state of the job in MySQL. It is the only way I
> know to deal with this kind of job. Just change the end time and state of the
> "running" job. I don't know exactly which state you should set, but look for
> a job with state cancelled or node fail and use that number.
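
For illustration, the manual fix described above could be sketched roughly
as follows. This is only a minimal sketch and not a tested procedure: the
table name (<cluster>_job_table), the column names (id_job, state,
time_end) and the numeric state codes are all assumptions about the
accounting schema, and build_orphan_fix is a hypothetical helper. As
suggested above, check the state number against an existing cancelled job
in your own database before running anything.

```python
import time

# Assumed numeric state codes; verify them against a known cancelled job
# in your own accounting database before relying on them.
STATE_RUNNING = 1    # assumption: code stored for a running job
STATE_CANCELLED = 4  # assumption: code stored for a cancelled job


def build_orphan_fix(cluster, job_id, end_time=None, new_state=STATE_CANCELLED):
    """Return (sql, params) that would mark one orphaned job as finished.

    Nothing is executed here; the statement is only built so that it can
    be inspected first. The table and column names are assumptions.
    """
    # The table name cannot be passed as a bound parameter, so restrict it
    # to letters, digits and underscores before interpolating it.
    if not cluster.replace("_", "").isalnum():
        raise ValueError("unexpected cluster name: %r" % cluster)
    if end_time is None:
        end_time = int(time.time())  # default to the current Unix time
    table = f"{cluster}_job_table"  # assumed naming scheme
    sql = (
        f"UPDATE {table} SET state = %s, time_end = %s "
        "WHERE id_job = %s AND state = %s"
    )
    # The trailing state check limits the update to rows still marked running.
    return sql, (new_state, end_time, job_id, STATE_RUNNING)


if __name__ == "__main__":
    sql, params = build_orphan_fix("mycluster", 12345)
    print(sql)
    print(params)
```

The statement uses the %s placeholder style accepted by the common Python
MySQL drivers, so the pair could be handed to cursor.execute(sql, params)
once it has been reviewed. Taking a dump of the database beforehand is
advisable.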

I have written a Perl script to identify and correct such problems,
which could be of some use to you.  If you are interested, send me a
mail.

Regards

Loris


> Regards,
> Carles Fenoy
>
>
> On Wed, Jun 5, 2013 at 4:48 PM, Paul Edmon <[email protected]> wrote:
>
>
>     Do you mean the node that hosts the slurmdb? Or the node that runs
>     slurmctld?  Or are you speaking of the nodes on which that job ran?
>     
>     -Paul Edmon-
>     
>     
>     
>     On 06/05/2013 10:45 AM, Sefa Arslan wrote:
>     > If possible, rebooting the worker node is the fastest solution.
>     >
>     >
>     > On 06/05/2013 05:10 PM, Paul Edmon wrote:
>     >> I have a job which shows up in sacct as Running but does not show up on
>     >> squeue or any other probe of the cluster jobs.  I know this job is long
>     >> dead but sacct is under the impression it is still running. I suspect
>     >> that this is due to me having to rebuild my database while in
>     >> production.  However, I've done this before and hadn't seen this issue
>     >> crop up.  Is there a way to remove this job from sacct? scancel does not
>     >> work on it.
>     >>
>     >> -Paul Edmon-

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email [email protected]
