If you could send me the script that would be appreciated.  Thanks.

-Paul Edmon-

On 06/06/2013 02:25 AM, Loris Bennett wrote:
> Dear Paul,
>
> Carles Fenoy <[email protected]> writes:
>
>> Re: [slurm-dev] Re: Orphaned Jobs
>>
>> Dear Paul,
>>
>> You will have to change the state of the job in the MySQL database. That
>> is the only way I know to deal with this kind of job. Just change the end
>> time and state of the "running" job. I don't know exactly which state you
>> should set, but look for a job with state cancelled or node fail and use
>> that number.
> I have written a Perl script to identify and correct such problems,
> which could be of some use to you.  If you are interested, send me a
> mail.
>
> Regards
>
> Loris
>
>
>> Regards,
>> Carles Fenoy
>>
>>
>> On Wed, Jun 5, 2013 at 4:48 PM, Paul Edmon <[email protected]> wrote:
>>
>>
>>      Do you mean the node that hosts the slurmdb? Or the node that runs
>>      slurmctld?  Or are you speaking of the nodes on which that job ran?
>>      
>>      -Paul Edmon-
>>      
>>      
>>      
>>      On 06/05/2013 10:45 AM, Sefa Arslan wrote:
>>      > If possible, rebooting the worker node is the fastest solution.
>>      >
>>      >
>>      > On 06/05/2013 05:10 PM, Paul Edmon wrote:
>>      >> I have a job which shows up in sacct as Running but does not show up on
>>      >> squeue or any other probe of the cluster jobs.  I know this job is long
>>      >> dead but sacct is under the impression it is still running. I suspect
>>      >> that this is due to me having to rebuild my database while in
>>      >> production.  However, I've done this before and hadn't seen this issue
>>      >> crop up.  Is there a way to remove this job from sacct? scancel does not
>>      >> work on it.
>>      >>
>>      >> -Paul Edmon-
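
For reference, the manual database fix Carles describes (set an end time and a
finished state on the job that is still recorded as running) could be sketched
roughly as below. This is only an illustration, not Loris's Perl script: it
uses Python's built-in sqlite3 as a stand-in for MySQL so it can be run
anywhere, and the table name demo_job_table, the column names id_job, state
and time_end, and the state numbers 1 and 4 are all assumptions. Check them
against your own accounting schema, and take the real state number from an
existing cancelled or node-fail job as suggested in the thread. Back up the
database before changing anything.

```python
import sqlite3
import time

# Assumed numeric job states (hypothetical values for this sketch).
# Look up the real number from an existing cancelled job in your database.
JOB_RUNNING = 1
JOB_CANCELLED = 4


def close_orphaned_job(conn, table, job_id, end_time=None):
    """Mark a job still recorded as running as cancelled and set its end time.

    Only rows that are in the running state and have no end time (0) are
    touched. Returns the number of rows changed.
    """
    # The table name cannot be passed as a query parameter, so validate it.
    if not table.replace("_", "").isalnum():
        raise ValueError("unexpected table name: %r" % table)
    if end_time is None:
        end_time = int(time.time())
    cur = conn.execute(
        "UPDATE {} SET state = ?, time_end = ? "
        "WHERE id_job = ? AND state = ? AND time_end = 0".format(table),
        (JOB_CANCELLED, end_time, job_id, JOB_RUNNING),
    )
    conn.commit()
    return cur.rowcount


# Small in-memory demonstration with a made-up table layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_job_table ("
    "id_job INTEGER, state INTEGER, time_start INTEGER, time_end INTEGER)"
)
conn.executemany(
    "INSERT INTO demo_job_table VALUES (?, ?, ?, ?)",
    [
        (1001, JOB_RUNNING, 1370000000, 0),  # the orphaned "running" job
        (1002, 3, 1370000000, 1370003600),   # a job that already finished
    ],
)
changed = close_orphaned_job(conn, "demo_job_table", 1001, end_time=1370440000)
print("rows changed:", changed)
```

The WHERE clause restricts the update to a job that is both in the running
state and without an end time, so running it twice, or against a job that
already finished, changes nothing.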
