[slurm-dev] Re: Duplicate jobs in the grid_job_table

Chris Read Tue, 12 Mar 2013 10:52:06 -0700

Update:

I've just cleaned things up by deleting the duplicates where state = 0
(PENDING). The correct state for the job is actually 7 (NODE_FAIL), not
CANCELLED as I stated earlier.


No need to restart slurmdbd either...
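The cleanup described above (delete the PENDING copy, keep the other record) can be sketched as follows. This is a minimal, hypothetical illustration that uses Python's built-in sqlite3 module on an in-memory table, not the real slurmdbd MySQL database. Several details are assumptions: the table name job_table, the reduced column set, and the rule that a stale duplicate shares id_job and time_submit with an older record. The "older record" part follows the pattern reported in the thread, where the PENDING row has the higher job_db_inx. On a live accounting database, take a backup before deleting anything.

```python
import sqlite3

# Job state codes as used in this thread: 0 = PENDING, 7 = NODE_FAIL.
PENDING = 0
NODE_FAIL = 7

# Hypothetical, simplified stand-in for a per-cluster job table such as
# grid_job_table; the real slurmdbd schema has many more columns.
SCHEMA = """
CREATE TABLE job_table (
    job_db_inx  INTEGER PRIMARY KEY,
    id_job      INTEGER NOT NULL,
    time_submit INTEGER NOT NULL,
    state       INTEGER NOT NULL
)
"""


def find_duplicates(conn):
    """Return (id_job, job_db_inx) for PENDING rows that duplicate an
    older, non-PENDING record of the same job (same id_job and
    time_submit, lower job_db_inx). Matching rule is an assumption."""
    return conn.execute(
        """
        SELECT DISTINCT p.id_job, p.job_db_inx
        FROM job_table AS p
        JOIN job_table AS o
          ON o.id_job = p.id_job
         AND o.time_submit = p.time_submit
         AND o.job_db_inx < p.job_db_inx
        WHERE p.state = ? AND o.state <> ?
        ORDER BY p.job_db_inx
        """,
        (PENDING, PENDING),
    ).fetchall()


def delete_pending_duplicates(conn):
    """Delete the rows reported by find_duplicates; return how many."""
    dupes = find_duplicates(conn)
    # Delete by primary key so the statement needs no subquery on the
    # same table (MySQL rejects that form of DELETE).
    conn.executemany(
        "DELETE FROM job_table WHERE job_db_inx = ?",
        [(inx,) for _, inx in dupes],
    )
    conn.commit()
    return len(dupes)


def make_demo_db():
    """Build an in-memory table reproducing the symptom from the thread."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO job_table VALUES (?, ?, ?, ?)",
        [
            (1, 100, 1363000000, NODE_FAIL),  # the real final record
            (2, 100, 1363000000, PENDING),    # stale duplicate, higher inx
            (3, 101, 1363000500, PENDING),    # a job that really is pending
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    print(find_duplicates(conn))            # [(100, 2)]
    print(delete_pending_duplicates(conn))  # 1
```

Running find_duplicates on its own first lets you review exactly which rows would be removed before any DELETE is issued; a genuinely pending job (101 in the demo) is left alone because it has no older record to shadow.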

Chris


On Tue, Mar 12, 2013 at 4:28 PM, Chris Read <[email protected]> wrote:

> Forgot to add another question:
>
> What's the correct way to clean this up? Just delete the record showing
> PENDING and restart slurmdbd?
>
>
> On Tue, Mar 12, 2013 at 4:26 PM, Chris Read <[email protected]> wrote:
>
>> Greetings...
>>
>> Just stumbled across some strange behaviour on the accounting side of
>> things: we have a collection of jobs that have duplicate records in the
>> grid_job_table. The visible symptoms of this are that sacct shows the jobs
>> as still pending when they are not.
>>
>> In all of the cases I can find:
>>
>> - there is no information available in the slurmdbd.log
>> - the slurmctld.log shows the jobs have been canceled
>> - the job_db_inx for the entry in PENDING state is greater than the
>> job_db_inx for the entry in CANCELLED state
>>
>> We're currently on 2.5.1.
>>
>> Anyone have any idea how these got there?
>>
>> Chris
>>
>
>
