Update: I've just cleaned things up by deleting the duplicates where state = 0 (PENDING). The correct state for the job is actually 7 (NODE_FAIL), not CANCELLED as I stated above.
No need to restart slurmdbd either...

Chris

On Tue, Mar 12, 2013 at 4:28 PM, Chris Read <[email protected]> wrote:

> Forgot to add another question:
>
> What's the correct way to clean this up? Just delete the record showing
> PENDING and restart slurmdbd?
>
>
> On Tue, Mar 12, 2013 at 4:26 PM, Chris Read <[email protected]> wrote:
>
>> Greetings...
>>
>> Just stumbled across some strange behaviour on the accounting side of
>> things: we have a collection of jobs that have duplicate records in the
>> grid_job_table. The visible symptoms of this are that sacct shows the jobs
>> as still pending when they are not.
>>
>> In all of the cases I can find:
>>
>> - there is no information available in the slurmdbd.log
>> - the slurmctld.log shows the jobs have been canceled
>> - the job_db_inx for the entry in PENDING state is > the job_db_inx for
>> the entry in CANCELLED
>>
>> We're currently on 2.5.1.
>>
>> Anyone have any idea how these got there?
>>
>> Chris
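
For anyone hitting the same thing, here is a rough sketch of the check described in this thread: find jobs that have more than one record, and flag the PENDING (state = 0) record when another record for the same job is in a non-pending state. It works on rows already fetched into Python dicts rather than touching the database. The column name id_job, the helper name find_stale_pending and the sample rows are assumptions made up for illustration; only job_db_inx, state = 0 (PENDING) and state = 7 (NODE_FAIL) come from the messages above. Treat it as a way to review candidates before deleting anything, not as a tested cleanup procedure.

```python
from collections import defaultdict

PENDING = 0  # state value for PENDING, as given in the thread


def find_stale_pending(rows):
    """Return job_db_inx values of PENDING rows that duplicate another
    (non-pending) record for the same job.

    Each row is a dict with keys "job_db_inx", "id_job" and "state".
    "id_job" is an assumed column name for the job ID.
    """
    by_job = defaultdict(list)
    for row in rows:
        by_job[row["id_job"]].append(row)

    stale = []
    for records in by_job.values():
        if len(records) < 2:
            continue  # a single record is not a duplicate, even if PENDING
        if all(r["state"] == PENDING for r in records):
            continue  # no finished record to compare against; leave alone
        for r in records:
            if r["state"] == PENDING:
                stale.append(r["job_db_inx"])
    return sorted(stale)


# Hypothetical sample data, not taken from a real database.
rows = [
    {"job_db_inx": 101, "id_job": 5000, "state": 7},  # NODE_FAIL
    {"job_db_inx": 102, "id_job": 5000, "state": 0},  # duplicate, PENDING
    {"job_db_inx": 103, "id_job": 5001, "state": 0},  # genuinely pending
    {"job_db_inx": 104, "id_job": 5002, "state": 7},  # single record
]

print(find_stale_pending(rows))
```

Job 5001 is left alone because it has only one record, so a job that really is still pending would not be flagged.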
