Hi,

Our slurmdbd crashed yesterday with the following in its log:

[2016-02-10T07:00:20.066] error: mysql_query failed: 1030 Got error 28 from
storage engine
select job.job_db_inx, job.id_assoc, job.id_wckey, job.array_task_pending,
job.time_eligible, job.time_start, job.time_end, job.time_suspended,
job.cpus_req, job.id_resv, job.tres_alloc, SUM(step.consumed_energy) from
"perceus-00_job_table" as job left outer join "perceus-00_step_table" as
step on job.job_db_inx=step.job_db_inx and (step.id_step>=0) where
(job.time_eligible && job.time_eligible < 1455116400 && (job.time_end >=
1455112800 || job.time_end = 0)) group by job.job_db_inx order by
job.id_assoc, job.time_eligible

This was on Slurm 15.08.6.
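
As far as I can tell, the "error 28" that MySQL reports from the storage engine is the operating system errno, which on Linux is ENOSPC (no space left on device). Below is a minimal sketch for decoding the number and checking free space on the database host. The paths are hypothetical placeholders, not our actual configuration; substitute the real MySQL datadir and tmpdir.

```python
import errno
import os
import shutil


def describe_errno(code):
    """Return (symbolic name, OS message) for an OS error number."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def free_fraction(path):
    """Fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


if __name__ == "__main__":
    # Decode the number from "Got error 28 from storage engine".
    print(describe_errno(28))

    # Hypothetical locations (assumption): replace with the real
    # MySQL datadir and tmpdir on the slurmdbd database host.
    for path in ("/var/lib/mysql", "/tmp"):
        if os.path.exists(path):
            print(f"{path}: {free_fraction(path):.1%} free")
```

If either filesystem is close to full, that would be consistent with the failed query, since the GROUP BY / ORDER BY may need temporary space.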

We are also seeing many errors similar to the following:

[2016-02-10T06:00:22.249] error: We have more allocated time than is
possible (108445785192 > 6307200) for cluster perceus-00(1752) from
2016-02-10T05:00:00 - 2016-02-10T06:00:00 tres 2
[2016-02-10T06:00:22.262] error: We have more time than is possible
(6307200+745499+0)(7052699) > 6307200 for cluster perceus-00(1752) from
2016-02-10T05:00:00 - 2016-02-10T06:00:00 tres 2
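
To sanity-check the numbers in these messages, here is a small sketch of the arithmetic. It assumes that the number in parentheses after the cluster name (1752) is the TRES count for the cluster and that the limit is that count multiplied by the 3600 seconds in the rollup hour. That reading is my assumption, although the figures do line up.

```python
SECONDS_PER_HOUR = 3600


def max_tres_seconds(count, period_seconds=SECONDS_PER_HOUR):
    """Upper bound on TRES-seconds for `count` units over one period."""
    return count * period_seconds


# Assumption: 1752 is the TRES count shown in "perceus-00(1752)".
count = 1752
limit = max_tres_seconds(count)

# Values copied from the two log messages.
reported_alloc = 108445785192
summed_terms = 6307200 + 745499 + 0

# How many times larger the reported allocation is than the limit.
over_factor = reported_alloc / limit

if __name__ == "__main__":
    print("limit:", limit)
    print("sum of terms in second message:", summed_terms)
    print(f"reported allocation is about {over_factor:.0f}x the limit")
```

Under that assumption the limit matches the 6307200 in both messages, and the reported allocation exceeds it by four orders of magnitude, so this does not look like a small rounding problem.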

I found a bug report for these errors that is marked as resolved in 15.08.3 (
http://bugs.schedmd.com/show_bug.cgi?id=2068), yet we still see them on
15.08.6. How do we fix it?

Thanks,

Yong Qin
