Hi there,

After upgrading from 14.03.08 to 14.03.10, we noticed that, according to sacct, our jobs never complete. squeue no longer shows them, since they have in fact completed. After some investigation, it seems that the slurmdbd daemons are rejecting the records with this error:
slurmdbd[10475]: error: as_mysql_step_complete: Not inputing this job, it has no submit time.

However, when we query the job with sacct, it does have a submit time. Starting slurmdbd with -D -vvv gave us the following:

slurmdbd: error: We have more allocated time than is possible (17193870 > 15206400) for cluster calculon(4224) from 2014-11-27T10:00:00 - 2014-11-27T11:00:00
slurmdbd: error: We have more time than is possible (15206400+15163934+0)(30370334) > 15206400 for cluster calculon(4224) from 2014-11-27T10:00:00 - 2014-11-27T11:00:00
slurmdbd: debug2: No need to roll cluster calculon this day 1417042800 <= 1417042800
slurmdbd: debug2: No need to roll cluster calculon this month 1414796400 <= 1414796400
slurmdbd: debug2: Got 1 rolled up
slurmdbd: debug2: Everything rolled up
slurmdbd: debug2: DBD_JOB_START: ELIGIBLE CALL ID:1270167 NAME:order_handler_serverfront
slurmdbd: debug2: as_mysql_slurmdb_job_start() called
slurmdbd: debug2: DBD_JOB_START: ELIGIBLE CALL ID:1270168 NAME:order_handler_serverfront
slurmdbd: debug2: as_mysql_slurmdb_job_start() called
slurmdbd: debug2: DBD_STEP_COMPLETE: ID:0.0 SUBMIT:0
slurmdbd: error: as_mysql_step_complete: Not inputing this job, it has no submit time.
slurmdbd: debug2: DBD_STEP_COMPLETE: ID:0.0 SUBMIT:0
slurmdbd: error: as_mysql_step_complete: Not inputing this job, it has no submit time.

Searching for this error yielded very little information.

The slurmdbd daemons were upgraded first, then the controllers, and lastly the compute nodes. After a few days of this, we tried downgrading to 14.03.08, but now we get exactly the same behavior there. I suspect this may be due to state files (or similar) being "tainted" by the new version.

Are there any options in slurm.conf that can lead to this situation? I would appreciate any pointers, as I've run out of ideas.

Also, I believe the error message ought to say "inputting"?
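As a sanity check on the two rollup errors above, the figures are at least internally consistent. This is a minimal sketch that assumes the number in parentheses after the cluster name (4224) is the CPU count and that the limit is simply CPUs times the seconds in the one-hour window. Both are interpretations of the log, not something confirmed from the slurmdbd source, and the variable names are made up for illustration.

```python
# Sanity check of the numbers in the slurmdbd rollup errors.
# ASSUMPTION: "calculon(4224)" means the cluster has 4224 CPUs, and the
# "possible" time is CPUs * seconds in the rollup window (one hour here).

SECONDS_PER_HOUR = 3600


def capacity_cpu_seconds(cpu_count, window_seconds=SECONDS_PER_HOUR):
    """Maximum CPU-seconds a cluster can account for in one window."""
    return cpu_count * window_seconds


cpu_count = 4224                      # from "calculon(4224)" (assumed CPU count)
capacity = capacity_cpu_seconds(cpu_count)

# First error: allocated time reported for 10:00-11:00.
allocated = 17193870
over_allocated_by = allocated - capacity

# Second error: the three terms inside "(15206400+15163934+0)".
# Their exact meaning is not stated in the log, so they are kept as a tuple.
terms = (15206400, 15163934, 0)
total = sum(terms)

print("capacity:", capacity)
print("allocated exceeds capacity by:", over_allocated_by)
print("sum of terms:", total)
```

Under those assumptions the limit comes out to the 15206400 shown in the log, and the three terms add up to the reported 30370334, so the messages themselves look arithmetically sound. The accounted usage for that hour is what seems off.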
Wbr
Andreas
