Hi, We have seen sacct reports some jobs with with time_start = time_end, so elapsed time is reported as 0, but these jobs have steps with execution time > 0 (time_end - time_start for those steps).
Peering at slurmdbd log we can see "DBD_JOB_START: ELIGIBLE CALL ID" for those jobs creating an entry in job_table with time_start=0. However, there are not any log for "DBD_JOB_START: START CALL ID" for those jobs as there are such log lines for other jobs without this problem. Studying the code related with this accouting process, we have found a potential race condition between _set_db_inx_thread thread at plugins/accounting_storage/slurmdbd/accounting_storage_slurmdbd.c and call to jobacct_storage_g_job_start in select_nodes function at slurmctl/node_scheduler.c The problem is the call from select_nodes is done when the job is set to RUNNING but just if a db_index does exist. The _set_db_inx_thread thread is executed each 5 seconds, and it sends DBD JOB START messages to slurmdbd. When slurmdbd processes those messages, it looks at which is the job state for sending either a ELEGIBLE CALL ID or START CALL ID. Most of the times the code works fine but there's a possibility where a race conditions appears: 1) _set_db_inx_thread sends a DBD JOB START message to slurmdbd 2) slurmdbd looks at job state so it sends a ELIGIGLE CALL ID to mysql 3) Before mysql backend can process the operation, this is before a db_index for the job can be created, select_nodes sets the job to RUNNING but it DOES not call jobacct_storage_g_job_start since there is no a db_index for the job. 4) A step starts for the job 5) When the job completes an update in job_table for that job sets the right completion time but it leaves start time to 0. Those clusters with thousand of jobs by day are probably suffering this problem. This situation has been studied for slurm 2.2.7 but same problems arise with slurm 2.3.1: select id_job,job_db_inx,id_user,from_unixtime(time_submit),from_unixtime(time_start),from_unixtime(time_end), state from bsc_cluster_job_table t1 where time_start=0 and time_end > 0 and exists (select * from bsc_cluster_step_table t2 where t1.job_db_inx=t2.job_db_inx); First thought was to solve the problem updating the job table entry with start time at completion time but it would lead to sreport not showing consumed time till the job completes. Other option could be to check when first step is created then update the job table entry if job start time is 0. Probably someone else can see a better way to solve it or other implications around. WARNING / LEGAL TEXT: This message is intended only for the use of the individual or entity to which it is addressed and may contain information which is privileged, confidential, proprietary, or exempt from disclosure under applicable law. If you are not the intended recipient or the person responsible for delivering the message to the intended recipient, you are strictly prohibited from disclosing, distributing, copying, or in any way using this message. If you have received this communication in error, please notify the sender and destroy and delete any copies you may have received. http://www.bsc.es/disclaimer.htm
