[slurm-dev] Slurm-16.05.9-1 can't start a batch script when allocated nodes are in power save mode (Fix included)

Didier GAZEN Wed, 08 Feb 2017 11:30:37 -0800

Hi,

When the following conditions are met:

- a script is submitted with sbatch
- the allocation is made on nodes in power save mode
- the backfill scheduler is in use
- no PrologSlurmctld program is defined

then the routine 'launch_job' (job_scheduler.c) is never called, causing the
job to be completed by '_purge_missing_jobs' (job_mgr.c) with the following
log messages:

[2017-02-08T16:00:36.272] Batch JobId=214 missing from node 0 (not found BatchStartTime after startup)
[2017-02-08T16:00:36.272] job_complete: JobID=214 State=0x1 NodeCnt=1 WTERMSIG 126
[2017-02-08T16:00:36.272] job_complete: JobID=214 State=0x1 NodeCnt=1 cancelled by node failure


Before being cancelled, the job status appears in squeue as:

- 'Configuring' during the boot process of nodes being resumed from power save
- 'Running' once the nodes are up (but no script is ever started)

I have done some work to track down the bug:

The routine 'launch_job' is called by several functions in slurmctld:

(1) _start_job          (backfill.c)      if job's CONFIGURING flag is false
(2) _schedule           (job_scheduler.c) if job's CONFIGURING flag is false
(3) prolog_running_decr (job_scheduler.c) in case a PrologSlurmctld program is run
(4) job_time_limit      (job_mgr.c)       if the nodes are coming from REBOOT

It seems that function (1) or (2) may be called during job submission, but the
job's CONFIGURING flag is true because the job is started on allocated nodes
that are in power save mode => launch_job cannot be called. Later, functions
(1) and (2) are called periodically, but as they deal only with PENDING jobs,
our RUNNING job is skipped => launch_job cannot be called.

Function (3) is called when a PrologSlurmctld program is defined: I don't
have one => launch_job cannot be called. Note that when a PrologSlurmctld
program is defined, there is no problem.
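
The reasoning above can be condensed into a small C model. This is only a
sketch of the description in this message, not Slurm code: the enum, the
function model_launch_path and its boolean parameters are hypothetical names,
and the order in which the paths are tested is an assumption made for
illustration.

```c
#include <stdbool.h>

/* Hypothetical labels for the four call sites listed above. */
enum launch_path {
    LAUNCH_NEVER = 0,        /* no caller reaches launch_job */
    LAUNCH_BY_SCHEDULER,     /* (1) _start_job or (2) _schedule */
    LAUNCH_BY_PROLOG_DECR,   /* (3) prolog_running_decr */
    LAUNCH_BY_TIME_LIMIT     /* (4) job_time_limit */
};

/*
 * Simplified model (not Slurm code): which call site would end up calling
 * launch_job for a batch job, given the three conditions discussed above.
 */
enum launch_path model_launch_path(bool nodes_in_power_save,
                                   bool has_prolog_slurmctld,
                                   bool node_reboot_flag)
{
    /* (1)/(2): the job is started while PENDING; launch_job is called
     * only if the CONFIGURING flag is false, i.e. the nodes are up. */
    bool configuring = nodes_in_power_save;
    if (!configuring)
        return LAUNCH_BY_SCHEDULER;

    /* Later scheduler passes only look at PENDING jobs, so a job that
     * is RUNNING and CONFIGURING depends on paths (3) and (4). */
    if (has_prolog_slurmctld)
        return LAUNCH_BY_PROLOG_DECR;
    if (node_reboot_flag)
        return LAUNCH_BY_TIME_LIMIT;

    return LAUNCH_NEVER;
}
```

For the case reported here, model_launch_path(true, false, false), the model
returns LAUNCH_NEVER: none of the four callers reaches launch_job.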

Finally, the issue can be fixed in the 'job_time_limit' function (4), which is
called periodically for RUNNING jobs. I am not sure whether this breaks the
logic for the NODE_REBOOT case, but it is working fine:

diff --git a/src/slurmctld/job_mgr.c b/src/slurmctld/job_mgr.c
index 1d961ab..d6463cc 100644
--- a/src/slurmctld/job_mgr.c
+++ b/src/slurmctld/job_mgr.c
@@ -7583,9 +7583,10 @@ void job_time_limit(void)
                        if (job_ptr->bit_flags & NODE_REBOOT) {
                                job_ptr->bit_flags &= (~NODE_REBOOT);
                                job_validate_mem(job_ptr);
-                               if (job_ptr->batch_flag)
-                                       launch_job(job_ptr);
-                       }
+                        }
+                       if (job_ptr->batch_flag){
+                               launch_job(job_ptr);
+                        }
                }
 #endif
                /* This needs to be near the top of the loop, checks every
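
The effect of the patch can be checked on a toy model of the modified branch.
The sketch below is not Slurm code: struct job_model, MODEL_NODE_REBOOT (with
an arbitrary bit value) and model_launch_job are hypothetical stand-ins,
job_validate_mem is omitted, and it assumes the enclosing block runs once,
when the job's nodes finish configuring, which the diff context does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Slurm's NODE_REBOOT bit; the value is hypothetical. */
#define MODEL_NODE_REBOOT 0x1u

/* Hypothetical, reduced job record: only the fields the diff touches. */
struct job_model {
    uint32_t bit_flags;  /* may contain MODEL_NODE_REBOOT */
    bool batch_flag;     /* true for jobs submitted with sbatch */
    int launches;        /* counts calls to model_launch_job */
};

/* Stand-in for launch_job: only records that a launch was requested. */
static void model_launch_job(struct job_model *job)
{
    job->launches++;
}

/* Branch as it was before the patch: launch only in the REBOOT case. */
void config_done_original(struct job_model *job)
{
    assert(job != NULL);
    if (job->bit_flags & MODEL_NODE_REBOOT) {
        job->bit_flags &= ~MODEL_NODE_REBOOT;
        if (job->batch_flag)
            model_launch_job(job);
    }
}

/* Branch after the patch: clear the REBOOT bit if set, then launch
 * every batch job regardless of that bit. */
void config_done_patched(struct job_model *job)
{
    assert(job != NULL);
    if (job->bit_flags & MODEL_NODE_REBOOT)
        job->bit_flags &= ~MODEL_NODE_REBOOT;
    if (job->batch_flag)
        model_launch_job(job);
}
```

In this model, a batch job without NODE_REBOOT is launched only by the patched
variant, while a job with NODE_REBOOT set is launched exactly once by both. If
the enclosing block could run more than once for the same job, the patched
variant would call launch_job each time, so that assumption is worth checking
against the real code.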

What do you think?

Best regards,

Didier
