Hi Andriy,

This one-line patch seems to fix the problem and has been applied to the version 2.4 code.

diff --git a/src/srun/srun.c b/src/srun/srun.c
index cadbca1..ae31f92 100644
--- a/src/srun/srun.c
+++ b/src/srun/srun.c
@@ -1142,7 +1142,6 @@ _terminate_job_step(slurm_step_ctx_t *step_ctx)
        slurm_step_ctx_get(step_ctx, SLURM_STEP_CTX_JOBID, &job_id);
        slurm_step_ctx_get(step_ctx, SLURM_STEP_CTX_STEPID, &step_id);
        info("Terminating job step %u.%u", job_id, step_id);
-       update_job_state(job, SRUN_JOB_CANCELLED);
        slurm_kill_job_step(job_id, step_id, SIGKILL);
 }
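
One way to check the result is to compare the JobState and ExitCode fields that scontrol reports for the job. The following is only a minimal sketch, assuming a POSIX shell with awk; job_summary is a hypothetical helper name, not part of SLURM, and the sample input is copied from the report quoted below.

```shell
#!/bin/sh
# job_summary (hypothetical helper): read `scontrol show jobs <id>` output
# on stdin and print only the JobState= and ExitCode= fields on one line.
job_summary() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^(JobState|ExitCode)=/) {
                out = out sep $i
                sep = " "
            }
    } END { print out }'
}

# Sample input taken from the KillOnBadExit=1 case in the quoted report.
job_summary <<'EOF'
JobId=2602 Name=sh
   UserId=user1-1(510) GroupId=user1-1(510)
   Priority=100983 Account=group1 QOS=
   JobState=CANCELLED Reason=None Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=0 ExitCode=0:0
EOF
# prints: JobState=CANCELLED ExitCode=0:0
```

On a live system the helper would be fed from the real command instead of the sample text, for example `scontrol show jobs 2602 | job_summary`.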

Quoting "Andrej N. Gritsenko" <and...@rep.kiev.ua>:

    Hello!

    Our customers are complaining about strange behavior when a job started
via srun ends with a non-zero exit code:

======================================================================
$ srun sh -c "exit 2"
srun: error: node4-131-23: task 0: Exited with exit code 2
srun: Terminating job step 2602.0
slurmd[node4-131-23]: *** STEP 2602.0 KILLED AT 2011-06-20T18:41:48 WITH SIGNAL 9 ***

$ scontrol show jobs 2602
JobId=2602 Name=sh
   UserId=user1-1(510) GroupId=user1-1(510)
   Priority=100983 Account=group1 QOS=
   JobState=CANCELLED Reason=None Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=0 ExitCode=0:0
======================================================================

It looks as if the user had cancelled the job, and it happens only when
slurm.conf contains the statement KillOnBadExit=1. With KillOnBadExit=0
the job state is reported as expected:

======================================================================
JobId=2604 Name=sh
   UserId=user1-1(510) GroupId=user1-1(510)
   Priority=983 Account=group1 QOS=
   JobState=FAILED Reason=NonZeroExitCode Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=0 ExitCode=2:0
======================================================================

When a job is started via sbatch, the state is reported correctly with any
value of the KillOnBadExit parameter. Is this a bug or just an undocumented
feature?

    With best wishes.
    Andriy.



