Hello, I can shrink a running job by running:

$ scontrol update JobId=1234 NumNodes=4

If a job had 8 nodes allocated, it is correctly shrunk to 4 nodes, but the job steps on the 4 nodes that are removed from the job are immediately killed. Is there a way to let those job steps run to completion, while scheduling no further steps onto them so the new node count is respected? I tried running the job steps with the no-kill option "-k", but that did not change the behaviour. Alternatively, is it possible to automatically reschedule the killed job steps?

This is a sample batch file that is submitted with sbatch:

------------------------------------------------
#!/bin/bash
#SBATCH --output=/NAS/renderfarm/jobs/job.%J.out
#SBATCH -p render
#SBATCH --nodes=1-8
#SBATCH --job-name="Some Job"
#SBATCH --mem=12000
#SBATCH --time=01:00:00

SRUNOPTS="--chdir=/tmp -l -k -c1 -n1 -N1 --checkpoint-dir=/NAS/checkpoints"

srun_cr $SRUNOPTS --job-name="Step 1" longlastingjob &
srun_cr $SRUNOPTS --job-name="Step 2" longlastingjob &
srun_cr $SRUNOPTS --job-name="Step 3" longlastingjob &
srun_cr $SRUNOPTS --job-name="Step 4" longlastingjob &
srun_cr $SRUNOPTS --job-name="Step 5" longlastingjob &
srun_cr $SRUNOPTS --job-name="Step 6" longlastingjob &
srun_cr $SRUNOPTS --job-name="Step 7" longlastingjob &
srun_cr $SRUNOPTS --job-name="Step 8" longlastingjob &
wait
------------------------------------------------

Thanks,
Lutz
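P.S. To make the second question more concrete: if there is no built-in way, I imagine wrapping each step in a retry loop, roughly like the untested sketch below, so that a step killed by the shrink gets resubmitted on one of the remaining nodes. The run_step function is just for illustration; note that it would also retry steps that fail for reasons unrelated to the shrink.

------------------------------------------------
run_step () {
    local name="$1"
    # Resubmit the step until it exits with status 0. A step killed by
    # the shrink exits non-zero and is started again by srun_cr, which
    # should place it on a node still belonging to the job.
    until srun_cr $SRUNOPTS --job-name="$name" longlastingjob; do
        echo "step '$name' was killed or failed, resubmitting" >&2
        sleep 5
    done
}

run_step "Step 1" &
run_step "Step 2" &
# ... and so on for the remaining steps ...
wait
------------------------------------------------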
