On Thu, 15 Sep 2011 18:11:27 +0200
Carles Fenoy <[email protected]> wrote:
> I would suggest using srun inside the job, and not sbatch.
>
> If you submit a job with sbatch and inside use srun you can easily use
> dependencies between stages.
>
> So you will have
>
> $ sbatch --job-name=stage1 --ntasks=6 ./script_stage1.sh
> Submitted batch job 1
>
> script_stage1.sh:
>
> srun -n1 process_data.exe 1 &
> srun -n1 process_data.exe 2 &
> srun -n1 process_data.exe 3 &
> srun -n1 process_data.exe 4 &
> ...
> # You should control the number of srun running at the same time
>
> wait
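(Aside: the "control the number of srun running at the same time" comment above can be done with plain bash job control. In the sketch below, 'sleep' is only a stand-in for the 'srun -n1 process_data.exe "$i"' line, and the cap of 4 is arbitrary:)

```shell
#!/bin/bash
# Throttle background steps to at most MAX_CONCURRENT at a time.
# 'sleep 0.1' is a stand-in for: srun -n1 process_data.exe "$i"
MAX_CONCURRENT=4
OUT=$(mktemp)

for i in $(seq 1 10); do
    # Block until a slot frees up before launching the next step.
    while [ "$(jobs -rp | wc -l)" -ge "$MAX_CONCURRENT" ]; do
        sleep 0.05
    done
    { sleep 0.1; echo "$i" >> "$OUT"; } &
done
wait   # block until every step has finished
echo "steps completed: $(wc -l < "$OUT")"
```

The `jobs -rp | wc -l` check is the usual bash idiom here: it counts the background jobs of the current shell that are still running.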
Well, yes, but that's very kludgy.
Before trying SLURM (and Torque, if that matters) I first used a set of
scripts which basically managed subjobs manually in the shell, waited for
instances to finish, etc. It kind of worked, but scheduling was far from
optimal, it did not scale, and it required too much coordination between users.
Then I switched to a script which queued jobs with "pexec" in its hypervisor
mode. Scheduling improved, but it still required too much planning.
I want to be able to dynamically scale the partition. Also, I want job
scheduling to be as simple as "queuing-command ./job". Handling parallelism
inside the job itself, as you propose, is a step backward in my opinion.
I mean, the easiest way would be
$ sbatch ./script
# ./script
while ...; do
    for x in $(seq 1 Y); do
        srun ... &
    done
    wait
done
but this does not take advantage of the full parallelism unless multiple
instances of the same script are run at the same time (the "wait" here
creates usage gaps). Not to mention that you have to provision for Y steps,
where Y must be determined manually.
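For what it's worth, bash (4.3 and later) has "wait -n", which returns as soon as *any* background job exits, so a slot can be refilled immediately instead of draining the whole batch of Y. That shrinks the gaps, though it still means managing parallelism inside the job. A sketch, again with 'sleep' standing in for the srun line:

```shell
#!/bin/bash
# Keep SLOTS steps in flight at all times; refill as soon as any one exits.
# Requires bash >= 4.3 for 'wait -n'. 'sleep' stands in for 'srun -n1 ...'.
SLOTS=4
running=0
DONE=$(mktemp)

for step in $(seq 1 12); do
    if [ "$running" -ge "$SLOTS" ]; then
        wait -n                 # returns when *any* running step finishes
        running=$((running - 1))
    fi
    { sleep 0.1; echo "$step" >> "$DONE"; } &
    running=$((running + 1))
done
wait                            # drain the remaining steps
echo "finished $(wc -l < "$DONE") steps"
```

With a plain "wait" after each batch, all SLOTS steps must finish before the next batch starts; "wait -n" keeps every slot busy until the work runs out.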
> That's the way I'll do it, but don't know if there is any other way to do
> it.
> You should also have a control of which steps in every stage have finished
> in order to be able to resume the execution in case of any node failure.
The way I see it, I would like a "swait" command. This would allow me to queue
steps, then wait for these steps to finish.
sbatch stage1.sh
# stage1.sh
for ....; do
sbatch --jobid $SLURM_JOB_ID stage1-step.sh
done
swait $SLURM_JOB_ID
#####
When "swait" is invoked inside the same job as the id given on the command
line, it should simply wait for all of that job's steps to finish, without
counting the allocation itself as a step. Better yet, if SLURM_JOB_ID is
defined, it should use it directly.
That way "swait" would map *perfectly* onto the "wait" built-in.
This way I could also implement my multi-stage workflow very easily:
sbatch multi-stage.sh
# multi-stage.sh
for ...; do
sbatch --jobid $SLURM_JOB_ID stage1-step.sh
done
swait
for ...; do
sbatch --jobid $SLURM_JOB_ID stage2-step.sh
done
swait
echo "finished"
#####
There! The dependency problem is resolved without using dependencies in the
first place. Also, managing the queue becomes *much* easier.
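Until a real "swait" exists, I suppose one could fake it in userland: record the ids of whatever was launched, then poll until none is left alive. The sketch below uses plain process ids and 'kill -0' so it runs anywhere; with SLURM one would collect job ids from "sbatch --parsable" and test liveness with "squeue -h -j <id>" instead (those options exist, but the polling would hammer slurmctld, which is exactly why a server-side swait would be nicer):

```shell
#!/bin/bash
# Userland approximation of "swait": launch steps, record their ids,
# then poll until every recorded id is gone.
# Here the ids are process ids and 'sleep' stands in for sbatch'ed steps;
# with SLURM, collect ids via 'sbatch --parsable' and test liveness with
# 'squeue -h -j "$id"' instead of 'kill -0'.
ids=()
for i in 1 2 3; do
    sleep 0.2 &                 # stand-in for: sbatch ... stage1-step.sh
    ids+=("$!")
done

swait() {
    local id
    for id in "$@"; do
        while kill -0 "$id" 2>/dev/null; do
            sleep 0.05          # poll until this step has finished
        done
    done
}

swait "${ids[@]}"
echo "all steps finished"
```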
I'm actually prepared to implement this (I'm unsure whether I could use the
API or would have to modify the slurmctld sources for it to work efficiently).
Does anybody have any suggestion/idea/comment on this? Is the idea sound?
Any recommendation as to where to start?
Thanks again.