Upgrade to Slurm v2.2 or higher for more jobs or steps. The steps run within the job's allocation, which goes away when the job script ends, so adding "wait" to the end of the script would probably be your simplest solution. -- Sent from my Android phone with K-9 Mail. Please excuse my brevity.
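A minimal sketch of that suggestion, assuming the steps are launched in the background from the batch script (the echo lines here are stand-ins; in a real job script each one would be an srun invocation of realjob.sh):

```shell
#!/bin/sh
# Sketch: background each step, then use the shell builtin "wait" as a
# barrier so the batch script -- and with it the job's allocation --
# outlives the steps. In a real job script each backgrounded line would
# be something like:
#     srun realjob.sh "$x" &
for x in 1 2 3; do
    echo "step $x" &        # stand-in for launching one job step
done
wait                        # blocks until every backgrounded step exits
echo "all steps done"
```

Without the final "wait", the script reaches its end while the backgrounded steps are still running, the allocation is released, and the steps are killed.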
Yuri D'Elia <[email protected]> wrote:

Hi everyone,

I'm using Slurm 2.1.0 on a 6-node cluster. I have a couple of questions about sbatch.

We're trying to schedule around 100k jobs in the cluster, and I'm hitting the MaxJobCount limit. Is there a way to schedule beyond 65k jobs?

I would also like to group jobs logically, so that I can cancel a single job to kill all the related steps. To do that I tried to schedule job steps by running sbatch --jobid within another sbatch invocation:

    # outer script
    cat file | while read x; do
        sbatch --jobid $SLURM_JOB_ID realjob.sh $x
    done

then running this script directly with sbatch:

    sbatch outerscript.sh

This seems to schedule all the jobs correctly as steps under the main job, which is nice. But as soon as outerscript.sh finishes, all the steps are killed. This leads me to several more questions:

- by running job steps as shown, can I schedule 100k steps?
- how can I avoid the steps being killed when the main script finishes?

Also, I'm curious: can I "wait" for a job, a step, or several steps in a script (like a "barrier" would)? This would be very helpful for several scripts I'm writing (dependencies are not what I'm looking for). Can "sattach" be used for that purpose?

Thanks.
