Upgrade to Slurm v2.2 or higher for more jobs or steps. The steps run within the job's allocation, which goes away when the job script ends, so adding "wait" to the end of the script would probably be your simplest solution. -- Sent from my Android phone with K-9 Mail. Please excuse my brevity.
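A minimal sketch of that suggestion, assuming the steps are launched in the background from the batch script (the echo lines here are stand-ins; in a real job script each one would be an srun invocation of realjob.sh):

```shell
#!/bin/sh
# Sketch: background each step, then use the shell builtin "wait" as a
# barrier so the batch script -- and with it the job's allocation --
# outlives the steps. In a real job script each backgrounded line would
# be something like:
#     srun realjob.sh "$x" &
for x in 1 2 3; do
    echo "step $x" &        # stand-in for launching one job step
done
wait                        # blocks until every backgrounded step exits
echo "all steps done"
```

Without the final "wait", the script reaches its end while the backgrounded steps are still running, the allocation is released, and the steps are killed.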
Yuri D'Elia <[email protected]> wrote:

Hi everyone,

I'm using Slurm 2.1.0 on a 6-node cluster. I have a couple of questions about sbatch.

We're trying to schedule around 100k jobs in the cluster, and I'm hitting the MaxJobCount limit. Is there a way to schedule beyond 65k jobs?

I would also like to group jobs logically, so that I can cancel a single job to kill all the related steps. To do that I tried to schedule job steps by running sbatch --jobid within another sbatch invocation:

    # outer script
    cat file | while read x; do
        sbatch --jobid $SLURM_JOB_ID realjob.sh $x
    done

then running this script directly with sbatch:

    sbatch outerscript.sh

This seems to schedule all the jobs correctly as steps under the main job, which is nice. But as soon as outerscript.sh finishes, all the steps are killed. This leads me to several more questions:

- by running job steps as shown, can I schedule 100k steps?
- how can I avoid the steps being killed when the main script finishes?

Also, I'm curious: can I "wait" for a job, a step, or several steps in a script (like a "barrier" would)? This would be very helpful for several scripts I'm writing (dependencies are not what I'm looking for). Can "sattach" be used for that purpose?

Thanks.
