Hi Lucas,

The jobs in the array are all organized under the one submission. For example, I launched a job array yesterday and can query them all with the job ID from the original launch, in this case 51:

$ sacct -j 51
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
51_39        testjob.s+      debug    default          4  COMPLETED      0:0
51_39.batch       batch               default          2  COMPLETED      0:0
51_38        testjob.s+      debug    default          4  COMPLETED      0:0
51_38.batch       batch               default          2  COMPLETED      0:0

$ sacct -j 51_38
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
51_38        testjob.s+      debug    default          4  COMPLETED      0:0
51_38.batch       batch               default          2  COMPLETED      0:0

So, yes, they are also considered separate jobs and can be queried as such, but Slurm does put them under the one umbrella of the launching job.
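In case it helps, the submission side is just an ordinary batch script plus the --array option. Something along these lines (only a sketch -- the script name, array size, cpus-per-task and the input-file naming are placeholders, and my_calc stands in for whatever you actually run):

$ cat testjob.sh
#!/bin/bash
#SBATCH --job-name=testjob
#SBATCH --partition=debug
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
# Each array element is scheduled on its own; SLURM_ARRAY_TASK_ID tells
# this element which piece of work (e.g. which input file) is its own.
srun ./my_calc input.${SLURM_ARRAY_TASK_ID}

$ sbatch --array=0-39 testjob.sh

Each element can then start as soon as its own small allocation is available, rather than waiting for the whole set of CPUs.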
In terms of capping how many tasks will start on a node, or of over-committing, those can be managed with the usual options, e.g., --tasks-per-node and srun --overcommit. But I can't think of a way to have some of the tasks start before all the requested resources are available, apart from using a job array. For non-array jobs, Slurm must guarantee all the requested resources before launching, the way I understand it. As noted, -m for cyclic distribution will not affect this individual startup of tasks, only how they are distributed across nodes once the requested resources are ready. Others might have better ideas, though.
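To make that concrete, the single-job version of the same thing would be along these lines (again just a sketch with made-up node and task counts, and my_calc is a stand-in for your program):

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
# Nothing below this line runs until the full 2 x 16-CPU allocation has
# been granted; -m cyclic only controls where tasks land once it has.
srun -m cyclic -n 32 ./my_calc
# or, to oversubscribe (more tasks than allocated CPUs):
# srun --overcommit -n 64 ./my_calc

The difference is that here the first task cannot start until the whole allocation is in place, whereas the array elements above can trickle in as resources free up.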
Regarding the workflow of your overall computations, there is a parametric modelling toolkit that I used to support at Monash University called Nimrod (Nimrod/O, Nimrod/G, Nimrod/K, Nimrod/E, etc.). I never handled its Slurm support myself, although there was already a prototype for that maybe three years ago; it was designed to drive several different types of resource managers. One of the tools, Nimrod/O, conducts the iterative launch of computations in order to find the optimum (possibly several local ones), guided by a library of different techniques. I can pass this along to the group that still supports it if you'd like.

Regards,
Jeff

--
Jeff Tan
Infrastructure Services & Technologies
IBM Research - Australia


From: "Koziol, Lucas" <lucas.koz...@exxonmobil.com>
To: "slurm-dev" <slurm-dev@schedmd.com>
Date: 05/01/2017 03:26
Subject: [slurm-dev] Re: Question about -m cyclic and --exclusive options to slurm

Does a job array launch separate slurm processes?

Lucas

From: Jeff Tan [mailto:jeffe...@au1.ibm.com]
Sent: Tuesday, January 03, 2017 7:34 PM
To: slurm-dev <slurm-dev@schedmd.com>
Subject: [slurm-dev] Re: Question about -m cyclic and --exclusive options to slurm

"Koziol, Lucas" <lucas.koz...@exxonmobil.com> wrote on 04/01/2017 10:28:14:

> I want to have 1 batch script, where I reserve a certain large
> number of CPUs, and then run multiple 1-CPU tasks from within this
> single script. The reason being that I do several cycles of these
> tasks, and I need to process the outputs between tasks.

Will you be programming the in-between processing (between cycles) into the batch script as well? Just curious.

In an earlier email you wrote:

> What I want to do is run a large number of single-CPU tasks, and have them
> distributed evenly over all allocated nodes, and to oversubscribe CPUs to
> tasks (each task is very light on CPU resources).

You may need to set --tasks-per-node=16, and set --nodes as required. When you say oversubscribing CPUs, do you mean to use --overcommit with srun?

...

> The hope was that all 16 tasks would run on Node 1, and 16 tasks would run
> on Node 2. Unfortunately what happens is that all 32 jobs get assigned to
> Node 1. I thought -m cyclic was supposed to avoid this.

Scheduling tasks independently of one another might only be possible with job arrays. Slurm would normally wait for all resources to be lined up before it starts the job otherwise. Also, as I understand it, -m (--distribution) does not change Slurm's behavior of lining up all the CPUs required in total before starting the job (unless it's a job array).

Regards,
Jeff

--
Jeff Tan
Infrastructure Services & Technologies
IBM Research - Australia