Mike, I’m working through your suggestions. I tried
$ salloc --ntasks=20 --cpus-per-task=24 --verbose myscript.bash

but salloc says that the resources are not available:

salloc: defined options
salloc: -------------------- --------------------
salloc: cpus-per-task       : 24
salloc: ntasks              : 20
salloc: verbose             : 1
salloc: -------------------- --------------------
salloc: end of defined options
salloc: Linear node selection plugin loaded with argument 4
salloc: select/cons_res loaded with argument 4
salloc: Cray/Aries node selection plugin loaded
salloc: select/cons_tres loaded with argument 4
salloc: Granted job allocation 34299
srun: error: Unable to create step for job 34299: Requested node configuration is not available

Oddly, scontrol says that there is one core per socket. Could our nodes be misconfigured?

$ scontrol show nodes
NodeName=n020 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=0 CPUTot=24 CPULoad=0.00
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=n020 NodeHostName=n020 Version=20.02.3
   OS=Linux 4.18.0-305.7.1.el8_4.x86_64 #1 SMP Mon Jun 14 17:25:42 EDT 2021
   RealMemory=1 AllocMem=0 FreeMem=126431 Sockets=24 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=normal,low,high
   BootTime=2021-11-18T08:43:44 SlurmdStartTime=2021-11-18T08:44:31
   CfgTRES=cpu=24,mem=1M,billing=24
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

From: slurm-users <[email protected]> On Behalf Of Renfro, Michael
Sent: Friday, November 26, 2021 8:15 AM
To: Slurm User Community List <[email protected]>
Subject: [EXTERNAL] Re: [slurm-users] Reserving cores without immediately launching tasks on all of them

The end of the MPICH section at [1] shows an example using salloc [2]. Worst case, you should be able to use the output of "scontrol show hostnames" [3] and use that data to make mpiexec command parameters to run one rank per node, similar to what's shown at the end of the synopsis section of [4].
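A rough, untested sketch of that hostnames-to-mpiexec approach (hosts.txt and ./manager are just stand-ins for your own file name and manager executable):

  # inside the allocation (salloc shell or sbatch script):
  scontrol show hostnames "$SLURM_JOB_NODELIST" > hosts.txt
  # one rank per node; the remaining cores on each node stay free for spawned workers
  mpiexec -f hosts.txt -ppn 1 -n "$SLURM_JOB_NUM_NODES" ./manager

The -ppn 1 keeps Hydra to a single rank per host regardless of how many slots it assumes each host in the file has.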
[1] https://slurm.schedmd.com/mpi_guide.html#mpich2
[2] https://slurm.schedmd.com/salloc.html
[3] https://slurm.schedmd.com/scontrol.html
[4] https://www.mpich.org/static/docs/v3.1/www1/mpiexec.html

--
Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
931 372-3601 / Tennessee Tech University

On Nov 25, 2021, at 12:45 PM, Mccall, Kurt E. (MSFC-EV41) <[email protected]> wrote:

I want to launch an MPICH job with sbatch with one task per node (each a manager), while also reserving a certain number of cores on each node for the managers to fill up with spawned workers (via MPI_Comm_spawn). I'd like to avoid using --exclusive.

I tried the arguments --ntasks=20 --cpus-per-task=24, but it appears that 20 * 24 tasks will be launched. Is there a way to reserve cores without immediately launching tasks on them? Thanks for any help.

sbatch: defined options
sbatch: -------------------- --------------------
sbatch: cpus-per-task       : 24
sbatch: ignore-pbs          : set
sbatch: ntasks              : 20
sbatch: test-only           : set
sbatch: verbose             : 1
sbatch: -------------------- --------------------
sbatch: end of defined options
sbatch: Linear node selection plugin loaded with argument 4
sbatch: select/cons_res loaded with argument 4
sbatch: Cray/Aries node selection plugin loaded
sbatch: select/cons_tres loaded with argument 4
sbatch: Job 34274 to start at 2021-11-25T12:15:05 using 480 processors on nodes n[001-020] in partition normal
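In sbatch terms, the layout I'm after would be roughly this (a sketch only; the node and core counts are placeholders, and I have not verified how this interacts with MPI_Comm_spawn):

  #!/bin/bash
  #SBATCH --nodes=20            # placeholder: one manager per node across 20 nodes
  #SBATCH --ntasks-per-node=1   # launch only the single manager task on each node
  #SBATCH --cpus-per-task=24    # placeholder: cores held per node for the spawned workers

  # ... launch the single manager per node here, e.g. with mpiexec as sketched above ...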
