Hi,
I wrote about this a couple of days ago (I was using srun, but I guess it should be the same). Basically, combining -N with -c doesn't seem to work, but only if you specify a min-max node range with different values. I'm wondering why -N 13-14 -c 16 (the maximum CPUs per node) worked for you; on my test setup that also fails.
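To narrow down which combinations fail, a small test matrix can be run against a test partition. This is a minimal sketch. It assumes the partition name `d3` and the batch script `sleep.com` from the quoted message. It uses `sbatch --test-only`, which validates the request and estimates a start time without actually submitting a job. The helper names `build_commands` and `run_matrix` are hypothetical.

```python
import itertools
import subprocess

# Node specs to compare: a min-max range versus a fixed count (from the thread).
NODE_SPECS = ["13-14", "14"]
# CPUs per task to try; None means "omit -c" (Slurm's default of one CPU per task).
CPUS_PER_TASK = [None, 2, 4, 16]


def build_commands(partition, script):
    """Return one `sbatch --test-only` command line per (node spec, -c) pair."""
    commands = []
    for nodes, cpus in itertools.product(NODE_SPECS, CPUS_PER_TASK):
        cmd = ["sbatch", "--test-only", "-N", nodes, "-p", partition]
        if cpus is not None:
            cmd += ["-c", str(cpus)]
        cmd.append(script)
        commands.append(cmd)
    return commands


def run_matrix(partition="d3", script="sleep.com"):
    """Run each command and print whether sbatch accepted the request."""
    for cmd in build_commands(partition, script):
        result = subprocess.run(cmd, capture_output=True, text=True)
        status = "ok" if result.returncode == 0 else "FAILED"
        # sbatch messages may go to stdout or stderr, so show both.
        message = (result.stdout + result.stderr).strip()
        print(f"{status:6} {' '.join(cmd)}  {message}")
```

Calling `run_matrix()` on a node with the Slurm client tools installed prints one line per combination. This makes it easy to see whether the failures line up with "range plus -c" as described above.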
Best regards,
Filip Skalski

Quoting Franco Broi:
Hi,

Is this behaviour expected? I have a partition with 14 nodes, each with 16 CPUs:

PartitionName=d3
   AllocNodes=ALL AllowGroups=ALL Default=NO
   DefaultTime=NONE DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 MaxCPUsPerNode=UNLIMITED
   Nodes=delta[43-56]
   Priority=100 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=OFF
   State=UP TotalCPUs=224 TotalNodes=14 SelectTypeParameters=N/A
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

$ sinfo | grep d3
d3    up   infinite      1  down* delta44
d3    up   infinite     13  alloc delta[43,45-56]

delta44 died, but we can tolerate one dead node, so we submit jobs with -N 13-14:

$ sbatch -N 13-14 -p d3 -c 4 sleep.com
sbatch: error: Batch job submission failed: Requested node configuration is not available
$ sbatch -N 13-14 -p d3 sleep.com
Submitted batch job 12287
$ sbatch -N 13-14 -p d3 -c 2 sleep.com
sbatch: error: Batch job submission failed: Requested node configuration is not available
$ sbatch -N 13-14 -p d3 -c 16 sleep.com
Submitted batch job 12290
$ sbatch -N 14 -p d3 -c 16 sleep.com
Submitted batch job 12291
$ sbatch -N 14 -p d3 -c 4 sleep.com
Submitted batch job 12292

Also, it would be nice to be able to say that you want all the nodes in a partition but will tolerate a few dead ones, without having to know how many nodes the partition actually has. Perhaps this could be done by specifying -N -2, i.e. min nodes = max nodes - 2.

Cheers,
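Until something like `-N -2` exists (it is only a proposal here, not a current sbatch option), a wrapper could look up the partition size and build the range itself. This is a minimal sketch. It assumes the `TotalNodes=` field shown in the partition listing above appears in `scontrol show partition <name>` output. `total_nodes`, `node_range` and `node_range_for` are hypothetical helper names.

```python
import re
import subprocess


def total_nodes(partition_info):
    """Extract the N from TotalNodes=N in `scontrol show partition` text."""
    match = re.search(r"\bTotalNodes=(\d+)", partition_info)
    if match is None:
        raise ValueError("TotalNodes not found in partition info")
    return int(match.group(1))


def node_range(partition_info, tolerate_dead):
    """Build an -N value 'min-max' where max is the partition size and
    min allows up to `tolerate_dead` nodes to be missing."""
    total = total_nodes(partition_info)
    if not 0 <= tolerate_dead < total:
        raise ValueError("tolerate_dead must be between 0 and TotalNodes - 1")
    low = total - tolerate_dead
    # A zero tolerance collapses to a fixed node count rather than "14-14".
    return str(total) if low == total else f"{low}-{total}"


def node_range_for(partition, tolerate_dead):
    """Query a live partition via scontrol (requires the Slurm client tools)."""
    info = subprocess.run(
        ["scontrol", "show", "partition", partition],
        capture_output=True, text=True, check=True,
    ).stdout
    return node_range(info, tolerate_dead)
```

For the d3 partition (TotalNodes=14), `node_range(info, 2)` gives `12-14`, which would then be passed as `sbatch -N 12-14 -p d3 ...`. Given the behaviour reported in this thread, such a range may still be rejected when combined with -c.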
