Hi all,

I am having the weirdest error, and I am fairly sure it is a bug. I have
reproduced it on the latest Slurm commit (slurm 17.02.0-0pre2, commit
406d3fe429ef6b694f30e19f69acf989e65d7509) and on the slurm 16.05.5
branch. It does NOT happen with slurm 15.08.12.

My cluster is composed of 8 nodes, each with 2 sockets of 8 cores. The
relevant slurm.conf content is:

SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/linear  #DEDICATED NODES
NodeName=acme[11-14,21-24] CPUs=16 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 State=UNKNOWN

I am running a simple parallel "hello world" code, submitted as
"sbatch --ntasks=X --tasks-per-node=Y myScript.sh". The problem is that,
depending on the values of X and Y, Slurm does the wrong thing and returns
an error.

"
sbatch --ntasks=8 --tasks-per-node=2 myScript.sh
srun: Warning: can't honor --ntasks-per-node set to 2 which doesn't match
the requested tasks 4 with the number of requested nodes 4. Ignoring
--ntasks-per-node.
"
Note that I requested 8 tasks, not 4, and I did not request any number of
nodes. The same happens with
"
sbatch --ntasks=16 --tasks-per-node=2 myScript.sh
srun: Warning: can't honor --ntasks-per-node set to 2 which doesn't match
the requested tasks 8 with the number of requested nodes 8. Ignoring
--ntasks-per-node.
"
and
"
sbatch --ntasks=32 --tasks-per-node=4 myScript.sh
srun: Warning: can't honor --ntasks-per-node set to 4 which doesn't match
the requested tasks 8 with the number of requested nodes 8. Ignoring
--ntasks-per-node.
"
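One way to read the three warnings is to put the numbers srun reported
next to what each request implies. The short Python sketch that follows
does only that arithmetic. It assumes, as a simplification, that every node
runs exactly --tasks-per-node tasks; the helper names are hypothetical and
not part of Slurm.

```python
import math

# Figures copied from the three srun warnings quoted above:
# (ntasks, tasks_per_node) requested -> (tasks, nodes) that srun reported.
REPORTED = {
    (8, 2): (4, 4),
    (16, 2): (8, 8),
    (32, 4): (8, 8),
}


def expected_nodes(ntasks, tasks_per_node):
    """Nodes needed if every node runs exactly tasks_per_node tasks
    (simplified model, not Slurm's allocation logic)."""
    return math.ceil(ntasks / tasks_per_node)


def compare(reported):
    """Return (request, expected (tasks, nodes), reported (tasks, nodes)) rows."""
    rows = []
    for (ntasks, tpn), seen in reported.items():
        rows.append(((ntasks, tpn), (ntasks, expected_nodes(ntasks, tpn)), seen))
    return rows


if __name__ == "__main__":
    for request, expected, seen in compare(REPORTED):
        print(f"requested {request}: expected (tasks, nodes) {expected}, "
              f"srun reported {seen}")
```

In all three cases the node count srun reports matches ntasks divided by
tasks-per-node, while the task count it reports equals that same node
count, as if srun were seeing only one task per node.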
All other configurations work correctly and return no error. In
particular, I have tried the following combinations without problems:
(ntasks, tasks-per-node)
(1,1)
(2,1), (2,2)
(4,1), (4,2), (4,4)
(8,1), (4,4), (8,8)
(16,4), (16,8), (16,16)
(32,8), (32,16)
(64,8), (64,16)
(128,16)
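To rule out a simple capacity problem, a quick Python check can confirm
that every combination, working or failing, fits on the 8-node, 16-CPU
cluster. This is a sketch under stated assumptions: one task per CPU and
whole-node allocation as with select/linear; the names are hypothetical.

```python
import math

# Cluster shape from the slurm.conf above: 8 nodes x 16 CPUs,
# whole nodes allocated (select/linear). One task per CPU is assumed.
TOTAL_NODES = 8
CPUS_PER_NODE = 16

# (ntasks, tasks_per_node) combinations as listed in this report.
WORKING = [
    (1, 1),
    (2, 1), (2, 2),
    (4, 1), (4, 2), (4, 4),
    (8, 1), (4, 4), (8, 8),
    (16, 4), (16, 8), (16, 16),
    (32, 8), (32, 16),
    (64, 8), (64, 16),
    (128, 16),
]
FAILING = [(8, 2), (16, 2), (32, 4)]


def fits(ntasks, tasks_per_node):
    """True if the request fits on the cluster under the assumptions above."""
    nodes = math.ceil(ntasks / tasks_per_node)
    return tasks_per_node <= CPUS_PER_NODE and nodes <= TOTAL_NODES


if __name__ == "__main__":
    for label, combos in (("working", WORKING), ("failing", FAILING)):
        for ntasks, tpn in combos:
            nodes = math.ceil(ntasks / tpn)
            print(f"{label}: ntasks={ntasks} tasks-per-node={tpn} "
                  f"-> {nodes} nodes, fits={fits(ntasks, tpn)}")
```

Every request fits, and the node counts of the failing requests (4 and 8)
also occur among working ones, e.g. (16,4) and (64,8), so neither capacity
nor node count alone separates the two groups.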

As I said, this does not happen when running the very same commands and
scripts with slurm 15.08.12. Have you had any similar experiences? Is this
a bug, intended behaviour, or am I doing something wrong?

Thanks for your help.

Best regards,

Manuel
