Hi!

I have noticed strange behaviour in the task/affinity plugin when I use --cpu_bind=socket and -c > 1.
My tasks are distributed one per socket (I have 8 sockets), and if I specify -c 6, six of the sockets are allocated to my first task. If I have 8 tasks, each task gets 6 of the 8 sockets.
This looks like bad behaviour, but it might be by design? I have traced it down to the lllp_distribution() function in task/affinity/dist_task.c.
In this switch statement:

    switch (req->task_dist) {
    case SLURM_DIST_BLOCK_BLOCK:
    case SLURM_DIST_CYCLIC_BLOCK:
    case SLURM_DIST_PLANE:
        /* tasks are distributed in blocks within a plane */
        rc = _task_layout_lllp_block(req, node_id, &masks);
        break;
    case SLURM_DIST_CYCLIC:
    case SLURM_DIST_BLOCK:
    case SLURM_DIST_CYCLIC_CYCLIC:
    case SLURM_DIST_BLOCK_CYCLIC:
        rc = _task_layout_lllp_cyclic(req, node_id, &masks);
        break;
    default:
        if (req->cpus_per_task > 1)
            rc = _task_layout_lllp_multi(req, node_id, &masks);
        else
            rc = _task_layout_lllp_cyclic(req, node_id, &masks);
        req->task_dist = SLURM_DIST_BLOCK_CYCLIC;
        break;
    }

In the default block, a different function is called if cpus_per_task > 1. Should the cyclic block be the same as the default block?
Or should SLURM_DIST_CYCLIC, SLURM_DIST_BLOCK be the same as default?

Best regards,
Magnus

-- 
Magnus Jonsson, Developer, HPC2N, Umeå Universitet
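
To make the question concrete, here is a minimal standalone sketch of the dispatch. It is not the actual Slurm code: the dist_t enum and pick_layout() are hypothetical stand-ins, and the function only returns the name of the layout routine that the switch statement quoted in this message would call. Under those assumptions it shows that an explicitly cyclic or block distribution never reaches _task_layout_lllp_multi(), whatever cpus_per_task is.

```c
/* Hypothetical stand-ins for the SLURM_DIST_* constants (illustration only). */
typedef enum {
    DIST_UNKNOWN,        /* anything that falls through to "default" */
    DIST_CYCLIC,
    DIST_BLOCK,
    DIST_PLANE,
    DIST_CYCLIC_CYCLIC,
    DIST_CYCLIC_BLOCK,
    DIST_BLOCK_CYCLIC,
    DIST_BLOCK_BLOCK
} dist_t;

/*
 * Mirrors the structure of the quoted switch, but returns the name of
 * the layout routine instead of calling it.
 */
static const char *pick_layout(dist_t task_dist, int cpus_per_task)
{
    switch (task_dist) {
    case DIST_BLOCK_BLOCK:
    case DIST_CYCLIC_BLOCK:
    case DIST_PLANE:
        return "block";
    case DIST_CYCLIC:
    case DIST_BLOCK:
    case DIST_CYCLIC_CYCLIC:
    case DIST_BLOCK_CYCLIC:
        /* cpus_per_task is not consulted on this path */
        return "cyclic";
    default:
        /* only here does cpus_per_task select the "multi" layout */
        return (cpus_per_task > 1) ? "multi" : "cyclic";
    }
}
```

With this sketch, pick_layout(DIST_CYCLIC, 6) returns "cyclic", while pick_layout(DIST_UNKNOWN, 6) returns "multi", which is the asymmetry I am asking about.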
