Hi!

I have noticed strange behaviour in the task/affinity plugin if I use --cpu_bind=socket and -c > 1.

My tasks are distributed one per socket (I have 8 sockets), but if I specify -c 6, six of the sockets are allocated to my first task. With 8 tasks, each task gets 6 of the 8 sockets.
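As a rough illustration of the symptom, here is a toy model in C. It is not the plugin's code, and it assumes (without confirmation from the source) that a task's CPUs are placed round-robin over the sockets and that the binding is then widened to whole sockets. The names socket_mask_cyclic and count_sockets are made up for this sketch.

```c
/* Toy model only: these helpers are hypothetical and do not exist in Slurm.
 * Assumption: each of a task's CPUs lands on the next socket in turn, and
 * socket-level binding then includes every socket that was touched. */

/* Returns a bitmask with one bit per socket covered by the task. */
unsigned socket_mask_cyclic(int first_socket, int cpus_per_task, int nsockets)
{
    unsigned mask = 0;
    for (int i = 0; i < cpus_per_task; i++)
        mask |= 1u << ((first_socket + i) % nsockets);
    return mask;
}

/* Counts how many sockets are set in the mask. */
int count_sockets(unsigned mask)
{
    int n = 0;
    for (; mask; mask >>= 1)
        n += (int)(mask & 1u);
    return n;
}
```

Under these assumptions, socket_mask_cyclic(0, 6, 8) gives 0x3f, i.e. a task with -c 6 covers six of the eight sockets, which matches what I see.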

This looks like a bug to me, but might it be by design?

I have traced it down to the lllp_distribution() function in task/affinity/dist_task.c

In this switch statement:

        switch (req->task_dist) {
        case SLURM_DIST_BLOCK_BLOCK:
        case SLURM_DIST_CYCLIC_BLOCK:
        case SLURM_DIST_PLANE:
                /* tasks are distributed in blocks within a plane */
                rc = _task_layout_lllp_block(req, node_id, &masks);
                break;
        case SLURM_DIST_CYCLIC:
        case SLURM_DIST_BLOCK:
        case SLURM_DIST_CYCLIC_CYCLIC:
        case SLURM_DIST_BLOCK_CYCLIC:
                rc = _task_layout_lllp_cyclic(req, node_id, &masks);
                break;
        default:
                if (req->cpus_per_task > 1)
                        rc = _task_layout_lllp_multi(req, node_id, &masks);
                else
                        rc = _task_layout_lllp_cyclic(req, node_id, &masks);
                req->task_dist = SLURM_DIST_BLOCK_CYCLIC;
                break;
        }

In the default block, a different function (_task_layout_lllp_multi) is called if cpus_per_task > 1. Should the cyclic block behave the same as the default block?

Or should SLURM_DIST_CYCLIC and SLURM_DIST_BLOCK be handled the same way as the default?
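To make the question concrete, the dispatch in the switch above can be mirrored in a small standalone sketch. The enum and the layout_for() helper are hypothetical stand-ins, not Slurm identifiers, and the returned strings only name which _task_layout_lllp_* function the quoted switch would select.

```c
/* Hypothetical stand-ins for the SLURM_DIST_* values in the quoted switch. */
enum dist {
    DIST_UNSET,          /* anything that falls through to "default" */
    DIST_BLOCK_BLOCK,
    DIST_CYCLIC_BLOCK,
    DIST_PLANE,
    DIST_CYCLIC,
    DIST_BLOCK,
    DIST_CYCLIC_CYCLIC,
    DIST_BLOCK_CYCLIC
};

/* Returns which layout function the quoted switch would pick:
 * "block", "cyclic" or "multi". */
const char *layout_for(enum dist d, int cpus_per_task)
{
    switch (d) {
    case DIST_BLOCK_BLOCK:
    case DIST_CYCLIC_BLOCK:
    case DIST_PLANE:
        return "block";
    case DIST_CYCLIC:
    case DIST_BLOCK:
    case DIST_CYCLIC_CYCLIC:
    case DIST_BLOCK_CYCLIC:
        /* cpus_per_task is not consulted here */
        return "cyclic";
    default:
        /* only the default case distinguishes cpus_per_task > 1 */
        return cpus_per_task > 1 ? "multi" : "cyclic";
    }
}
```

With -c 6, layout_for(DIST_CYCLIC, 6) selects "cyclic" while layout_for(DIST_UNSET, 6) selects "multi", which is the asymmetry I am asking about.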

Best regards,
Magnus

--
Magnus Jonsson, Developer, HPC2N, Umeå Universitet
