I'm not sure what the problem is, but running this as root or SlurmUser
for a while may help diagnose it:
"scontrol setdebugflags +CPU_Bind"

To disable:
"scontrol setdebugflags -CPU_Bind"

Look at the SlurmctldLogFile for details about CPU allocations and binding:
slurmctld: step_id:421.0
slurmctld: JobNode[0] Socket[0] Core[0] is allocated
slurmctld: JobNode[0] Socket[0] Core[1] is allocated
slurmctld: JobNode[0] Socket[0] Core[2] is allocated
slurmctld: JobNode[0] Socket[0] Core[3] is allocated
slurmctld: JobNode[0] Socket[0] Core[4] is allocated
slurmctld: JobNode[0] Socket[0] Core[5] is allocated
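
If the log gets long, a small script can group the allocated cores by step and socket, which makes it easier to compare against what "top" or "lstopo" shows. This is only a rough sketch: the regular expressions are modeled on the sample lines above and may need adjusting for other Slurm versions, and the function name summarize_allocations is made up for illustration.

```python
import re
from collections import defaultdict

# Patterns modeled on the sample log lines above (an assumption; the exact
# format may differ between Slurm versions).
STEP_RE = re.compile(r"step_id:(\S+)")
CORE_RE = re.compile(r"JobNode\[(\d+)\] Socket\[(\d+)\] Core\[(\d+)\] is allocated")


def summarize_allocations(lines):
    """Return {step_id: {(node, socket): [cores]}} from CPU_Bind debug lines."""
    summary = defaultdict(lambda: defaultdict(list))
    step = None
    for line in lines:
        step_match = STEP_RE.search(line)
        if step_match:
            # Every core line that follows belongs to this step.
            step = step_match.group(1)
            continue
        core_match = CORE_RE.search(line)
        if core_match and step is not None:
            node, socket, core = (int(g) for g in core_match.groups())
            summary[step][(node, socket)].append(core)
    # Convert nested defaultdicts to plain dicts for easy printing/comparison.
    return {s: dict(by_socket) for s, by_socket in summary.items()}


if __name__ == "__main__":
    sample = [
        "slurmctld: step_id:421.0",
        "slurmctld: JobNode[0] Socket[0] Core[0] is allocated",
        "slurmctld: JobNode[0] Socket[0] Core[1] is allocated",
        "slurmctld: JobNode[0] Socket[0] Core[2] is allocated",
    ]
    for step, by_socket in summarize_allocations(sample).items():
        for (node, socket), cores in sorted(by_socket.items()):
            print(f"step {step}: node {node} socket {socket} cores {cores}")
```

To use it on a real log, pass the open file instead of the sample list, e.g. summarize_allocations(open(path)) where path is whatever SlurmctldLogFile points to in your slurm.conf.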


Quoting Felip Moll <[email protected]>:

> Hello SLURM list!
>
> I currently have a small cluster with 1+15 nodes with Slurm 2.4.3 running
> fine.
>
> I found a "problem" with CPU masks that may be something I am doing wrong.
>
> My 2-socket nodes have quad-core processors, but I limited each node to use
> only 4 cores in total (memory bandwidth problems, front-side buses).
>
> When I send 4 tasks to Slurm they go to processor0, to cores 0,2,4,6 as
> reported by "top". That's ok.
> When I send 1 task from user 1 to Slurm, it goes to processor0, to core 0.
> Then I send 3 more tasks from another user, and they go to proc. 2,4,6
> instead of 1,3,5 or 7. So it does use the same socket.
> If I do a map_cpu=1,3,5 it also goes to 2,4,6.
> If there is nobody on the node and I do map_cpu=1,3,5, it goes to 1,3 and 5.
>
> Sometimes it takes into account my map_cpu, and sometimes not. I guess
> there's some problem here.
>
> Also the --hint=memory_bounded has no effect and does the same as
> --hint=compute_bounded, goes to 0,2,4,6.
>
> I am using CR_Core_Memory. Doing "lstopo" shows the 0,2,4,6 cores are at
> socket 0, and 1,3,5,7 at socket 1.
>
>
> I couldn't find the exact explanation for this. What am I missing?
>
>
> Thank you all from Barcelona,
>
> Felip
>
