I'm not sure what the problem is, but running this as root or SlurmUser for a while may help diagnose it:
"scontrol setdebugflags +CPU_Bind"
To disable:
"scontrol setdebugflags -CPU_Bind"

Look at the SlurmctldLogFile for details about CPU allocations and binding:

slurmctld: step_id:421.0
slurmctld: JobNode[0] Socket[0] Core[0] is allocated
slurmctld: JobNode[0] Socket[0] Core[1] is allocated
slurmctld: JobNode[0] Socket[0] Core[2] is allocated
slurmctld: JobNode[0] Socket[0] Core[3] is allocated
slurmctld: JobNode[0] Socket[0] Core[4] is allocated
slurmctld: JobNode[0] Socket[0] Core[5] is allocated

Quoting Felip Moll <[email protected]>:

> Hello SLURM list!
>
> I currently have a small cluster with 1+15 nodes with Slurm 2.4.3 running
> fine.
>
> I found a "problem" with cpu masks that may be is something that I do wrong.
>
> My 2-socket nodes have quad cores on it, but I limited the node to use only
> 4 cores in total (memory bandwith problems, front side buses).
>
> When I send 4 tasks to Slurm them go to processor0, to cores 0,2,4,6 as
> reported by "top". That's ok.
> When I send 1 task from user 1 to Slurm, it goes to processor0, to core 0.
> Then I send 3 more tasks from another user, and them go to proc. 2,4,6
> instead of 1,3,5 or 7. So it does use the same socket.
> If I do a map_cpu=1,3,5 it also goes to 2,4,6.
> If there is nobody in the node and I do map_cpu=1,3,5, it goes to 1,3 and 5.
>
> Sometimes it takes into account my map_cpu, and sometimes not. I guess
> there's some problem here.
>
> Also the --hint=memory_bounded takes no effect and does the same than
> --hint=compute_bounded , goes to 0,2,4,6.
>
> I am using CR_Core_Memory. Doing "lstopo" shows the 0,2,4,6 cores are at
> socket 0, and 1,3,5,7 at socket 1.
>
>
> I couldn't find the exact explanation for this. What am I missing?
>
>
> Thank you all from Barcelona,
>
> Felip
>
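P.S. If the log gets long, a small script may make it easier to see which cores each job step was given. The following is only a rough sketch in Python: it assumes the log lines look exactly like the "JobNode[..] Socket[..] Core[..] is allocated" excerpt above, and the function name allocated_cores is my own invention, not part of Slurm.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   slurmctld: JobNode[0] Socket[0] Core[3] is allocated
# (format assumed from the log excerpt in this message)
ALLOC_RE = re.compile(
    r"JobNode\[(\d+)\] Socket\[(\d+)\] Core\[(\d+)\] is allocated"
)


def allocated_cores(lines):
    """Group allocated core numbers by (job node index, socket index)."""
    cores = defaultdict(list)
    for line in lines:
        match = ALLOC_RE.search(line)
        if match:
            node, socket, core = (int(x) for x in match.groups())
            cores[(node, socket)].append(core)
    return dict(cores)


# Sample input copied from the log excerpt in this message.
sample = [
    "slurmctld: step_id:421.0",
    "slurmctld: JobNode[0] Socket[0] Core[0] is allocated",
    "slurmctld: JobNode[0] Socket[0] Core[1] is allocated",
    "slurmctld: JobNode[0] Socket[0] Core[2] is allocated",
    "slurmctld: JobNode[0] Socket[0] Core[3] is allocated",
    "slurmctld: JobNode[0] Socket[0] Core[4] is allocated",
    "slurmctld: JobNode[0] Socket[0] Core[5] is allocated",
]

for (node, socket), core_list in sorted(allocated_cores(sample).items()):
    print(f"JobNode {node} Socket {socket}: cores {core_list}")
```

To run it against a real log, pass an open file object instead of the sample list, since the function only iterates over lines. Note that the Socket and Core numbers here are the ones Slurm prints, which need not match the processor numbers shown by "top" or "lstopo".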
