Hi Aaron,

Thanks for the advice. Indeed, SLURM correctly recognizes the core and socket count. However, when I allow jobs to share the node, it allocates the processors across the machine in a random fashion, and we see no scaling. When I run with --mem_bind=sockets, every job starts from the first socket, without recognizing that it may already be occupied. This means we would need to specify the socket explicitly for each srun to get the desired behavior. Instead, we would like a node-per-socket arrangement and to let SLURM decide which socket is free.
If there were a workaround that avoided this, we would readily accept it too.

Kind regards,
Artem

On 3 June 2012 18:49, Aaron Knister <[email protected]> wrote:
> Hi Artem,
>
> Granted, I have never used SLURM on a large NUMA system, but I would think
> that having one node is fine as long as SLURM properly recognizes your
> socket and core count. If you want to allocate sockets rather than cores, I
> would suggest using the consumable resources select plugin with either
> CR_Socket or CR_Socket_Memory as the value of SelectTypeParameters.
> Hopefully some other folks will chime in :)
>
> -Aaron
>
> Sent from my iPhone
>
> On Jun 3, 2012, at 12:20 PM, Artem Kulachenko <[email protected]>
> wrote:
>
> Hi Everyone,
>
> We have a mid-size SMP machine with a NUMA architecture: 20 sockets and
> 160 cores. We would like to use SLURM, which currently sees it as a
> single node. I wonder if there is a way to split it into 20 nodes (one
> per socket) in order to run jobs locally on neighboring cores using
> local memory, without the need to assign the processors manually.
>
> I would greatly appreciate any advice.
>
> Kind regards,
> Artem
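[Editor's note: a minimal slurm.conf sketch along the lines Aaron suggests. The node name "smp01", the 20 x 8 core layout, and the memory placeholder are assumptions for illustration, not details from this thread.]

```
# Consumable-resources plugin, allocating whole sockets (and their memory)
SelectType=select/cons_res
SelectTypeParameters=CR_Socket_Memory

# Bind tasks to the cores/sockets they were allocated
TaskPlugin=task/affinity

# Describe the real topology so SLURM can pack jobs socket by socket
# (RealMemory left as a placeholder; fill in the machine's actual value)
NodeName=smp01 Sockets=20 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=...
PartitionName=smp Nodes=smp01 Default=YES Shared=YES
```

With CR_Socket_Memory, each job is granted whole sockets and their local memory, so concurrent jobs should land on different sockets without per-srun binding flags. A literal node-per-socket split would instead require running multiple slurmd daemons on the one host (a build configured with --enable-multiple-slurmd and 20 NodeName entries sharing one NodeHostname), which is a heavier setup.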
