What happens if you use srun instead of mpirun?
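Something along these lines, as an untested sketch based on your script: letting srun launch the tasks means Slurm's own binding applies (--cpu_bind needs TaskPlugin=task/affinity in slurm.conf, and "verbose" makes each task print the mask it was given):

---
sbatch -J $JOB -N 1 -B '1:8:1' --ntasks=8 --ntasks-per-socket=8 \
  --ntasks-per-core=1 << eof
#!/bin/bash
# Launch with srun instead of mpirun so the task/affinity plugin
# binds one task per physical core and reports the resulting masks.
srun -n 8 --cpu_bind=verbose,cores nwchem_64to32 $JOB.nwc >& $JOB.out
eof
---

Whether the direct srun launch works depends on the MPI library NWChem was built against having PMI support (e.g. Open MPI built with --with-pmi).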
On October 15, 2014 5:31:42 AM PDT, Edrisse Chermak <[email protected]> wrote:
>My mistake, I forgot some important lscpu NUMA output:
>
>NUMA node0 CPU(s):   0,4,8,12,16,20,24,28
>NUMA node1 CPU(s):   32,36,40,44,48,52,56,60
>NUMA node2 CPU(s):   1,5,9,13,17,21,25,29
>NUMA node3 CPU(s):   33,37,41,45,49,53,57,61
>NUMA node4 CPU(s):   2,6,10,14,18,22,26,30
>NUMA node5 CPU(s):   34,38,42,46,50,54,58,62
>NUMA node6 CPU(s):   35,39,43,47,51,55,59,63
>NUMA node7 CPU(s):   3,7,11,15,19,23,27,31
>
>Thanks in advance,
>Edrisse
>
>On 10/15/2014 03:24 PM, Edrisse Chermak wrote:
>> Dear Slurm Developers and Users,
>>
>> I would like to constrain an 8-CPU job to run in one 16-CPU socket,
>> with one task per core.
>> Unfortunately, when using this script:
>> ---
>> sbatch -J $JOB -N 1 -B '1:8:1' --ntasks-per-socket=8
>> --ntasks-per-core=1 << eof
>> ...
>> mpirun -np 8 nwchem_64to32 $JOB.nwc >& $JOB.out
>> ...
>> eof
>> ---
>> the top command on the compute node shows two tasks running on the
>> same core:
>> ---
>> $ top
>> 11838 11846 51 edrisse 20 0 12.3g 9452 95m R 46.7 0.0 0:01.43 nwchem_64to32
>> 11838 11845 59 edrisse 20 0 12.3g 9600 96m R 46.4 0.0 0:01.42 nwchem_64to32
>> 11838 11844 47 edrisse 20 0 12.3g 9592 95m R 46.4 0.0 0:01.42 nwchem_64to32
>> 11838 11843 43 edrisse 20 0 12.3g 9844 96m R 46.4 0.0 0:01.42 nwchem_64to32
>> 11838 11842  3 edrisse 20 0 12.3g 9.8m 96m R 46.4 0.0 0:01.43 nwchem_64to32
>> 11838 11841 35 edrisse 20 0 12.3g 9.8m 92m R 45.7 0.0 0:01.41 nwchem_64to32
>> 11838 11840 39 edrisse 20 0 12.3g 10m  96m R 46.1 0.0 0:01.42 nwchem_64to32
>> 11838 11839 55 edrisse 20 0 12.3g 10m 109m R 46.4 0.0 0:01.42 nwchem_64to32
>> ---
>> Unfortunately, CPUs 55 and 51 belong to the same core in our node's
>> architecture (see NUMA node6):
>> ---
>> $ lscpu
>> CPU(s):                64
>> On-line CPU(s) list:   0-63
>> Thread(s) per core:    2
>> Core(s) per socket:    8
>> Socket(s):             4
>> NUMA node(s):          8
>> ...
>> NUMA node0 CPU(s):     0,4,8,12,16,20,24,28
>> ...
>> NUMA node7 CPU(s):     3,7,11,15,19,23,27,31
>> ---
>> I have perhaps missed something; if you could point me to the right
>> option, that would be great.
>> I have also attached my slurm.conf file.
>>
>> Best Regards,
>> Edrisse
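For what it's worth, once the job is running you can check the binding each rank actually got from inside the allocation (a quick sketch, assuming taskset from util-linux is available on the compute nodes):

---
# Each task prints its rank and its current CPU affinity list.
srun -n 8 --cpu_bind=cores bash -c 'echo "rank $SLURM_PROCID: $(taskset -cp $$)"'
---

If the pinning is right, you should see eight distinct physical cores, with no two ranks sharing a hyperthread pair.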
