I think that using "-n 1 -c 4" is better in your case. Concerning the strange behavior, you should take a look at the non-overlapping lists to see how the cores are distributed when you get poor performance. If you can send me the different cpuid lists for your different jobs, as well as the physical mapping of your node, it would be easier to understand the dispatch made by SLURM and to see whether something can be explained by it. The physical mapping can be obtained using "numactl --hardware":
[hautreuxm@leaf ~]$ numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 4 8 12 16 20 24 28
node 0 size: 32748 MB
node 0 free: 30922 MB
node 1 cpus: 1 5 9 13 17 21 25 29
node 1 size: 32768 MB
node 1 free: 30642 MB
node 2 cpus: 2 6 10 14 18 22 26 30
node 2 size: 32768 MB
node 2 free: 30839 MB
node 3 cpus: 3 7 11 15 19 23 27 31
node 3 size: 32766 MB
node 3 free: 31363 MB
node distances:
node   0   1   2   3
  0:  10  15  15  15
  1:  15  10  15  15
  2:  15  15  10  15
  3:  15  15  15  10
[hautreuxm@leaf ~]$

The CR_CORE_DEFAULT_DIST_BLOCK option is interesting as it ensures that cores
are allocated by socket first, not in a round-robin manner across the available
sockets. It could be better for you to set this option if your applications
are not memory bound.

Matthieu

2011/10/14 Matteo Guglielmi <matteo.guglie...@epfl.ch>:
> Ok, I don't have all those extra parameters set as you do, but
> here is the thing:
>
> for loop (+) #SBATCH -N 1 (+) #SBATCH -n 4
>
> does produce non-overlapping lists, but some jobs were nonetheless
> still running at <= 300% CPU utilization.
>
> for loop (+) #SBATCH -N 1-1 (+) #SBATCH -n 1 (+) #SBATCH -c 4
>
> does still produce non-overlapping lists + all the jobs do run
> at 400%.
>
> Was it a wrong jobfile then?
>
> Should I also replicate your config parameters into my slurm.conf?
>
>
> On 10/14/11 15:19, HAUTREUX Matthieu wrote:
>>
>> Our conf is like:
>>
>> SelectType=select/cons_res
>> SelectTypeParameters=CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK,CR_ONE_TASK_PER_CORE
>>
>> TaskPlugin=task/affinity
>> TaskPluginParam=Cpusets,Cores
>>
>> You should be able to read the Cpus_allowed_list value as soon as your
>> jobs are started and see if it contains a coherent value (a list of 4
>> integers per job).
>>
>> Matthieu
>>
>> Matteo Guglielmi wrote:
>>>
>>> I believe so:
>>>
>>> SelectType=select/cons_res
>>> SelectTypeParameters=CR_Core_Memory
>>>
>>> Running the fast loop tests now...
>>>
>>> On 10/14/11 14:38, HAUTREUX Matthieu wrote:
>>>>
>>>> Have you configured task/affinity to do a core binding by default?
>>>>
>>>> Can you try a modified version of your script like the following and
>>>> give me the output for each of your jobs:
>>>>
>>>> ### jobfile ###
>>>> #SBATCH -n 4
>>>> #SBATCH -N 1
>>>>
>>>> export OMP_NUM_THREADS=4
>>>>
>>>> cat /proc/self/status | grep Cpus_allowed_list
>>>> mpc --L=32 --out=./data --dt=0.05 ...etc
>>>> ###############
>>>>
>>>> You should have only 4 cores associated with each job, and each list of
>>>> cores should be different. If you have not configured the default
>>>> binding, you will certainly have the same complete list of cores
>>>> available to each job.
>>>>
>>>> Matthieu
>>>>
>>>> Matteo Guglielmi wrote:
>>>>>
>>>>> Let's say you got a full dollar!
>>>>>
>>>>> Yes, I'm using task/affinity and not task/cgroup...
>>>>>
>>>>> Should I use task/cgroup then?
>>>>>
>>>>> On 10/14/11 13:55, HAUTREUX Matthieu wrote:
>>>>>>
>>>>>> Dear Matteo,
>>>>>>
>>>>>> Are you using the task/affinity (or task/cgroup) plugin on your
>>>>>> system?
>>>>>>
>>>>>> The only way to ensure that your jobs have exclusive access to their
>>>>>> allocated resources is to do that. Indeed, select/cons_res reserves a
>>>>>> part of the cores for each of your jobs but does not guarantee that
>>>>>> each job will only be able to use the associated set of cores. This
>>>>>> is the role of task/affinity or task/cgroup (option ConstrainCores=yes
>>>>>> in cgroup.conf).
>>>>>> In your current scenario, if you are not currently using such a
>>>>>> plugin, it is possible that, due to memory access optimization in
>>>>>> the OpenMP library, applications started on a particular socket try
>>>>>> to stay on that socket. As a result, if more than 4 applications
>>>>>> primarily start on the same socket, you will have poor performance
>>>>>> due to thread congestion.
>>>>>>
>>>>>> My 2 cents,
>>>>>> Matthieu
>>>>>>
>>>>>>
>>>>>> Matteo Guglielmi wrote:
>>>>>>>
>>>>>>> Dear Community,
>>>>>>>
>>>>>>> I'm facing a problem when I submit a series
>>>>>>> of (OpenMP) jobs using a simple for loop.
>>>>>>>
>>>>>>> Our (fat) nodes have 4 sockets which host 4
>>>>>>> AMD 6176 SE CPUs (12 cores per CPU).
>>>>>>>
>>>>>>> The relevant part of the jobfile is outlined
>>>>>>> below:
>>>>>>>
>>>>>>> ### jobfile ###
>>>>>>> #SBATCH -n 4
>>>>>>> #SBATCH -N 1
>>>>>>>
>>>>>>> export OMP_NUM_THREADS=4
>>>>>>>
>>>>>>> mpc --L=32 --out=./data --dt=0.05 ...etc
>>>>>>> ###############
>>>>>>>
>>>>>>> The way I submit a series of 12 jobs is:
>>>>>>>
>>>>>>> for i in {0..11}; do sbatch jobfile; done
>>>>>>>
>>>>>>> Slurm is configured as follows:
>>>>>>>
>>>>>>> SelectType=select/cons_res
>>>>>>>
>>>>>>> As you can see, I basically reserve 4 cores
>>>>>>> per job; each mpc job will start 4 threads.
>>>>>>>
>>>>>>> Now, if I submit the 12 jobs "by hand", so
>>>>>>> to speak, I get what I expect to have, namely
>>>>>>> 12 jobs running at 400%... perfect.
>>>>>>>
>>>>>>> But if I submit the 12 jobs via a for loop
>>>>>>> as outlined above, I always get 2 or 3 jobs
>>>>>>> out of 12 running at 300%.
>>>>>>>
>>>>>>> To me it seems like a race condition which
>>>>>>> ultimately leads to more than one thread
>>>>>>> being "assigned" to the very same core.
>>>>>>>
>>>>>>> Question)
>>>>>>>
>>>>>>> Can this be possible?
>>>>>>>
>>>>>>> How can I avoid it?
>>>>>>>
>>>>>>>
>>>>>>> Of course, inserting a "sleep 0.5" into the
>>>>>>> for loop does fix the problem... but I'm
>>>>>>> still worried about what will happen when
>>>>>>> hundreds of users are submitting jobs
>>>>>>> at the same time.
>>>>>>>
>>>>>>> I'm still testing Slurm and I'd like to make
>>>>>>> sure that this problem will not occur when
>>>>>>> I set it as the default batch system.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> --matt
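
For reference, here is a minimal sketch of the "-n 1 -c 4" jobfile discussed
above, with the Cpus_allowed_list check from earlier in the thread folded in.
The mpc command line is copied from the original post with its arguments
elided ("...etc"), so treat this as an illustration of the layout rather than
a drop-in script.

### jobfile (sketch) ###
#SBATCH -N 1-1            # exactly one node
#SBATCH -n 1              # a single task ...
#SBATCH -c 4              # ... with 4 cores reserved for its threads

# One OpenMP thread per reserved core; $SLURM_CPUS_PER_TASK could be
# used here instead of a hard-coded 4.
export OMP_NUM_THREADS=4

# Record the core list this job was bound to, so that overlapping
# allocations between concurrently submitted jobs can be spotted.
grep Cpus_allowed_list /proc/self/status

# Application line as in the original post (arguments elided there).
mpc --L=32 --out=./data --dt=0.05 ...etc
#########################

Submitted in a loop exactly as before (for i in {0..11}; do sbatch jobfile;
done), each job should then report a distinct, non-overlapping
Cpus_allowed_list, provided task/affinity (or task/cgroup with
ConstrainCores=yes) is enforcing the binding.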