I think that using "-n 1 -c 4" is better in your case. Concerning the strange behavior, you should take a look at the non-overlapping lists to see how the cores are distributed when you get poor performance. If you can send me the different cpuid lists for your different jobs, as well as the physical mapping of your node, it would be easier to understand the dispatch made by SLURM and to see whether something can be explained by it. The physical mapping can be obtained using "numactl --hardware":
[hautreuxm@leaf ~]$ numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 4 8 12 16 20 24 28
node 0 size: 32748 MB
node 0 free: 30922 MB
node 1 cpus: 1 5 9 13 17 21 25 29
node 1 size: 32768 MB
node 1 free: 30642 MB
node 2 cpus: 2 6 10 14 18 22 26 30
node 2 size: 32768 MB
node 2 free: 30839 MB
node 3 cpus: 3 7 11 15 19 23 27 31
node 3 size: 32766 MB
node 3 free: 31363 MB
node distances:
node   0   1   2   3
  0:  10  15  15  15
  1:  15  10  15  15
  2:  15  15  10  15
  3:  15  15  15  10
[hautreuxm@leaf ~]$

The CR_CORE_DEFAULT_DIST_BLOCK option is interesting as it ensures that cores
are allocated by socket first, not in a round-robin manner across the available
sockets. It could be better for you to set this option if your applications
are not memory bound.

Matthieu

2011/10/14 Matteo Guglielmi <matteo.guglie...@epfl.ch>:
> Ok, I don't have all those extra parameters set as you do, but
> here is the thing:
>
> for loop (+) #SBATCH -N 1 (+) #SBATCH -n 4
>
> does produce non-overlapping lists, but some jobs were nonetheless
> still running at <= 300% CPU utilization.
>
> for loop (+) #SBATCH -N 1-1 (+) #SBATCH -n 1 (+) #SBATCH -c 4
>
> does still produce non-overlapping lists + all the jobs do run
> at 400%.
>
> Was it a wrong jobfile then?
>
> Should I also replicate your config parameters into my slurm.conf?
>
>
> On 10/14/11 15:19, HAUTREUX Matthieu wrote:
>>
>> Our conf is like:
>>
>> SelectType=select/cons_res
>> SelectTypeParameters=CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK,CR_ONE_TASK_PER_CORE
>>
>> TaskPlugin=task/affinity
>> TaskPluginParam=Cpusets,Cores
>>
>> You should be able to read the Cpus_allowed_list value as soon as your
>> jobs are started and see if it contains a coherent value (a list of 4
>> integers per job).
>>
>> Matthieu
>>
>> Matteo Guglielmi wrote:
>>>
>>> I believe so:
>>>
>>> SelectType=select/cons_res
>>> SelectTypeParameters=CR_Core_Memory
>>>
>>> Running the fast loop tests now...
>>>
>>> On 10/14/11 14:38, HAUTREUX Matthieu wrote:
>>>>
>>>> Have you configured task/affinity to do a core binding by default?
>>>>
>>>> Can you try a modified version of your script like the following and
>>>> give me the output for each of your jobs:
>>>>
>>>> ### jobfile ###
>>>> #SBATCH -n 4
>>>> #SBATCH -N 1
>>>>
>>>> export OMP_NUM_THREADS=4
>>>>
>>>> cat /proc/self/status | grep Cpus_allowed_list
>>>> mpc --L=32 --out=./data --dt=0.05 ...etc
>>>> ###############
>>>>
>>>> You should have only 4 cores associated with each job, and each list of
>>>> cores should be different. If you have not configured the default
>>>> binding, you will certainly have the same complete list of cores
>>>> available to each job.
>>>>
>>>> Matthieu
>>>>
>>>> Matteo Guglielmi wrote:
>>>>>
>>>>> Let's say you got a full dollar!
>>>>>
>>>>> Yes, I'm using task/affinity and not task/cgroup...
>>>>>
>>>>> Should I use task/cgroup then?
>>>>>
>>>>> On 10/14/11 13:55, HAUTREUX Matthieu wrote:
>>>>>>
>>>>>> Dear Matteo,
>>>>>>
>>>>>> Are you using the task/affinity (or task/cgroup) plugin on your
>>>>>> system?
>>>>>>
>>>>>> The only way to ensure that your jobs have exclusive access to their
>>>>>> allocated resources is to do that. Indeed, select/cons_res reserves a
>>>>>> part of the cores for each of your jobs but does not guarantee that
>>>>>> each job will only be able to use the associated set of cores. This
>>>>>> is the role of task/affinity or task/cgroup (option ConstrainCores=yes
>>>>>> in cgroup.conf).
>>>>>> In your current scenario, if you are not currently using such a
>>>>>> plugin, it is possible that, due to memory access optimization in
>>>>>> the OpenMP library, applications started on a particular socket try
>>>>>> to stay on that socket. As a result, if more than 4 applications
>>>>>> primarily start on the same socket, you will have poor performance
>>>>>> due to thread congestion.
>>>>>>
>>>>>> My 2 cents,
>>>>>> Matthieu
>>>>>>
>>>>>>
>>>>>> Matteo Guglielmi wrote:
>>>>>>>
>>>>>>> Dear Community,
>>>>>>>
>>>>>>> I'm facing a problem when I submit a series
>>>>>>> of (OpenMP) jobs using a simple for loop.
>>>>>>>
>>>>>>> Our (fat) nodes have 4 sockets which host 4
>>>>>>> AMD 6176 SE CPUs (12 cores per CPU).
>>>>>>>
>>>>>>> The relevant part of the jobfile is outlined
>>>>>>> below:
>>>>>>>
>>>>>>> ### jobfile ###
>>>>>>> #SBATCH -n 4
>>>>>>> #SBATCH -N 1
>>>>>>>
>>>>>>> export OMP_NUM_THREADS=4
>>>>>>>
>>>>>>> mpc --L=32 --out=./data --dt=0.05 ...etc
>>>>>>> ###############
>>>>>>>
>>>>>>> The way I submit a series of 12 jobs is:
>>>>>>>
>>>>>>> for i in {0..11}; do sbatch jobfile; done
>>>>>>>
>>>>>>> Slurm is configured as follows:
>>>>>>>
>>>>>>> SelectType=select/cons_res
>>>>>>>
>>>>>>> As you can see, I basically reserve 4 cores
>>>>>>> per job; each mpc job will start 4 threads.
>>>>>>>
>>>>>>> Now, if I submit the 12 jobs "by hand", so
>>>>>>> to speak, I get what I expect to have, namely
>>>>>>> 12 jobs running at 400%... perfect.
>>>>>>>
>>>>>>> But if I submit the 12 jobs via a for loop
>>>>>>> as outlined above, I always get 2 or 3 jobs
>>>>>>> out of 12 running at 300%.
>>>>>>>
>>>>>>> To me it seems like a race condition which
>>>>>>> ultimately leads to more than one thread
>>>>>>> being "assigned" to the very same core.
>>>>>>>
>>>>>>> Question)
>>>>>>>
>>>>>>> Can this be possible?
>>>>>>>
>>>>>>> How can I avoid it?
>>>>>>>
>>>>>>>
>>>>>>> Of course, inserting a "sleep 0.5" into the
>>>>>>> for loop does fix the problem... but I'm
>>>>>>> still worried about what will happen when
>>>>>>> hundreds of users are submitting jobs
>>>>>>> at the same time.
>>>>>>>
>>>>>>> I'm still testing Slurm and I'd like to make
>>>>>>> sure that this problem will not occur when
>>>>>>> I set it as the default batch system.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> --matt
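
For reference, here is a minimal sketch of the "-n 1 -c 4" jobfile discussed
above, with the Cpus_allowed_list check from earlier in the thread folded in.
The mpc command line is copied from the original post with its arguments
elided ("...etc"), so treat this as an illustration of the layout rather than
a drop-in script.

### jobfile (sketch) ###
#SBATCH -N 1-1            # exactly one node
#SBATCH -n 1              # a single task ...
#SBATCH -c 4              # ... with 4 cores reserved for its threads

# One OpenMP thread per reserved core; $SLURM_CPUS_PER_TASK could be
# used here instead of a hard-coded 4.
export OMP_NUM_THREADS=4

# Record the core list this job was bound to, so that overlapping
# allocations between concurrently submitted jobs can be spotted.
grep Cpus_allowed_list /proc/self/status

# Application line as in the original post (arguments elided there).
mpc --L=32 --out=./data --dt=0.05 ...etc
#########################

Submitted in a loop exactly as before (for i in {0..11}; do sbatch jobfile;
done), each job should then report a distinct, non-overlapping
Cpus_allowed_list, provided task/affinity (or task/cgroup with
ConstrainCores=yes) is enforcing the binding.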