On 12 Mar 2013, at 12:24, Mikael Brandström Durling wrote:

> On 12 Mar 2013, at 12:12, Reuti <[email protected]> wrote:
> 
>> Hi,
>> 
>> On 12 Mar 2013, at 10:41, Mikael Brandström Durling wrote:
>> 
>>> I noticed in the man page for sge_conf that there is an experimental option 
>>> for enabling cpusets in SGE. I tried to search the mailing list archives and 
>>> Google for documentation on how to enable it, which led me to 
>>> util/resources/scripts/setup-cgroups-etc. Calling it from the sgeexecd init 
>>> script and enabling cgroups in qconf -mconf results in jobs being launched 
>>> into cpusets. However, I have a question about this. I suppose this feature 
>>> can be used to replace core binding, ensuring that a task, whether serial 
>>> or within a PE, is only allowed to use the CPUs assigned to it?
>> 
>> IMO these are two different things: a cgroups allocation will limit the job 
>> to using only the assigned cores. But inside the assigned cores there is no 
>> binding of a parallel job's processes to particular cores. Within the 
>> assigned cores it's the kernel's duty to place the processes of a parallel 
>> job on the most suitable core in each time slice.
> 
> I see. For our needs it would then suffice if we can ensure that a parallel 
> job is confined to the number of cores allotted to the PE (even when given a 
> PE with a range). Is this what happens with the current implementation of 
> cgroups?

Maybe Dave can explain in detail about the state of implementation regarding 
the "soft request" behavior of -binding to limit the access to other cores.


> The problem we would like to solve is users submitting jobs with software 
> that defaults to starting as many worker threads as there are cores, thus 
> parasitising other jobs' allocations within the node. 

Often this happens because the software checks the number of installed cores. 
Even if the job is limited to certain cores, it would be good not to 
oversubscribe them.
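To illustrate the point (a Linux sketch; `taskset` and `nproc` assumed available): tools that respect the CPU affinity mask see only the assigned cores, while naively counting entries in /proc/cpuinfo still yields every installed core, which is what such software often does.

```shell
# Restricted to core 0, nproc honours the affinity mask:
taskset -c 0 nproc
# Counting /proc/cpuinfo entries ignores the mask and reports all
# installed cores, so software using this method oversubscribes:
grep -c ^processor /proc/cpuinfo
```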

In the file $SGE_ROOT/default/common/sge_request you could add:

-v OMP_NUM_THREADS=1,MKL_NUM_THREADS=1

(or similar variables) to suppress extra threads altogether. If you need 
something more sophisticated with a varying thread count, it's also possible 
to use a JSV or to add the variables to $SGE_JOB_SPOOL_DIR/environment (the 
script must run as a queue prolog under the SGE admin account). Of course, 
this can be overridden by the job.
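A rough sketch of the prolog approach (a hypothetical script, not a tested recipe; NSLOTS and SGE_JOB_SPOOL_DIR are set by sge_execd when the prolog runs, and the fallback defaults below exist only so the sketch can be tried standalone):

```shell
#!/bin/sh
# Hypothetical queue prolog: append thread-limiting variables to the
# job's environment file so threaded software starts no more workers
# than slots were granted. Must run under the SGE admin account.
# NSLOTS and SGE_JOB_SPOOL_DIR come from sge_execd; the defaults here
# are only for trying the script outside of SGE.
ENVFILE="${SGE_JOB_SPOOL_DIR:-/tmp}/environment"
SLOTS="${NSLOTS:-1}"

# The environment file holds one VAR=value pair per line.
echo "OMP_NUM_THREADS=$SLOTS" >> "$ENVFILE"
echo "MKL_NUM_THREADS=$SLOTS" >> "$ENVFILE"
```

As noted above, a job can still override these variables itself; this only fixes the default.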


>>> However, I can't figure how this is done. Has anyone documented it?
>>> 
>>> Otherwise I'll have to resort to the traditional core binding in SGE, but 
>>> then I need to set up a JSV to set the binding properly for parallel tasks. 
>>> Is there some JSV snippet available to do this? I frequently see references 
>>> to this approach on the list, but I have not found a working example.
>> 
>> Using complete nodes and then binding in Open MPI or other parallel 
>> libraries is also an option, especially as each process is then bound to 
>> its own dedicated core. How many cores do your machines have, and how many 
>> cores does a parallel job usually use?
> 
> We have 8 or 16 cores per machine. However, as the workload is a mixture of 
> serial and parallel jobs, it is quite common for parallel jobs to get stuck 
> in the queue because no complete node is free. To get around this problem we 
> quite often spread MPI jobs across nodes to pick up free slots. The software 
> we use does not pass a lot of data over MPI, but mostly uses it for 
> synchronisation.

Depending on your workflow, options are:

- limit certain nodes to serial or parallel jobs, respectively
- limit the number of serial and parallel slots per exechost by using two queues
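For the two-queue option, a sketch of how to keep the combined slots per host from exceeding the installed cores (hypothetical names; RQS syntax as in sge_resource_quota(5), added with qconf -arqs):

```
# With e.g. serial.q and parallel.q defined on each host, a resource
# quota set caps their combined slots at the number of processors:
{
   name         total_slots_per_host
   enabled      TRUE
   limit        hosts {*} to slots=$num_proc
}
```

The dynamic limit $num_proc adapts per host, so the 8- and 16-core machines each get the right cap without separate rules.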

-- Reuti


> Mikael
> 
>> 
>> -- Reuti
>> 
>> 
>>> Thanks in advance,
>>> Mikael
>>> 
>>> 
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> https://gridengine.org/mailman/listinfo/users
>> 
> 
> 
