Reuti <[email protected]> writes:

> Hi,
>
> On 10.09.2012 at 09:36, Ray Spence wrote:
>
>> We are running SGE 6.2-4u5.

Does that mean 6.2u5 (which introduced core binding on Linux)?

> as long as you don't oversubscribe machines, i.e. not assigning more
> jobs (slot wise) to a node than cores are available, you can lock in
> the `qsub -binding ...` feature. As long as users don't reset the
> setting, their (forked) processes should all be bound to the granted
> assignment of cores and can't escape.

The main trouble is that some runtimes (e.g. GNU GOMP) ignore the
binding they're handed and do their own, so you need to tell them the
binding explicitly.
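
For GOMP, one way to pass the binding on is a small wrapper in the job script that sets GOMP_CPU_AFFINITY, which libgomp reads as a space- or comma-separated CPU list. The following is only a sketch. It assumes SGE_BINDING holds the granted cores as a space-separated list (whether it is set depends on the SGE version and the binding type requested, so check your installation), and on Linux it falls back to the affinity mask reported in /proc/self/status. The function name export_gomp_affinity is made up for illustration.

```shell
#!/bin/sh
# Sketch: hand the granted cores to libgomp through GOMP_CPU_AFFINITY.
# Assumption: SGE_BINDING, when set, is a space-separated list of core numbers.
# export_gomp_affinity is a hypothetical helper name, not part of SGE.

export_gomp_affinity() {
    if [ -n "${SGE_BINDING:-}" ]; then
        cpus=$SGE_BINDING
    elif [ -r /proc/self/status ]; then
        # Linux fallback: the affinity mask inherited from the shell,
        # in list form such as "0-3,8" (ranges are accepted by libgomp).
        cpus=$(sed -n 's/^Cpus_allowed_list:[[:space:]]*//p' /proc/self/status)
    else
        cpus=
    fi

    # Leave the environment untouched if nothing could be determined.
    if [ -n "$cpus" ]; then
        GOMP_CPU_AFFINITY=$cpus
        export GOMP_CPU_AFFINITY
    fi
}
```

In a job script this would be called once before starting the OpenMP program, e.g. `export_gomp_affinity` followed by the program's command line.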

It makes a difference which OS and hardware are involved, but for
GNU/Linux -- short of tricks with setuid starter methods -- as far as I
know, the only currently-available way of confining jobs is with the
cpuset support in sge-8.1.2, described in
<http://arc.liv.ac.uk/SGE/howto/remove_orphaned_processes.html>.

> You can enforce it with a JSV
> even if the users don't request it. Inside the JSV you can use
> something like this:
>
>    CMDNAME=$(jsv_get_param CMDNAME)
>    if [ "$CMDNAME" != "NONE" ]; then
>        pe_name=$(jsv_get_param pe_name)
>        if [ "$pe_name" ]; then
>            pe_min=$(jsv_get_param pe_min)
>            pe_max=$(jsv_get_param pe_max)
>            let cores=pe_max
>            jsv_set_param pe_min $cores
>        else
>            cores=1
>        fi
>
>        jsv_set_param binding_strategy linear_automatic

[As far as I know, "linear_automatic" is just meant to be used
internally.]
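
The slot-counting part of the quoted fragment can be pulled out into a plain sh function, which avoids the bash-only `let`. This is a sketch, not the code from the original mail: cores_for_job is a hypothetical name, and using the result with a `binding_amount` parameter is an assumption based on my reading of the jsv(1) man page.

```shell
# Hypothetical helper: decide how many cores a job should be bound to.
#   $1 = pe_name as returned by jsv_get_param (empty for a serial job)
#   $2 = pe_max  as returned by jsv_get_param
cores_for_job() {
    pe_name=$1
    pe_max=$2
    if [ -n "$pe_name" ]; then
        # Parallel job: bind as many cores as the upper slot limit.
        printf '%s\n' "$pe_max"
    else
        # Serial job: a single core.
        printf '1\n'
    fi
}

# Possible use inside a JSV verify step (assumed parameter name):
#   cores=$(cores_for_job "$(jsv_get_param pe_name)" "$(jsv_get_param pe_max)")
#   jsv_set_param binding_amount "$cores"
```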

Beware that the core binding in 6.2u5 only works on Linux and the
SunOS kernel, and doesn't work properly on recent AMD hardware (and
possibly not on other hardware with an extra NUMA level).

-- 
Community Grid Engine:  http://arc.liv.ac.uk/SGE/
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
