Hi,
On 10.09.2012 at 09:36, Ray Spence wrote:
> We are running SGE 6.2-4u5. We are now trying to understand how to
> configure PEs the way we want in order to restrict users from using more
> than one slot unless they submit their job to a PE. I have read through
> some documentation and previous posts here but see no clear answer.
> Specifically - our less knowledgeable users might submit C, R and OpenBLAS
> code that may fork child processes that SGE will not know about. We
> want to enforce queue-wide restrictions that rein in any job to one of
> two types:
>
> 1) if no PE is called in qsub then restrict that job to one and only one
> slot. If this is possible - what are our options for jobs that attempt to
> fork?
> 2) if PE is identified in qsub then queue job accordingly and still restrict
> job to number of slots requested. Similarly, what are our options for
> PE jobs that attempt to fork over their requested number of slots?
>
> Is there any way to do this? The sge_pe man page section on control_slaves
> states "However, to gain control over the slave tasks of a parallel
> application, a sophisticated PE interface is required, ..." Where do I
> find these?
> Also, might startup_method and/or JSV be used to accomplish this control?
As long as you don't oversubscribe machines, i.e. don't assign more jobs
(slot-wise) to a node than it has cores, you can look into the `qsub -binding
...` feature. As long as users don't reset the setting, their (forked)
processes should all be bound to the granted cores and can't escape. You can
enforce it with a JSV even if the users don't request it.
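To check from inside a running job that forked children really stay on the granted cores, something like the following can be used. It is only a sketch and assumes a Linux execution host with `/proc` mounted; the helper name `allowed_cpus` is made up for illustration.

```shell
#!/bin/sh
# Print the list of CPUs a given PID is allowed to run on.
# Assumption: Linux, where /proc/<pid>/status has a Cpus_allowed_list line.
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ { print $2 }' "/proc/$1/status"
}

# Affinity of this shell (inside a job: the binding granted by SGE).
parent=$(allowed_cpus $$)

# Affinity of a forked child; it inherits the mask from its parent.
child=$(sh -c 'awk "/^Cpus_allowed_list:/ { print \$2 }" /proc/$$/status')

echo "parent: $parent"
echo "child:  $child"
```

If both lines show the same core list, and that list matches the number of slots the job was granted, the binding is in effect for the children as well.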
Inside the JSV you can use something like this:
    CMDNAME=$(jsv_get_param CMDNAME)
    # Skip submissions without a command
    if [ "$CMDNAME" != "NONE" ]; then
        pe_name=$(jsv_get_param pe_name)
        if [ -n "$pe_name" ]; then
            # PE job: pin the slot range to its upper bound
            pe_max=$(jsv_get_param pe_max)
            cores=$pe_max
            jsv_set_param pe_min "$cores"
        else
            # No PE requested: exactly one core
            cores=1
        fi
        # Force core binding to the number of granted slots
        jsv_set_param binding_strategy linear_automatic
        jsv_set_param binding_type set
        jsv_set_param binding_amount "$cores"
    fi
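For context, here is how the fragment might sit inside a complete JSV script. This is a sketch, not a tested production script: the helper `cores_for_job` is a name made up here, the path to `jsv_include.sh` is assumed to be the usual location under `$SGE_ROOT`, and ending with `jsv_correct` is my choice for signalling that the job was modified.

```shell
#!/bin/bash
# Sketch of a JSV that forces core binding on every job.

# Hypothetical helper: number of cores to bind.
# $1 = pe_name (empty if no PE was requested), $2 = pe_max
cores_for_job() {
    if [ -n "$1" ]; then
        echo "$2"
    else
        echo 1
    fi
}

jsv_on_start() {
    return
}

jsv_on_verify() {
    # Leave submissions without a command alone, as in the fragment above.
    if [ "$(jsv_get_param CMDNAME)" = "NONE" ]; then
        jsv_accept "no command, binding not enforced"
        return
    fi
    pe_name=$(jsv_get_param pe_name)
    pe_max=$(jsv_get_param pe_max)
    cores=$(cores_for_job "$pe_name" "$pe_max")
    if [ -n "$pe_name" ]; then
        # Pin the slot range so granted slots equal bound cores.
        jsv_set_param pe_min "$cores"
    fi
    jsv_set_param binding_strategy linear_automatic
    jsv_set_param binding_type set
    jsv_set_param binding_amount "$cores"
    jsv_correct "core binding enforced"
}

# Assumed location of the JSV shell library; the protocol loop is only
# started when the library is actually present.
JSV_LIB="${SGE_ROOT:-}/util/resources/jsv/jsv_include.sh"
if [ -f "$JSV_LIB" ]; then
    . "$JSV_LIB"
    jsv_main
fi
```

To apply it to all jobs regardless of what users request, the script would be registered as a server-side JSV (the `jsv_url` entry in the global configuration); for testing, it can be passed per job with `qsub -jsv`.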
Rayson is working on cgroups integration, but I don't know its current
state:
http://blogs.scalablelogic.com/2012/05/grid-engine-cgroups-integration.html
-- Reuti