On 12 Mar 2013, at 15:33, Reuti <[email protected]> wrote:
> 
>> The problem we would like to solve is when users submit a job with software
>> that defaults to starting a number of worker threads equal to the number of
>> cores, thus parasitising on other jobs' allocations within the node.
> 
> Often this comes from the software checking the number of installed cores.
> Even if the job is limited to certain cores, it would be good not to
> oversubscribe them.
Sure, that is the case. The problem, though, is that we have users who are not
aware of this. The goal is, sort of, to protect the well-behaving users from
the ones who are not. The latter are not behaving well simply due to lack of
knowledge. I.e., they might download a program and submit it as a single-slot
job without being aware that the program will check how many cores it can find
and launch that many threads. In this case it is perfectly fine if the
confinement is a "soft limit", as these users don't have the competence to
work around it.

> With a file $SGE_ROOT/default/common/sge_request
> 
> you could add (or similar variables):
> 
> -v OMP_NUM_THREADS=1,MKL_NUM_THREADS=1
> 
> to avoid extra threads altogether. If it's more sophisticated with a varying
> number, it's also possible to use either a JSV or to add them to
> $SGE_JOB_SPOOL_DIR/environment (the script must run as queue prolog under the
> SGE admin account). Sure, this can be overridden by the job.

Ok, so one could basically use the prolog to set OMP_NUM_THREADS to $NSLOTS
for SMP jobs (a rough sketch of such a prolog is at the end of this message).
I had not realised that it was possible to edit $SGE_JOB_SPOOL_DIR/environment
in order to modify the job environment. Thanks for the hint.

>> We have 8 or 16 cores per machine. However, as the workload is a mixture of
>> serial and parallel jobs, it is quite common that parallel jobs get stuck in
>> the queue because no complete node is free. To get around this we quite
>> often spread MPI jobs across nodes to pick up free slots. The software we
>> are using does not pass a lot of data over MPI, but mostly uses it for
>> synchronisation.
> 
> Depending on your workflow, options are:
> 
> - limit certain nodes to serial resp. parallel jobs
> - limit the number of serial and parallel slots per exechost by using two
>   queues

Yes, but that would still leave unused slots in cases where short serial jobs
could otherwise be scheduled. We prefer some jobs having to wait slightly
longer while keeping the hardware busy. It's a small departmental cluster used
for diverse bioinformatics workloads.

Mikael
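P.S. For the archives, here is a rough, untested sketch of the kind of prolog
Reuti describes. The script path and admin account name are just examples, and
it assumes NSLOTS and SGE_JOB_SPOOL_DIR are exported to the prolog environment
(please double-check on your installation). The prolog would be configured in
the queue definition to run as the admin account, e.g.

    prolog  sgeadmin@/opt/sge/scripts/set_threads.sh

and the script could look something like:

    #!/bin/sh
    # Rough prolog sketch: pin the thread-count variables to the number of
    # granted slots by rewriting the job's environment file before it starts.
    # Assumes NSLOTS and SGE_JOB_SPOOL_DIR are set in the prolog environment.

    ENVFILE="$SGE_JOB_SPOOL_DIR/environment"

    # Drop any thread-count variables already present (e.g. a site-wide
    # default of 1 from sge_request), then append the values for this job.
    grep -v -e '^OMP_NUM_THREADS=' -e '^MKL_NUM_THREADS=' "$ENVFILE" > "$ENVFILE.new" \
        && mv "$ENVFILE.new" "$ENVFILE"

    echo "OMP_NUM_THREADS=$NSLOTS" >> "$ENVFILE"
    echo "MKL_NUM_THREADS=$NSLOTS" >> "$ENVFILE"

    exit 0

As Reuti notes, a job can still override these variables itself, so this
remains a soft limit, which is good enough for our users.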
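And for completeness, my understanding of the two-queue option (queue and host
names invented, attribute values to be adapted): two cluster queues share the
hosts, and a per-host slots limit keeps their sum from oversubscribing the
cores. Something like:

    # serial-only queue with a few slots per host
    qconf -aq serial.q        # in the editor: slots 4,  pe_list NONE

    # parallel queue for the remaining slots
    qconf -aq parallel.q      # in the editor: slots 12, pe_list mpi smp

    # cap the total slots per execution host so the two queues together
    # cannot oversubscribe a 16-core node
    qconf -me node01          # in the editor: complex_values slots=16

As said above, though, we would rather keep the slots interchangeable and
accept a bit more queueing time for the parallel jobs.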
