On 27.02.2013, at 16:22, Mikael Brandström Durling wrote:

> Ok, it seems somewhat hard to patch without deep knowledge of the inner
> workings of GE. Interestingly, if I manually start a qrsh -pe openmpi_span N,
> and then qrsh -inherit into the slave, that slave has knowledge of the number
> of slots allocated to it in the environment ($NSLOTS), but in execd's
> do_ck_to_do (execd_ck_to_do.c) the nslots value that the h_vmem limit is
> multiplied with must be another value (1). I'll see which workaround to go
> for, as we don't trust our users to stay within the limit they ask for. Many
> of them have no clue as to what resources might be a reasonable request.
>
> Thanks for your rapid reply,
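(For reference, a rough sketch of the manual check described in the quote. The
PE name openmpi_span and the use of qrsh -inherit are taken from the message
above; the slot count, the h_vmem request, and the slave host name node02 are
made-up example values.)

    # start an interactive parallel job; 8 slots and 2G per slot are examples
    $ qrsh -pe openmpi_span 8 -l h_vmem=2G

    # inside that session: slot count and granted hosts as seen on the
    # master node of the job
    $ echo $NSLOTS
    $ cat $PE_HOSTFILE

    # start a task on one of the slave hosts listed in $PE_HOSTFILE
    # (node02 is a placeholder) and check the slot count it was given there
    $ qrsh -inherit node02 env | grep NSLOTS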
You're welcome. In case you look deeper into the issue, it is also worth
noting that there is no option to specify the target queue for `qrsh -inherit`
in case you get slots from different queues on the slave system:

https://arc.liv.ac.uk/trac/SGE/ticket/813

Maybe it's related to $NSLOTS. If you get slots from one and the same queue,
it does indeed seem to be correct on the slave nodes. But for a local
`qrsh -inherit` on the master node of the parallel job it looks like it is set
to the overall slot count instead.

-- Reuti

> Mikael
>
>
> On 26 Feb 2013, at 21:32, Reuti <[email protected]> wrote:
>
>> On 26.02.2013, at 19:45, Mikael Brandström Durling wrote:
>>
>>> I have recently been trying to run Open MPI jobs spanning several nodes
>>> on our small cluster. However, it seems to me that sub-jobs launched with
>>> qrsh -inherit (by Open MPI) get killed at a memory limit of h_vmem,
>>> instead of h_vmem times the number of slots allocated to the sub-node.
>>
>> Unfortunately this is correct:
>>
>> https://arc.liv.ac.uk/trac/SGE/ticket/197
>>
>> The only way around it: use virtual_free instead and hope that the users
>> comply with this estimated value.
>>
>> -- Reuti
>>
>>
>>> Is there any way to get the correct allocation on the sub-nodes? I have
>>> some vague memory that I have read something about this. As it behaves
>>> now, it is impossible for us to run large-memory MPI jobs. Would making
>>> h_vmem a per-job consumable, rather than a slot-wise one, give any other
>>> behaviour?
>>>
>>> We are using OGS GE2011.11.
>>>
>>> Thanks for any hints on this issue,
>>>
>>> Mikael
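(For completeness, a sketch of the two directions mentioned in the quoted
thread: requesting virtual_free instead of h_vmem, and making the complex a
per-job consumable. The host name node01, the 48G capacity and the 2G request
are made-up example values, and whether a per-job consumable also changes what
execd enforces for qrsh -inherit tasks is a separate question; see ticket #197
above.)

    # 1) virtual_free as a consumable: in `qconf -mc`, set the "consumable"
    #    column of the virtual_free line to YES, e.g.
    #      virtual_free   vf   MEMORY   <=   YES   YES   0   0
    #    then advertise the capacity on each execution host:
    $ qconf -me node01        # set: complex_values    virtual_free=48G

    #    users then request their per-slot estimate instead of a hard limit:
    $ qsub -pe openmpi_span 8 -l virtual_free=2G job.sh

    # 2) per-job consumable: if your GE version supports it, the "consumable"
    #    column also accepts JOB, which debits the requested amount once per
    #    job instead of once per slot:
    #      h_vmem   h_vmem   MEMORY   <=   YES   JOB   0   0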
