Ok, this seems somewhat hard to patch without deep knowledge of the inner workings of GE. Interestingly, if I manually start a qrsh -pe openmpi_span N and then qrsh -inherit into a slave, that slave knows the number of slots allocated to it through the environment ($NSLOTS), but in execd's do_ck_to_do (execd_ck_to_do.c) the nslots value that the h_vmem limit is multiplied by must be another value (1). I'll see which workaround to go for, as we don't trust our users to stay within the limit they ask for; many of them have no clue as to what resources might be a reasonable request.

Thanks for your rapid reply,
Mikael
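PS, for the archives: the manual test above boils down to something like the following (openmpi_span is just the PE name on our cluster; the host names and the output shown are made up):

    # allocate 8 slots spread over several nodes, interactively
    qrsh -pe openmpi_span 8

    # on the master of the allocation, check which hosts/slots were granted
    cat $PE_HOSTFILE
    #   node01 4 all.q@node01 UNDEFINED
    #   node02 4 all.q@node02 UNDEFINED

    # start a task on a slave host of the allocation and ask for its slot count
    qrsh -inherit node02 bash -c 'echo $NSLOTS'
    #   4    <- the slave task knows it has 4 slots ...

    # ... yet execd still applies h_vmem * 1 to that task's processes.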
On 26 Feb 2013, at 21:32, Reuti <[email protected]> wrote:

> On 26.02.2013 at 19:45, Mikael Brandström Durling wrote:
>
>> I have recently been trying to run OpenMPI jobs spanning several nodes on
>> our small cluster. However, it seems to me that sub-jobs launched with
>> qrsh -inherit (by OpenMPI) get killed at a memory limit of h_vmem, instead
>> of h_vmem times the number of slots allocated on the sub-node.
>
> Unfortunately this is correct:
>
> https://arc.liv.ac.uk/trac/SGE/ticket/197
>
> Only way around: use virtual_free instead and hope that the users comply with
> this estimated value.
>
> -- Reuti
>
>> Is there any way to get the correct allocation on the sub-nodes? I have some
>> vague memory that I have read something about this. As it behaves now, it is
>> impossible for us to run large-memory MPI jobs. Would making h_vmem a
>> per-job consumable, rather than slot-wise, give any other behaviour?
>>
>> We are using OGS GE2011.11.
>>
>> Thanks for any hints on this issue,
>>
>> Mikael
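For anyone finding this thread later: the virtual_free workaround Reuti mentions amounts to roughly the following (host name and memory sizes are made up; the complex line follows the usual qconf -mc column order name/shortcut/type/relop/requestable/consumable/default/urgency):

    # 1. make virtual_free a consumable in the complex (qconf -mc),
    #    i.e. change its line to:
    #    virtual_free   vf   MEMORY   <=   YES   YES   0   0

    # 2. tell each exec host how much memory it may hand out:
    qconf -mattr exechost complex_values virtual_free=64G node01

    # 3. users then request memory per slot at submit time:
    qsub -pe openmpi_span 8 -l virtual_free=4G job.sh

Unlike h_vmem, virtual_free is only accounted for by the scheduler and not enforced with a limit on the execution host, which is why it relies on users actually staying within what they requested.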
