On 27 Feb 2013, at 20:38, Reuti <[email protected]> wrote:

> On 27.02.2013, at 16:22, Mikael Brandström Durling wrote:
> 
>> OK, it seems somewhat hard to patch without deep knowledge of the inner 
>> workings of GE. Interestingly, if I manually start a qrsh -pe openmpi_span N 
>> and then qrsh -inherit into a slave, that slave does know the number of 
>> slots allocated to it via the environment ($NSLOTS), but in execd's 
>> do_ck_to_do (execd_ck_to_do.c) the nslots value by which the h_vmem limit 
>> is multiplied must be some other value (1). I'll see which workaround to go 
>> for, as we don't trust our users to stay within the limit they ask for. Many 
>> of them have no clue as to what resources might be a reasonable request.
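>> 
>> For reference, a minimal reproduction by hand (the PE name openmpi_span is 
>> from our setup; the slot count and the slave host name are placeholders):
>> 
>>     # allocate N slots through the PE and get an interactive shell
>>     # on the master node of the parallel job
>>     qrsh -pe openmpi_span 8 -l h_vmem=2G
>> 
>>     # from that shell, start a task on one of the slave hosts
>>     # ($PE_HOSTFILE lists the granted hosts and their slot counts)
>>     qrsh -inherit slave-node01 'echo $NSLOTS'
>> 
>>     # $NSLOTS on the slave is correct, yet execd appears to enforce
>>     # h_vmem as if nslots were 1
>> 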
>> Thanks for your rapid reply,
> 
> You're welcome.

Thanks…

> 
> In case you look deeper into the issue, it's also worth noting that there is 
> no option to specify the target queue for `qrsh -inherit` in case you get 
> slots from different queues on the slave host:
> 
> https://arc.liv.ac.uk/trac/SGE/ticket/813
> 

OK. This could lead to incompatible changes to the -inherit behaviour if the 
caller of `qrsh -inherit` has to specify the requested queue. On the other 
hand, I have seen cases where an OMPI job was allotted slots from two 
different queues on one exec host, which resulted in Open MPI launching two 
`qrsh -inherit` calls to the same host.
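
Incidentally, such a split allocation is visible with a standard qstat 
option (shown here for all users; nothing site-specific assumed):

    # "-g t" prints one row per queue instance a job holds slots in,
    # so a job granted slots in two queues on one host shows up twice
    # for that host
    qstat -g t -u '*'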


> Maybe it's related to $NSLOTS. If you get slots from one and the same 
> queue, it does indeed seem to be correct on the slave nodes. But for a local 
> `qrsh -inherit` on the master node of the parallel job, it looks like it is 
> set to the overall slot count instead.


I noticed that too. I will see if I can find some spare time to hunt this 
down. It seems an ideal solution would be for $NSLOTS to be set to the number 
of slots allotted to the current job on that host (i.e. correcting the number 
in the master job), and for `qrsh -inherit` to take an argument of the 
'queue@host' type.
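
Purely as a sketch of hypothetical syntax (no such option exists today, as 
per ticket 813; the queue and host names are made up):

    # hypothetical: name the granted queue instance explicitly, so execd
    # on the slave can attribute the task and its limits correctly
    qrsh -inherit [email protected] ./mpi_task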

I'll think this over and add it as a comment to the ticket. Is the Trac 
instance at arc.liv.ac.uk the best place, even though we are running OGS? I 
suppose so?

Mikael

> 
> -- Reuti
> 
> 
>> Mikael
>> 
>> 
>> On 26 Feb 2013, at 21:32, Reuti <[email protected]> wrote:
>> 
>>> On 26.02.2013, at 19:45, Mikael Brandström Durling wrote:
>>> 
>>>> I have recently been trying to run Open MPI jobs spanning several nodes on 
>>>> our small cluster. However, it seems to me that sub-jobs launched with 
>>>> `qrsh -inherit` (by Open MPI) get killed at the h_vmem memory limit itself, 
>>>> instead of at h_vmem times the number of slots allocated on that sub-node.
>>> 
>>> Unfortunately this is correct:
>>> 
>>> https://arc.liv.ac.uk/trac/SGE/ticket/197
>>> 
>>> The only way around it: use virtual_free instead and hope that the users 
>>> comply with this estimated value.
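>>> 
>>> E.g., a sketch (the host name and the 48G value are placeholders):
>>> 
>>>     # in `qconf -mc`, mark virtual_free as a consumable:
>>>     #   virtual_free  vf  MEMORY  <=  YES  YES  0  0
>>> 
>>>     # then set the memory available on each exec host:
>>>     qconf -me node01
>>>     #   complex_values  virtual_free=48G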
>>> 
>>> -- Reuti
>>> 
>>> 
>>>> Is there any way to get the correct allocation on the sub-nodes? I vaguely 
>>>> remember having read something about this. As it behaves now, it is 
>>>> impossible for us to run large-memory MPI jobs. Would making h_vmem a 
>>>> per-job consumable, rather than a per-slot one, give any other behaviour?
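>>>> 
>>>> I.e., something like flipping the consumable column in `qconf -mc` from 
>>>> per-slot to per-job, if our GE version supports that:
>>>> 
>>>>     # today (accounted per slot):
>>>>     #   h_vmem  h_vmem  MEMORY  <=  YES  YES  0  0
>>>>     # accounted per job instead:
>>>>     #   h_vmem  h_vmem  MEMORY  <=  YES  JOB  0  0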
>>>> 
>>>> We are using OGS GE2011.11.
>>>> 
>>>> Thanks for any hints on this issue,
>>>> 
>>>> Mikael
>>>> 

