On 13 December 2011 19:11, Christoph Müller
<[email protected]> wrote:
> Hi Reuti,
>
>> -----Original Message-----
>> From: Reuti [mailto:[email protected]]
>> Sent: Tuesday, 13 December 2011 19:20
>> To: Christoph Müller
>> Cc: [email protected]
>> Subject: Re: AW: AW: [gridengine users] Access complex resources from
>> prolog script
>
>
>> > Yes. What I want to do is compute an environment variable (based on the
>> > complex resource requested by the user and the resources of this type
>> > available on each machine) and set this for the job. To do so, I want to
>> > SSH to every node and adjust the environment file.

We recently set up GPUs on our cluster.  Rather than change the
environment, we set the permissions on the device files used to access
the GPUs so that they are accessible only by the group the SGE job
executes under (i.e. the one allocated from gid_range).  The epilog
then sets them back to be accessible only to root:root.  This has the
advantage of being enforced, rather than relying on the job to honour
an environment variable.  If we ran multi-node jobs on our GPU setup
we'd probably change the device files to be owned by the user in
question, as our (JSV-enforced) policy is that multi-node jobs have
exclusive access to the nodes on which they run.  If we didn't have
such a policy then this might be a bit tricky, as I believe you are
not guaranteed to get the same gid on all nodes.
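
To make the permission switching above concrete, here is a minimal
Python sketch, not our actual scripts. The function names, the 0660
mode and the injectable chown argument are assumptions made for
illustration; the device path (something like /dev/nvidia0) and the
job's gid from gid_range are taken as parameters, and how a prolog
obtains that gid is not shown.

```python
import os

def grant_to_job_group(device_path, job_gid, chown=os.chown):
    """Start of job: device owned by root, group set to the job's gid."""
    # Real use needs root; `chown` is injectable so the logic can be
    # exercised without privileges (an assumption of this sketch).
    chown(device_path, 0, job_gid)
    # rw for owner (root) and group (the job), nothing for others.
    os.chmod(device_path, 0o660)

def revoke_from_job_group(device_path, chown=os.chown):
    """Epilog: hand the device back to root:root."""
    chown(device_path, 0, 0)
    os.chmod(device_path, 0o660)
```

Because the mode never grants anything to "others", a job that was not
given the device cannot open it, whatever its environment says.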

It takes a fair bit of fiddling with lock files to make sure we don't
double-assign a GPU, though.
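
The lock-file part can be sketched roughly as follows. This is an
illustration under assumptions, not our real code: the lock directory,
the gpuN.lock naming and both function names are hypothetical. It
relies on O_CREAT|O_EXCL so that only one process can create a given
lock file on a local filesystem.

```python
import os

def claim_gpu(lock_dir, gpu_ids, job_id):
    """Return the first GPU id we manage to lock, or None if all are taken."""
    for gpu in gpu_ids:
        path = os.path.join(lock_dir, f"gpu{gpu}.lock")
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            continue  # another job holds this GPU, try the next one
        with os.fdopen(fd, "w") as fh:
            fh.write(f"{job_id}\n")  # record the owner for the epilog
        return gpu
    return None

def release_gpu(lock_dir, gpu, job_id):
    """Remove the lock only if it belongs to job_id; return True on success."""
    path = os.path.join(lock_dir, f"gpu{gpu}.lock")
    try:
        with open(path) as fh:
            owner = fh.read().strip()
    except FileNotFoundError:
        return False
    if owner != str(job_id):
        return False
    os.remove(path)
    return True
```

A prolog would call claim_gpu() and then open up the matching device
file; the epilog would call release_gpu() after resetting it. Stale
locks left behind by a crashed job still need separate cleanup.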

>>
>> The problem is that it's only created when `qrsh -inherit ...` is executed.
>> There is nothing on the slave node beforehand - just an empty directory
>> AFAICS. Also no environment file.
>
> OK. Could I use qrsh to run my stuff or would that create some new kind of
> job?
>
>
>> Why not put it in a starter_method? Initially I wasn't aware that it's a
>> parallel job. What information do you want to set up - a local scratch
>> directory?
>
> I actually did not think of a starter_method, but I will give it a try. My
> problem probably is that I need to solve a level 10 problem with level 1 SGE
> knowledge. I want to achieve the following: I have defined a GPU resource as
> a complex and each host provides two of these. SGE perfectly honours the
> resource requests, but I need to tell the user which of the GPUs has been
> assigned. This cannot be solved by creating a new complex for each GPU,
> because I cannot expect the user to choose the right one in the job script.
> Using your JSV idea and the prolog script, I almost solved the problem, but
> the last step, setting the environment variable containing the result, is
> still missing.
>
>
>> >> set up SGE's configuration to use ssh (in case you really need it),
>> >> all variables should be inherited from the sge_shepherd.
>> >
>> > You mean that any SSH session I open from the prolog should inherit the
>> > environment? That is not the case here. Where can I change the
>> > configuration accordingly? What I can confirm is that the MPI jobs
>> > correctly inherit the environment.
>>
>> You defined ssh to be used in `qconf -sconf`? For a tight integration, PAM
>> also needs to be adjusted.
>
> Yes, I use ssh as rsh_command. Login via PKI is also working perfectly. 
> Basically, SGE is working, I just have a problem with my prolog.
>
> Best regards,
> Christoph
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
>
>
