On 14 December 2011 08:53, Christoph Müller
<[email protected]> wrote:
> Hi William,
>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf
>> Of William Hay
>> Sent: Wednesday, 14 December 2011 09:47
>> To: Christoph Müller
>> Cc: Reuti; [email protected]
>> Subject: Re: [gridengine users] Access complex resources from prolog script
>
>
>> >> > Yes. What I want to do is compute an environment variable (based on
>> >> > the complex resource requested by the user and the resources of this
>> >> > type available on each machine) and set this for the job. To do so, I
>> >> > want to SSH to every node and adjust the environment file.
>>
>> We recently set up GPUs on our cluster.  Rather than change the
>> environment, we set the permissions on the device files used to access the
>> GPUs so that they are accessible only by the group under which SGE executes
>> the job (i.e. the one allocated from gid_range).  The epilog then sets them
>> to be accessible only to root:root.  This has the advantage of being
>> enforced.  If we ran multi-node
>
> I was also thinking about using the device files for enforcing the allocation
> as a second step. However, how do you tell the user program which GPU to use?
> As I understand it, the program would have to test all GPUs and then use the
> one that works.

This doesn't seem like a problem.  All our GPU code is fairly greedy
and quickly grabs any GPUs it can.  Testing a couple of /dev files is
no great overhead.  If it bothers you, write a script that prints the
allocated GPUs and invoke it from the starter_method, e.g.
export MYGPUS=$(/usr/local/bin/mygpus).
Another possibility would be to create appropriate /dev files in a
location determined by job/task ID and try to get the code to use
those.
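
A minimal sketch of the starter_method idea, under some assumptions: the
device files follow the NVIDIA naming (/dev/nvidia0, /dev/nvidia1, ...),
the prolog has already restricted them to the job's group, and the helper
names list_my_gpus and run_with_gpus are made up for illustration. It
treats every device the job can read and write as allocated to it.

```shell
#!/bin/sh
# Sketch only: list_my_gpus and run_with_gpus are hypothetical helper names.

# Print, space-separated on one line, the GPU device files this process can
# open. Assumes NVIDIA-style names (nvidia0, nvidia1, ...) under the given
# directory, /dev by default.
list_my_gpus() {
    dev_dir="${1:-/dev}"
    found=""
    for dev in "$dev_dir"/nvidia[0-9]*; do
        # With no match the pattern stays literal; -e filters that out.
        [ -e "$dev" ] || continue
        # The prolog is assumed to have granted the job's group read/write
        # access to the allocated devices only.
        if [ -r "$dev" ] && [ -w "$dev" ]; then
            found="${found:+$found }$dev"
        fi
    done
    printf '%s\n' "$found"
}

# Starter-method style wrapper: export MYGPUS, then replace this shell with
# the job command passed as arguments.
run_with_gpus() {
    MYGPUS=$(list_my_gpus)
    export MYGPUS
    exec "$@"
}
```

A script configured as starter_method would then end with
run_with_gpus "$@", since (as far as I understand it) the job command and
its arguments are passed to the starter method as its own arguments. The
job sees MYGPUS as an ordinary environment variable and can pick its
devices from it instead of probing.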



>
>
>> jobs on our GPU setup we'd probably change it to be owned by the user in
>> question, as our (JSV-enforced) policy is that multi-node jobs have exclusive
>> access to the nodes on which they run.  If we didn't have such a policy then
>
> I wanted to avoid exclusive node access. However, if I fail to implement the 
> necessary prolog, that remains the only solution, because it avoids any 
> conflict with the GPUs in the first place. But I am not yet willing to give 
> up...
>
>
>> this might be a bit tricky as I believe you are not guaranteed to get the
>> same gid on all nodes.
>>
>> Fair bit of fiddling with lock files to make sure we don't double assign a
>> GPU though.
>
> That stuff is already implemented. My prolog and epilog scripts track which 
> GPU is assigned to which job using lock files, which seems to work fairly 
> well.
>
> Best regards,
> Christoph
>
>

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
