On 14 December 2011 08:53, Christoph Müller <[email protected]> wrote:
> Hi William,
>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On behalf of
>> William Hay
>> Sent: Wednesday, 14 December 2011 09:47
>> To: Christoph Müller
>> Cc: Reuti; [email protected]
>> Subject: Re: [gridengine users] Access complex resources from prolog script
>
>
>> >> > Yes. What I want to do is compute an environment variable (based on
>> >> > the complex resource requested by the user and the resources of this
>> >> > type available on each machine) and set this for the job. To do so, I
>> >> > want to SSH to every node and adjust the environment file.
>>
>> We recently set up GPUs on our cluster. Rather than change the
>> environment we set the permissions on the device files used to access the
>> GPUs to be accessible only by the group under which SGE executes the job
>> (i.e. the one allocated from gid_range). The epilog then sets them to be
>> accessible only to root:root. This has the advantage of being enforced.
>> If we ran multi-node
>
> I was also thinking about using the device files for enforcing the allocation
> as a second step. However, how do you tell the user program which GPU to use?
> As I understand it, the program would have to test all GPUs and then use the
> one that works.
This doesn't seem like a problem. All our GPU code seems to be fairly greedy
and quickly grabs any GPUs it can. Testing a couple of /dev files is no great
overhead. If it bothers you, write a script that outputs the allocated GPUs and
invoke it via starter_method:

export MYGPUS=$(/usr/local/bin/mygpus)

Another possibility would be to create appropriate /dev files in a location
determined by the job/task ID and try to get the code to use those.
>
>
>> jobs on our GPU setup we'd probably change it to be owned by the user in
>> question as our (JSV-enforced) policy is that multi-node jobs have exclusive
>> access to the nodes on which they run. If we didn't have such a policy then
>
> I wanted to avoid exclusive node access. However, if I fail to implement the
> necessary prolog, that remains the only solution, because it avoids any
> conflict with the GPUs in the first place. But I am not yet willing to give
> up...
>
>
>> this might be a bit tricky as I believe you are not guaranteed to get the
>> same gid on all nodes.
>>
>> Fair bit of fiddling with lock files to make sure we don't double assign a
>> GPU though.
>
> That stuff is already implemented. My prolog and epilog scripts track which
> GPU is assigned to which job using lock files, which seems to work fairly
> well.
>
> Best regards,
> Christoph
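
For what it's worth, the starter_method idea could be sketched roughly as
follows. This is only an illustration, not what we actually run: the function
names, the GPU_DEV_DIR override and the "nvidia" device-name prefix are all
assumptions of mine. Only MYGPUS and the idea of testing the /dev files come
from the discussion. It treats a GPU as "allocated" if its device file is
readable and writable by whoever runs the script.

```shell
#!/bin/sh
# Sketch of a mygpus-style helper plus a starter_method wrapper.
# Assumed (not from the thread): device files are named <prefix><index>,
# e.g. /dev/nvidia0, and GPU_DEV_DIR may override the directory for testing.

# Print a comma-separated list of GPU indices whose device files the
# current process can both read and write.
list_accessible_gpus() {
    dir=${1:-${GPU_DEV_DIR:-/dev}}   # directory holding the device files
    prefix=${2:-nvidia}              # hypothetical device-name prefix
    out=""
    for dev in "$dir/$prefix"[0-9]*; do
        [ -e "$dev" ] || continue    # glob matched nothing
        if [ -r "$dev" ] && [ -w "$dev" ]; then
            base=${dev##*/}          # strip the directory part
            idx=${base#"$prefix"}    # strip the prefix, leaving the index
            out="${out:+$out,}$idx"
        fi
    done
    printf '%s\n' "$out"
}

# starter_method body: export MYGPUS, then run the job that SGE passes
# as arguments.
run_job_with_gpus() {
    MYGPUS=$(list_accessible_gpus)
    export MYGPUS
    exec "$@"
}
```

A starter_method script would end with a call to run_job_with_gpus "$@". Note
that the access test always succeeds for root, so this only tells you anything
when it runs as the job's user and group, after the prolog has set the
permissions.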
