And here is some more info: http://serverfault.com/questions/322073/howto-set-up-sge-for-cuda-devices
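
In outline, the setup those two links describe is: define a consumable complex for GPUs, attach a per-host count to each execution host, and request the resource at submit time. A minimal sketch, assuming a complex named gpu and a PE named mpi (both placeholder names, not from this thread):

  # 1. Add a consumable to the complex list (qconf -mc opens it in an
  #    editor; this is the line to add):
  #name  shortcut  type  relop  requestable  consumable  default  urgency
  gpu    gpu       INT   <=     YES          YES         0        0

  # 2. Tell each execution host how many GPUs it has (qconf -me node1,
  #    then edit the complex_values line):
  complex_values    gpu=4

  # 3. Request GPUs at submit time; SGE decrements the consumable per host:
  qsub -l gpu=1 serial_gpu_job.sh
  qsub -pe mpi 8 -l gpu=1 parallel_gpu_job.sh

Because the complex is consumable, a parallel job's -l gpu=1 request is charged once per granted slot on each host, which is what lets an 8-slot job consume 8 GPUs spread over node1, node2 and node3 in the example below.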
On Mon, Apr 14, 2014 at 1:39 PM, Ian Kaufman <ikauf...@eng.ucsd.edu> wrote:
> If everything is configured correctly, GridEngine will be aware that
> the GPU in node1 is in use and schedule around it, ensuring that the
> 8-GPU job will get unused GPUs.
>
> Ian
>
> On Mon, Apr 14, 2014 at 1:38 PM, Ian Kaufman <ikauf...@eng.ucsd.edu> wrote:
>> Look at the info presented here:
>>
>> http://stackoverflow.com/questions/10557816/scheduling-gpu-resources-using-the-sun-grid-engine-sge
>>
>> Ian
>>
>> On Mon, Apr 14, 2014 at 1:29 PM, Feng Zhang <prod.f...@gmail.com> wrote:
>>> Thanks, Ian and Gowtham!
>>>
>>> These are very nice instructions. One problem I have is, for example:
>>>
>>> node1, number of gpu=4
>>> node2, number of gpu=4
>>> node3, number of gpu=2
>>>
>>> So in total I have 10 GPUs.
>>>
>>> Right now, user A has a serial GPU job, which takes one GPU on
>>> node1 (I don't know which GPU, though). So node1:3, node2:4 and
>>> node3:2 GPUs are still free for jobs.
>>>
>>> I submit one job with PE=8. SGE allocates all three nodes to me,
>>> with 8 GPU slots. The problem now is: how does my job know which
>>> GPUs it can use on node1?
>>>
>>> Best
>>>
>>> On Mon, Apr 14, 2014 at 4:13 PM, Ian Kaufman <ikauf...@eng.ucsd.edu> wrote:
>>>> Again, look into using it as a consumable resource, as Gowtham posted above.
>>>>
>>>> Ian
>>>>
>>>> On Mon, Apr 14, 2014 at 11:57 AM, Feng Zhang <prod.f...@gmail.com> wrote:
>>>>> Thanks, Reuti.
>>>>>
>>>>> The socket solution looks like it only works well for serial jobs,
>>>>> not PE jobs, right?
>>>>>
>>>>> Our cluster has different nodes: some have 2 GPUs each, others
>>>>> have 4 GPUs each. Most of the user jobs are PE jobs; some are
>>>>> serial.
>>>>>
>>>>> The socket solution can even work for PE jobs, but as I understand
>>>>> it, it is not efficient. Each node has, for example, 4 queues; if
>>>>> one user submits a PE job to one queue, he/she cannot use the GPUs
>>>>> attached to the other queues?
>>>>>
>>>>> On Mon, Apr 14, 2014 at 2:16 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>>>>>> On 14.04.2014 at 20:06, Feng Zhang wrote:
>>>>>>
>>>>>>> Thanks, Ian!
>>>>>>>
>>>>>>> I haven't checked the GPU load sensor in detail, either. It sounds
>>>>>>> to me like it only tracks the number of GPUs allocated to a job;
>>>>>>> the job doesn't know which GPUs it actually got, so it cannot set
>>>>>>> CUDA_VISIBLE_DEVICES (some programs need this env variable to be
>>>>>>> set). This can be done by writing some scripts/programs, but to me
>>>>>>> it is not an exact solution, since jobs may still happen to collide
>>>>>>> with each other on the same GPU of a multi-GPU node. If GE could
>>>>>>> keep a record of the GPUs allocated to each job, that would be
>>>>>>> perfect.
>>>>>>
>>>>>> Like the option to request sockets instead of cores, which I posted
>>>>>> in the last couple of days, you can use a similar approach to get
>>>>>> the number of the granted GPU out of the queue name.
>>>>>>
>>>>>> -- Reuti
>>>>>>
>>>>>>> On Mon, Apr 14, 2014 at 1:46 PM, Ian Kaufman <ikauf...@eng.ucsd.edu> wrote:
>>>>>>>> I believe there already is support for GPUs - there is a GPU Load
>>>>>>>> Sensor in Open Grid Engine. You may have to build it yourself; I
>>>>>>>> haven't checked to see if it comes pre-packaged.
>>>>>>>>
>>>>>>>> Univa has Phi support, and I believe OGE/OGS has it as well, or at
>>>>>>>> least has been working on it.
>>>>>>>>
>>>>>>>> Ian
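
Reuti's queue-name trick above can be sketched as follows, assuming one single-slot queue per GPU with hypothetical names gpu.0.q through gpu.3.q on each four-GPU host. SGE exports the granted queue name to the job as $QUEUE, so the job script can cut the device index out of it:

  #!/bin/sh
  # Sketch: derive the GPU index from the granted queue name.
  # Queue names gpu.0.q ... gpu.3.q are assumptions, not from this thread.
  GPU_ID=$(echo "$QUEUE" | cut -d. -f2)    # e.g. "gpu.2.q" -> "2"
  export CUDA_VISIBLE_DEVICES="$GPU_ID"
  exec ./my_cuda_program                   # placeholder for the real binary

For a PE job, the queue granted on each host appears in the third field of each $PE_HOSTFILE line, so a wrapper could do the same parsing per host before starting the remote ranks.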
>>>>>>>> On Mon, Apr 14, 2014 at 10:35 AM, Feng Zhang <prod.f...@gmail.com> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Is there any plan to implement GPU resource management in SGE in
>>>>>>>>> the near future, like Slurm or Torque have? There are some ways
>>>>>>>>> to do this using scripts/programs, but I wonder whether SGE
>>>>>>>>> itself can recognize and manage GPUs (and Phi). It doesn't need
>>>>>>>>> to be complicated or powerful, just do the basic work.
>>>>>>>>>
>>>>>>>>> Thanks,

--
Ian Kaufman
Research Systems Administrator
UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
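
For reference, a load sensor like the one Ian mentions is just a script speaking SGE's load sensor protocol: block on stdin, and for every request line print a report bracketed by "begin" and "end" containing host:name:value triples. A minimal sketch that reports free GPUs under a hypothetical gpu_free value (the nvidia-smi based counting is an assumption, not the stock OGE sensor):

  #!/bin/sh
  # Minimal load sensor sketch: reports unused GPUs as "gpu_free".
  # Install via the load_sensor parameter in qconf -mconf <host>.
  HOST=$(hostname)
  while read -r line; do
      [ "$line" = "quit" ] && exit 0
      total=$(nvidia-smi --list-gpus | wc -l)
      # GPUs that currently host a compute process (rough heuristic):
      busy=$(nvidia-smi --query-compute-apps=gpu_uuid --format=csv,noheader 2>/dev/null | sort -u | wc -l)
      echo "begin"
      echo "$HOST:gpu_free:$((total - busy))"
      echo "end"
  done

Paired with a matching gpu_free complex, this lets the scheduler see how many GPUs are idle on each node, though, as discussed above, it still does not tell a job which device indices it may use.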