Look at the info presented here: http://stackoverflow.com/questions/10557816/scheduling-gpu-resources-using-the-sun-grid-engine-sge
Ian

On Mon, Apr 14, 2014 at 1:29 PM, Feng Zhang <[email protected]> wrote:
> Thanks, Ian and Gowtham!
>
> This is a very nice instruction. One of my problems is, for example:
>
> node1, number of GPUs = 4
> node2, number of GPUs = 4
> node3, number of GPUs = 2
>
> So in total I have 10 GPUs.
>
> Right now, user A has a serial GPU job, which takes one GPU on node1
> (I don't know which GPU, though). So node1: 3, node2: 4, and node3: 2
> GPUs are still free for jobs.
>
> I submit one job with PE=8. SGE allocates all 3 nodes to me, with 8
> GPU slots. The problem now is: how does my job know which GPUs it can
> get on node1?
>
> Best
>
> On Mon, Apr 14, 2014 at 4:13 PM, Ian Kaufman <[email protected]> wrote:
>> Again, look into using it as a consumable resource, as Gowtham posted above.
>>
>> Ian
>>
>> On Mon, Apr 14, 2014 at 11:57 AM, Feng Zhang <[email protected]> wrote:
>>> Thanks, Reuti,
>>>
>>> The socket solution looks like it only works for serial jobs, not PE
>>> jobs, right?
>>>
>>> Our cluster has different nodes: some nodes each have 2 GPUs, some
>>> others each have 4 GPUs. Most of the user jobs are PE jobs; some are
>>> serial.
>>>
>>> The socket solution can even work for PE jobs, but as I understand
>>> it, it is not efficient. Each node has, for example, 4 queues: if
>>> one user submits a PE job to one queue, he/she cannot use the other
>>> GPUs on the other queues?
>>>
>>> On Mon, Apr 14, 2014 at 2:16 PM, Reuti <[email protected]> wrote:
>>>> On 14.04.2014 at 20:06, Feng Zhang wrote:
>>>>
>>>>> Thanks, Ian!
>>>>>
>>>>> I haven't checked the GPU load sensor in detail, either. It sounds
>>>>> to me like it only handles the number of GPUs allocated to a job,
>>>>> but the job doesn't know which GPUs it actually gets, and it doesn't
>>>>> set CUDA_VISIBLE_DEVICES (some programs need this env variable to be
>>>>> set).
>>>>> This can be done by writing some scripts/programs, but to me it is
>>>>> not an accurate solution, since some jobs may still happen to
>>>>> collide with each other on the same GPU on a multi-GPU node. If GE
>>>>> could keep a record of the GPUs allocated to a job, then this would
>>>>> be perfect.
>>>>
>>>> Like the option to request sockets instead of cores, which I posted
>>>> in the last couple of days, you can use a similar approach to get the
>>>> number of the granted GPU out of the queue name.
>>>>
>>>> -- Reuti
>>>>
>>>>> On Mon, Apr 14, 2014 at 1:46 PM, Ian Kaufman <[email protected]> wrote:
>>>>>> I believe there already is support for GPUs - there is a GPU Load
>>>>>> Sensor in Open Grid Engine. You may have to build it yourself; I
>>>>>> haven't checked to see if it comes pre-packaged.
>>>>>>
>>>>>> Univa has Phi support, and I believe OGE/OGS has it as well, or at
>>>>>> least has been working on it.
>>>>>>
>>>>>> Ian
>>>>>>
>>>>>> On Mon, Apr 14, 2014 at 10:35 AM, Feng Zhang <[email protected]> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Is there any plan to implement GPU resource management in SGE in
>>>>>>> the near future, like Slurm or Torque have? There are some ways to
>>>>>>> do this using scripts/programs, but I wonder if SGE itself can
>>>>>>> recognize and manage GPUs (and Phi). It doesn't need to be
>>>>>>> complicated or powerful, just do the basic work.
>>>>>>>
>>>>>>> Thanks,

--
Ian Kaufman
Research Systems Administrator
UC San Diego, Jacobs School of Engineering
ikaufman AT ucsd DOT edu

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
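[Editor's note: the consumable-resource setup Ian and Gowtham refer to is, roughly, the following. This is a sketch only: the complex name `gpu` is a convention, the per-host counts match the example nodes in the thread, and the exact `qconf` syntax should be checked against your SGE version's man pages.]

```shell
# Sketch: declare GPUs as a consumable complex (run as an SGE admin).
# First add a line like the following to the complex list via "qconf -mc":
#   gpu   gpu   INT   <=   YES   YES   0   0

# Then tell SGE how many GPUs each execution host provides:
qconf -aattr exechost complex_values gpu=4 node1
qconf -aattr exechost complex_values gpu=4 node2
qconf -aattr exechost complex_values gpu=2 node3

# Jobs now request GPUs, and SGE decrements the counter per host:
qsub -l gpu=1 job.sh
```

With this in place SGE will not oversubscribe the GPU *count* on a node, but, as Feng notes, it still does not tell the job *which* GPU indices it was granted.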
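[Editor's note: Feng's remaining question (how a job discovers which specific GPUs it was granted) is commonly solved outside stock SGE with a prolog script that claims free GPU indices via atomic `mkdir` locks on the node and exports CUDA_VISIBLE_DEVICES. A minimal sketch follows; the `NGPUS`, `TOTAL_GPUS`, and `LOCKDIR` variables are hypothetical placeholders, not anything SGE sets itself, and a matching epilog would have to remove the lock directories when the job ends.]

```shell
#!/bin/sh
# Hypothetical prolog sketch: claim NGPUS free GPU indices on this node
# via lock directories, then export them as CUDA_VISIBLE_DEVICES.
NGPUS=${NGPUS:-1}            # GPUs requested (e.g. from the "gpu" consumable)
TOTAL_GPUS=${TOTAL_GPUS:-4}  # GPUs physically present in this node
LOCKDIR=${LOCKDIR:-/tmp/gpu_locks}  # node-local scratch for the locks
mkdir -p "$LOCKDIR"

granted=""
count=0
for i in $(seq 0 $((TOTAL_GPUS - 1))); do
    [ "$count" -ge "$NGPUS" ] && break
    # mkdir is atomic, so concurrent jobs cannot claim the same index
    if mkdir "$LOCKDIR/gpu$i" 2>/dev/null; then
        granted="${granted:+$granted,}$i"
        count=$((count + 1))
    fi
done

if [ "$count" -lt "$NGPUS" ]; then
    echo "prolog: only $count of $NGPUS requested GPUs are free" >&2
    exit 1
fi

CUDA_VISIBLE_DEVICES="$granted"
export CUDA_VISIBLE_DEVICES
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
```

The corresponding epilog would `rmdir "$LOCKDIR/gpu$i"` for each granted index. This is exactly the "scripts/programs" workaround discussed in the thread, with the collision risk reduced by the atomic locks rather than eliminated by the scheduler.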
