On Mon, Apr 14, 2014 at 10:42 PM, Feng Zhang <[email protected]> wrote:
> On Mon, Apr 14, 2014 at 5:36 PM, Reuti <[email protected]> wrote:
>> On 14.04.2014 at 20:57, Feng Zhang wrote:
>>
>>> Thanks, Reuti,
>>>
>>> The socket solution looks like it only works for serial jobs, not PE
>>> jobs, right?
>>
>> You mean using more than one GPU at a time, or using parallel processes
>> as usual in addition to the GPU?
>>
>
> Thanks, Reuti!
>
> Right, using parallel processes running on GPUs (on the same node, and
> also across multiple nodes).
> For example, GE + Open MPI.
>
> Best
>
>>
>>> Our cluster has different nodes: some nodes have 2 GPUs each, others
>>> have 4 GPUs each. Most of the user jobs are PE jobs; some are serial.
>>>
>>> The socket solution can even work for PE jobs, but as I understand
>>> it, it is not efficient. Since each node has, for example, 4 queues:
>>> if one user submits a PE job to a queue, he/she cannot use the other
>>> GPUs on the other queues?
>>
>> In SGE you don't submit to queues. You request resources. In case you
>> want a GPU job going to a set of queues, the best way would be to attach
>> a boolean complex to these queues and submit the job with a request for
>> this complex. SGE is then free to elect any of the queues with this
>> feature.
>>
>> Look at the link Gowtham posted for this.
>>
>> -- Reuti
>>
>>
>>> On Mon, Apr 14, 2014 at 2:16 PM, Reuti <[email protected]> wrote:
>>>> On 14.04.2014 at 20:06, Feng Zhang wrote:
>>>>
>>>>> Thanks, Ian!
>>>>>
>>>>> I haven't checked the GPU load sensor in detail, either. It sounds
>>>>> to me like it only handles the number of GPUs allocated to a job,
>>>>> but the job doesn't know which GPUs it actually got and cannot set
>>>>> CUDA_VISIBLE_DEVICES (some programs need this env variable to be
>>>>> set). This can be done by writing some scripts/programs, but to me
>>>>> it is not an accurate solution, since some jobs may still happen to
>>>>> collide with each other on the same GPU on a multi-GPU node. If GE
>>>>> could record the GPUs allocated to a job, that would be perfect.
>>>>
>>>> Like the option to request sockets instead of cores which I posted
>>>> in the last couple of days, you can use a similar approach to get
>>>> the number of the granted GPU out of the queue name.
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>> On Mon, Apr 14, 2014 at 1:46 PM, Ian Kaufman <[email protected]>
>>>>> wrote:
>>>>>> I believe there already is support for GPUs - there is a GPU load
>>>>>> sensor in Open Grid Engine.
>>>>>> You may have to build it yourself; I haven't checked to see if it
>>>>>> comes pre-packaged.
>>>>>>
>>>>>> Univa has Phi support, and I believe OGE/OGS has it as well, or at
>>>>>> least has been working on it.
>>>>>>
>>>>>> Ian
>>>>>>
>>>>>> On Mon, Apr 14, 2014 at 10:35 AM, Feng Zhang <[email protected]> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Is there any plan to implement GPU resource management in SGE in
>>>>>>> the near future, like Slurm or Torque have? There are some ways
>>>>>>> to do this using scripts/programs, but I wonder whether SGE
>>>>>>> itself can recognize and manage GPUs (and Phi). It doesn't need
>>>>>>> to be complicated or powerful, just do the basic work.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> [email protected]
>>>>>>> https://gridengine.org/mailman/listinfo/users
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ian Kaufman
>>>>>> Research Systems Administrator
>>>>>> UC San Diego, Jacobs School of Engineering
>>>>>> ikaufman AT ucsd DOT edu
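
[Editor's note: a minimal sketch of the boolean-complex approach Reuti describes above. The complex name `gpu`, the queue names `gpu0.q`/`gpu1.q`, and the job script name are assumptions for illustration, not from the thread; the exact complex attributes depend on your SGE version.]

```shell
# Define a boolean complex (assumed name: gpu). Run `qconf -mc` and add
# a line like the following to the complex configuration:
#
#   gpu    gpu    BOOL    ==    YES    NO    FALSE    0
#
# Attach the complex to every queue that fronts a GPU (queue names assumed):
qconf -mattr queue complex_values gpu=TRUE gpu0.q
qconf -mattr queue complex_values gpu=TRUE gpu1.q

# Submit WITHOUT naming a queue; request the resource instead.
# SGE is then free to elect any queue offering this complex:
qsub -l gpu=TRUE myjob.sh
```

This is why Reuti says "you don't submit to queues, you request resources": the scheduler, not the user, picks whichever GPU-backed queue has a free slot.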

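[Editor's note: a sketch of the queue-name trick Reuti mentions for setting CUDA_VISIBLE_DEVICES, assuming one queue per GPU with a naming scheme like `gpu<N>.q` (an assumption, not from the thread). In a real job prolog or wrapper, `$QUEUE` is set by SGE; here a sample value is hard-coded so the snippet is self-contained.]

```shell
#!/bin/sh
# One queue per GPU (assumed scheme gpu<N>.q); the wrapper derives the
# device index from the queue the job was scheduled into.
# $QUEUE is normally provided by SGE in the job environment:
QUEUE="gpu2.q"

# Strip the assumed prefix and suffix to recover the GPU index.
GPU_ID="${QUEUE#gpu}"
GPU_ID="${GPU_ID%.q}"

# Restrict the CUDA runtime to the granted device, so jobs on the same
# multi-GPU node no longer collide on one GPU.
export CUDA_VISIBLE_DEVICES="$GPU_ID"
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
```

This only avoids collisions if the queues are configured with one slot each per GPU, which is the limitation the thread is discussing.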