On 14.04.2014 at 20:57, Feng Zhang wrote:

> Thanks, Reuti,
>
> The socket solution looks like it only works for serial jobs, not PE
> jobs, right?
You mean using more than one GPU at a time, or using parallel processes
as usual in addition to the GPU?

> Our cluster has different nodes: some nodes have 2 GPUs each, others
> have 4 GPUs each. Most of the user jobs are PE jobs, some are serial.
>
> The socket solution can even work for PE jobs, but as I understand it,
> it is not efficient. Each node has, for example, 4 queues: if a user
> submits a PE job to one queue, he/she cannot use the GPUs attached to
> the other queues?

In SGE you don't submit to queues; you request resources. If you want a
GPU job to go to a set of queues, the best way is to attach a boolean
complex to these queues and submit the job with a request for this
complex. SGE is then free to select any of the queues offering this
feature. Look at the link Gowtham posted for this.

-- Reuti

> On Mon, Apr 14, 2014 at 2:16 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>> On 14.04.2014 at 20:06, Feng Zhang wrote:
>>
>>> Thanks, Ian!
>>>
>>> I haven't checked the GPU load sensor in detail either. It sounds to
>>> me like it only handles the number of GPUs allocated to a job, but
>>> the job doesn't know which GPUs it actually got and so cannot set
>>> CUDA_VISIBLE_DEVICES (some programs need this environment variable).
>>> This can be done by writing some scripts/programs, but to me it is
>>> not an exact solution, since jobs may still happen to collide with
>>> each other on the same GPU of a multi-GPU node. If GE could record
>>> which GPUs are allocated to a job, that would be perfect.
>>
>> Like the option to request sockets instead of cores which I posted in
>> the last couple of days, you can use a similar approach to get the
>> number of the granted GPU out of the queue name.
>>
>> -- Reuti
>>
>>
>>> On Mon, Apr 14, 2014 at 1:46 PM, Ian Kaufman <ikauf...@eng.ucsd.edu> wrote:
>>>> I believe there already is support for GPUs - there is a GPU load
>>>> sensor in Open Grid Engine.
>>>> You may have to build it yourself; I haven't checked whether it
>>>> comes pre-packaged.
>>>>
>>>> Univa has Phi support, and I believe OGE/OGS has it as well, or at
>>>> least has been working on it.
>>>>
>>>> Ian
>>>>
>>>> On Mon, Apr 14, 2014 at 10:35 AM, Feng Zhang <prod.f...@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> Is there any plan to implement GPU resource management in SGE in
>>>>> the near future, like Slurm or Torque have? There are ways to do
>>>>> this with scripts/programs, but I wonder whether SGE itself can
>>>>> recognize and manage GPUs (and Phi). It doesn't need to be
>>>>> complicated or powerful, just do the basic work.
>>>>>
>>>>> Thanks,
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users@gridengine.org
>>>>> https://gridengine.org/mailman/listinfo/users
>>>>
>>>>
>>>>
>>>> --
>>>> Ian Kaufman
>>>> Research Systems Administrator
>>>> UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
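[Editor's note] Reuti's boolean-complex suggestion could be sketched roughly as below. The complex name `gpu` and queue name `gpu.q` are illustrative assumptions, not names from this thread; this is a minimal sketch of the standard SGE complex mechanism, not a definitive recipe.

```sh
# Define a boolean complex named "gpu" (hypothetical name) via `qconf -mc`,
# i.e. append a line like this to the complex configuration:
#
#   gpu  gpu  BOOL  ==  YES  NO  FALSE  0
#
# Attach it to each GPU-capable queue (complex_values in `qconf -mq gpu.q`):
#
#   complex_values  gpu=TRUE
#
# Jobs then request the feature instead of naming a queue; SGE is free
# to place the job in any queue that offers it:
qsub -l gpu job.sh
```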
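[Editor's note] Reuti's idea of deriving the granted GPU from the queue name might look like the sketch below in a job script or prolog. The per-GPU queue naming scheme `gpu.q.<index>` is an assumption for illustration; `$QUEUE` is set by SGE inside a job, and is defaulted here only so the snippet runs standalone.

```shell
#!/bin/sh
# Hypothetical sketch: one queue per GPU, named gpu.q.0, gpu.q.1, ...
# Inside a job SGE sets $QUEUE; default it here for standalone testing.
QUEUE="${QUEUE:-gpu.q.2@node01}"

# Drop the host part (everything from '@'), then keep only the
# trailing index after the last dot.
gpu_index="${QUEUE%%@*}"        # -> gpu.q.2
gpu_index="${gpu_index##*.}"    # -> 2

# Restrict CUDA programs in this job to the granted device.
CUDA_VISIBLE_DEVICES="$gpu_index"
export CUDA_VISIBLE_DEVICES

echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
```

With the defaulted queue name this prints `CUDA_VISIBLE_DEVICES=2`; under SGE the exported value follows whichever queue instance the scheduler granted.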