On Mon, Apr 14, 2014 at 5:36 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> Am 14.04.2014 um 20:57 schrieb Feng Zhang:
>
>> Thanks, Reuti,
>>
>> The socket solution looks like it only works for serial jobs, not PE
>> jobs, right?
>
> You mean using more than one GPU at a time, or using parallel processes
> as usual in addition to the GPU?
>
Thanks, Reuti! Right, using parallel processes running on GPUs (on the
same node, and also across multiple nodes).

Best

>> Our cluster has different nodes: some nodes have 2 GPUs each, some
>> others have 4 GPUs each. Most of the user jobs are PE jobs, some are
>> serial.
>>
>> The socket solution can even work for PE jobs, but as I understand it,
>> it is not efficient? Since each node has, for example, 4 queues: if one
>> user submits a PE job to a queue, he/she cannot use the other GPUs on
>> the other queues?
>
> In SGE you don't submit to queues. You request resources. In case you
> want a GPU job going to a set of queues, the best way would be to attach
> a boolean complex to these queues and submit the job with a request for
> this complex. SGE is then free to elect any of the queues with this
> feature.
>
> Look at the link Gowtham posted for this.
>
> -- Reuti
>
>> On Mon, Apr 14, 2014 at 2:16 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>>> Am 14.04.2014 um 20:06 schrieb Feng Zhang:
>>>
>>>> Thanks, Ian!
>>>>
>>>> I haven't checked the GPU load sensor in detail, either. It sounds to
>>>> me like it only handles the number of GPUs allocated to a job, but
>>>> the job doesn't know which GPUs it actually got, nor does it set
>>>> CUDA_VISIBLE_DEVICES (some programs need this env to be set). This
>>>> can be done by writing some scripts/programs, but to me it is not an
>>>> accurate solution, since some jobs may still happen to collide with
>>>> each other on the same GPU of a multi-GPU node. If GE could keep a
>>>> record of the GPUs allocated to a job, that would be perfect.
>>>
>>> Like the option to request sockets instead of cores which I posted in
>>> the last couple of days, you can use a similar approach to get the
>>> number of the granted GPU out of the queue name.
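[A sketch of the two suggestions quoted above, for reference. All names
here are illustrative assumptions, not taken from the thread: a boolean
complex `gpu` attached to the GPU-capable queues, and per-GPU cluster
queues named `gpu0.q` .. `gpu3.q` with one slot each, so that the granted
queue name encodes which device the job owns. `$QUEUE` is set by SGE in
the job environment.]

```shell
#!/bin/sh
# Hypothetical job script sketch. Admin-side setup (illustrative):
#
#   qconf -sc            # add a line:  gpu  gpu  BOOL  ==  YES  NO  FALSE  0
#   qconf -mq gpu0.q     # set:  complex_values  gpu=TRUE   (same for gpu1.q ...)
#
# Submission then requests the resource, not a queue:
#
#   qsub -l gpu -pe mpi 8 job.sh
#
# With one queue per GPU, the queue SGE granted tells the job its device:

gpu_from_queue() {
    # "gpu2.q" -> "2" (assumes queue names of the form gpuN.q)
    printf '%s\n' "$1" | sed 's/^gpu\([0-9][0-9]*\)\..*/\1/'
}

# Default value is only for illustration; inside a real job $QUEUE is set.
CUDA_VISIBLE_DEVICES=$(gpu_from_queue "${QUEUE:-gpu0.q}")
export CUDA_VISIBLE_DEVICES
```

For multi-node PE jobs the same derivation would have to run per host
(e.g. in the PE start script), since each node grants its own queue instance.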
>>> -- Reuti
>>>
>>>> On Mon, Apr 14, 2014 at 1:46 PM, Ian Kaufman <ikauf...@eng.ucsd.edu> wrote:
>>>>> I believe there already is support for GPUs - there is a GPU load
>>>>> sensor in Open Grid Engine. You may have to build it yourself; I
>>>>> haven't checked to see if it comes pre-packaged.
>>>>>
>>>>> Univa has Phi support, and I believe OGE/OGS has it as well, or at
>>>>> least has been working on it.
>>>>>
>>>>> Ian
>>>>>
>>>>> On Mon, Apr 14, 2014 at 10:35 AM, Feng Zhang <prod.f...@gmail.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Is there any plan to implement GPU resource management in SGE in
>>>>>> the near future, like Slurm or Torque? There are some ways to do
>>>>>> this using scripts/programs, but I wonder whether SGE itself can
>>>>>> recognize and manage GPUs (and Phi). It doesn't need to be
>>>>>> complicated and powerful, just do the basic work.
>>>>>>
>>>>>> Thanks,
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users@gridengine.org
>>>>>> https://gridengine.org/mailman/listinfo/users
>>>>>
>>>>> --
>>>>> Ian Kaufman
>>>>> Research Systems Administrator
>>>>> UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
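[A minimal sketch of the load-sensor approach Ian mentions, under stated
assumptions: `nvidia-smi` is installed, and a load complex named
`gpus_free` has been defined via `qconf -mc` — both the counting heuristic
and the complex name are illustrative, not from the thread. sge_execd
starts the sensor once and writes a line to its stdin each load interval;
the sensor answers with a begin/end-framed report.]

```shell
#!/bin/sh
# Hypothetical GPU load sensor sketch for sge_execd.
# Start with "--serve" from the execd load_sensor configuration.

HOST=$(hostname)

# Print one load report in the execd load-sensor framing: begin / values / end.
emit_report() {  # $1 = host, $2 = number of free GPUs
    echo "begin"
    echo "$1:gpus_free:$2"
    echo "end"
}

# Heuristic (an assumption): a GPU with no compute process is "free".
count_free_gpus() {
    total=$(nvidia-smi --list-gpus | wc -l)
    busy=$(nvidia-smi --query-compute-apps=gpu_uuid --format=csv,noheader \
           | sort -u | grep -c .)
    echo $((total - busy))
}

# execd sends a line per interval and "quit" on shutdown.
if [ "${1:-}" = "--serve" ]; then
    while read -r line; do
        [ "$line" = "quit" ] && exit 0
        emit_report "$HOST" "$(count_free_gpus)"
    done
fi
```

As discussed above, this only reports how many GPUs look free; it does not
record which device a given job was granted, so it complements rather than
replaces the per-GPU-queue trick.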