Thanks, Reuti. The socket solution looks like it only works well for serial jobs, not PE jobs, right?
Our cluster has different kinds of nodes: some have 2 GPUs each, others have 4. Most user jobs are PE jobs; some are serial. The socket solution can even work for PE jobs, but as I understand it, it is not efficient. Each node has, for example, 4 queues; if a user submits a PE job to one queue, he/she cannot use the GPUs belonging to the other queues?

On Mon, Apr 14, 2014 at 2:16 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> Am 14.04.2014 um 20:06 schrieb Feng Zhang:
>
>> Thanks, Ian!
>>
>> I haven't checked the GPU load sensor in detail, either. It sounds to
>> me like it only handles the number of GPUs allocated to a job, but the
>> job doesn't know which GPUs it actually got, so it cannot set
>> CUDA_VISIBLE_DEVICES (some programs need this env to be set). This can
>> be done by writing some scripts/programs, but to me it is not an
>> accurate solution, since some jobs may still happen to collide with
>> each other on the same GPU of a multi-GPU node. If GE had the
>> memory to record the GPUs allocated to a job, that would be
>> perfect.
>
> Like the option to request sockets instead of cores which I posted in the
> last couple of days, you can use a similar approach to get the number of the
> granted GPU out of the queue name.
>
> -- Reuti
>
>> On Mon, Apr 14, 2014 at 1:46 PM, Ian Kaufman <ikauf...@eng.ucsd.edu> wrote:
>>> I believe there already is support for GPUs - there is a GPU load
>>> sensor in Open Grid Engine. You may have to build it yourself; I
>>> haven't checked whether it comes pre-packaged.
>>>
>>> Univa has Phi support, and I believe OGE/OGS has it as well, or at
>>> least has been working on it.
>>>
>>> Ian
>>>
>>> On Mon, Apr 14, 2014 at 10:35 AM, Feng Zhang <prod.f...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> Is there any plan to implement GPU resource management in SGE in
>>>> the near future, like Slurm or Torque?
>>>> There are some ways to do this
>>>> using scripts/programs, but I wonder whether SGE itself can
>>>> recognize and manage GPUs (and Phi). It doesn't need to be
>>>> complicated or powerful, just do the basic work.
>>>>
>>>> Thanks,
>>>> _______________________________________________
>>>> users mailing list
>>>> users@gridengine.org
>>>> https://gridengine.org/mailman/listinfo/users
>>>
>>>
>>>
>>> --
>>> Ian Kaufman
>>> Research Systems Administrator
>>> UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
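For what it's worth, a minimal sketch of the queue-name approach Reuti describes might look like the following. It assumes a hypothetical naming scheme with one queue per GPU (e.g. gpu.0.q, gpu.1.q, ...) and relies on SGE exporting the granted queue name to the job in $QUEUE; the queue names and prolog placement are assumptions, not a tested setup:

```shell
#!/bin/sh
# Sketch of a job prolog/wrapper: derive the GPU index from the queue
# name and export it as CUDA_VISIBLE_DEVICES.
# Assumes queues are named "gpu.<index>.q" (one queue per GPU) and that
# SGE sets $QUEUE to the name of the granted queue.

QUEUE=${QUEUE:-gpu.0.q}    # set by SGE for the job; default only for testing

# Strip the "gpu." prefix and the ".q" suffix to get the device index.
gpu_index=${QUEUE#gpu.}
gpu_index=${gpu_index%.q}

export CUDA_VISIBLE_DEVICES="$gpu_index"
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
```

Since each queue maps to exactly one GPU, two jobs granted different queues can no longer collide on the same device, though, as noted above, a PE job holding one queue would not see the node's other GPUs.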