Again, look into using it as a consumable resource as Gowtham posted above.
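For reference, the consumable-resource setup usually boils down to something like the following. The complex line, host names, and GPU counts here are illustrative examples, not details from this thread:

```shell
# 1. Add a consumable complex via "qconf -mc"; the new line in the
#    complex list has the fields
#    (name  shortcut  type  relop  requestable  consumable  default  urgency):
#
#      gpu    gpu    INT    <=    YES    YES    0    0

# 2. Tell each execution host how many GPUs it actually has
#    (host names here are hypothetical):
qconf -me node01   # add: complex_values gpu=2
qconf -me node02   # add: complex_values gpu=4

# 3. Jobs then request GPUs like any other resource:
qsub -l gpu=1 job.sh
```

Keep in mind that for PE jobs classic SGE multiplies a consumable request by the number of slots granted on a host, so the per-slot request value needs to be chosen with that in mind.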

Ian

On Mon, Apr 14, 2014 at 11:57 AM, Feng Zhang <[email protected]> wrote:
> Thanks, Reuti,
>
> The socket solution looks like it only works well for serial jobs, not
> PE jobs, right?
>
> Our cluster has heterogeneous nodes: some have 2 GPUs each, others
> have 4. Most of the user jobs are PE jobs; some are serial.
>
> The socket solution can even work for PE jobs, but as I understand
> it, it is not efficient. If each node has, for example, 4 queues and
> a user submits a PE job to one queue, he/she cannot use the GPUs
> attached to the other queues?
>
> On Mon, Apr 14, 2014 at 2:16 PM, Reuti <[email protected]> wrote:
>> Am 14.04.2014 um 20:06 schrieb Feng Zhang:
>>
>>> Thanks, Ian!
>>>
>>> I haven't checked the GPU load sensor in detail, either. It sounds to
>>> me like it only tracks the number of GPUs allocated to a job; the job
>>> doesn't know which GPUs it actually got, so it cannot set
>>> CUDA_VISIBLE_DEVICES (some programs need this env variable to be set).
>>> This can be done by writing some scripts/programs, but to me it is not
>>> an accurate solution, since some jobs may still happen to collide with
>>> each other on the same GPU of a multi-GPU node. If GE had the memory
>>> to record which GPUs are allocated to a job, that would be perfect.
>>
>> Like the option to request sockets instead of cores, which I posted in 
>> the last couple of days, you can use a similar approach to derive the 
>> number of the granted GPU from the queue name.
>>
>> -- Reuti
>>
>>
>>> On Mon, Apr 14, 2014 at 1:46 PM, Ian Kaufman <[email protected]> wrote:
>>>> I believe there is already support for GPUs - there is a GPU Load
>>>> Sensor in Open Grid Engine. You may have to build it yourself; I
>>>> haven't checked whether it comes pre-packaged.
>>>>
>>>> Univa has Phi support, and I believe OGE/OGS has it as well, or at
>>>> least has been working on it.
>>>>
>>>> Ian
>>>>
>>>> On Mon, Apr 14, 2014 at 10:35 AM, Feng Zhang <[email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> Is there any plan to implement GPU resource management in SGE in
>>>>> the near future, as Slurm and Torque have? There are some ways to do
>>>>> this using scripts/programs, but I wonder whether SGE itself can
>>>>> recognize and manage GPUs (and Phi). It doesn't need to be
>>>>> complicated or powerful, just do the basic work.
>>>>>
>>>>> Thanks,
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> [email protected]
>>>>> https://gridengine.org/mailman/listinfo/users
>>>>
>>>>
>>>>
>>>> --
>>>> Ian Kaufman
>>>> Research Systems Administrator
>>>> UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
>>
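To make Reuti's queue-name approach concrete, a job script could recover the GPU index from the queue it landed in, roughly like this. This is a minimal sketch; the gpu0.q/gpu1.q/... naming scheme is an assumption, not something defined in this thread:

```shell
#!/bin/sh
# SGE exports the queue name to the job environment as $QUEUE,
# e.g. QUEUE=gpu2.q. The default here is only so the sketch runs
# standalone outside of Grid Engine.
QUEUE=${QUEUE:-gpu0.q}

# Strip the assumed "gpu" prefix and ".q" suffix to get the device index.
idx=${QUEUE#gpu}
idx=${idx%.q}

export CUDA_VISIBLE_DEVICES=$idx
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
```

As Feng notes, with one queue per GPU a PE job bound to a single queue instance cannot see the node's other GPUs, so this trades scheduling flexibility for collision-free device assignment.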



