On 14.04.2014 at 20:57, Feng Zhang wrote:

> Thanks, Reuti,
> 
> The socket solution looks like it only works well for serial jobs, not
> PE jobs, right?

You mean using more than one GPU at a time, or using parallel processes as 
usual in addition to the GPU?


> Our cluster has different node types: some nodes have 2 GPUs each, and
> others have 4 GPUs each. Most of the user jobs are PE jobs; some are
> serial.
> 
> The socket solution can even work for PE jobs, but as I understand it,
> it is not efficient? Each node has, for example, 4 queues. If a user
> submits a PE job to one queue, he/she cannot use the GPUs in the other
> queues?

In SGE you don't submit to queues; you request resources. If you want a 
GPU job to go to a set of queues, the best way is to attach a boolean 
complex to these queues and submit the job with a request for this complex. 
SGE is then free to select any of the queues with this feature.
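
An untested sketch of this setup; the complex name "gpu", the queue name 
"gpu0.q" and the PE name "smp" are only examples:

   # define a boolean complex once (add a row via "qconf -mc"):
   #   gpu   gpu   BOOL   ==   YES   NO   0   0
   # attach it to each GPU queue via "qconf -mq gpu0.q":
   #   complex_values   gpu=TRUE
   # then request it at submission time (works for serial and PE jobs):
   qsub -l gpu=true -pe smp 4 job.sh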

Look at the link Gowtham posted for this.

-- Reuti


> On Mon, Apr 14, 2014 at 2:16 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>> On 14.04.2014 at 20:06, Feng Zhang wrote:
>> 
>>> Thanks, Ian!
>>> 
>>> I haven't checked the GPU load sensor in detail, either. It sounds to
>>> me like it only handles the number of GPUs allocated to a job; the job
>>> doesn't know which GPUs it actually gets, so it can't set
>>> CUDA_VISIBLE_DEVICES (some programs need this env variable to be set).
>>> This can be done by writing some scripts/programs, but to me it is not
>>> a reliable solution, since some jobs may still happen to collide with
>>> each other on the same GPU of a multi-GPU node. If GE could keep track
>>> of which GPUs are allocated to a job, that would be perfect.
>> 
>> Like the option to request sockets instead of cores, which I posted about
>> in the last couple of days, you can use a similar approach to get the
>> number of the granted GPU out of the queue name.
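>> 
>> A rough, untested sketch of what I mean (the queue names and the
>> application are only placeholders): assume one queue instance per GPU,
>> named gpu0.q, gpu1.q, ...; the job script can then derive the device
>> from the QUEUE variable, which SGE sets to the granted queue:
>> 
>>    # e.g. QUEUE=gpu1.q => device 1
>>    DEVICE=$(expr "$QUEUE" : 'gpu\([0-9]*\)\.q')
>>    export CUDA_VISIBLE_DEVICES=$DEVICE
>>    ./my_cuda_app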
>> 
>> -- Reuti
>> 
>> 
>>> On Mon, Apr 14, 2014 at 1:46 PM, Ian Kaufman <ikauf...@eng.ucsd.edu> wrote:
>>>> I believe there already is support for GPUs - there is a GPU Load
>>>> Sensor in Open Grid Engine. You may have to build it yourself; I
>>>> haven't checked whether it comes pre-packaged.
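>>>> 
>>>> Roughly, a load sensor is just a script that GE polls over
>>>> stdin/stdout - an untested sketch, where the complex name "gpu.free"
>>>> and the nvidia-smi query are only illustrations:
>>>> 
>>>>    #!/bin/sh
>>>>    while read INPUT; do                 # GE writes one line per poll
>>>>      [ "$INPUT" = "quit" ] && exit 0    # and "quit" on shutdown
>>>>      echo begin
>>>>      # count GPUs that are essentially idle (<5% utilization)
>>>>      FREE=$(nvidia-smi --query-gpu=utilization.gpu \
>>>>               --format=csv,noheader,nounits | \
>>>>             awk '$1 < 5 {n++} END {print n+0}')
>>>>      echo "$(hostname):gpu.free:$FREE"  # host:complex:value
>>>>      echo end
>>>>    done
>>>> 
>>>> and registered via the load_sensor parameter in the host configuration
>>>> (qconf -mconf <host>).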
>>>> 
>>>> Univa has Phi support, and I believe OGE/OGS has it as well, or at
>>>> least has been working on it.
>>>> 
>>>> Ian
>>>> 
>>>> On Mon, Apr 14, 2014 at 10:35 AM, Feng Zhang <prod.f...@gmail.com> wrote:
>>>>> Hi,
>>>>> 
>>>>> Is there any plan to implement GPU resource management in SGE in the
>>>>> near future, like Slurm or Torque have? There are some ways to do this
>>>>> using scripts/programs, but I wonder whether SGE itself can recognize
>>>>> and manage GPUs (and Phi). It doesn't need to be complicated or
>>>>> powerful, just do the basic work.
>>>>> 
>>>>> Thanks,
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Ian Kaufman
>>>> Research Systems Administrator
>>>> UC San Diego, Jacobs School of Engineering
>>>> ikaufman AT ucsd DOT edu
>> 


_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
