On Mon, Apr 14, 2014 at 5:36 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> Am 14.04.2014 um 20:57 schrieb Feng Zhang:
>
>> Thanks, Reuti,
>>
>> The socket solution looks like it only works for serial jobs, not PE
>> jobs, right?
>
> You mean using more than one GPU at a time, or using parallel processes as 
> usual in addition to the GPU?
>

Thanks, Reuti!
Right, using parallel processes running on GPUs (on the same node, and
also across multiple nodes).

Best

>
>> Our cluster has different nodes, some nodes each has 2 GPUs, some
>> others each has 4 GPUs. Most of the user jobs are PE jobs, some are
>> serial.
>>
>> The socket solution can even work for PE jobs, but as I understand it,
>> it is not efficient? Since each node has, for example, 4 queues, if one
>> user submits a PE job to a queue, he/she cannot use the other GPUs in
>> the other queues?
>
> In SGE you don't submit to queues; you request resources. If you want a 
> GPU job going to a set of queues, the best way would be to attach a boolean 
> complex to these queues and submit the job with a request for this complex. 
> SGE is then free to select any of the queues with this feature.
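[For reference, a boolean complex of this kind might be set up roughly as
follows. This is only a sketch; the complex name `gpu` and the queue name
`gpu.q` are assumptions for illustration, not names from this thread.]

```shell
# In the output of "qconf -mc", add a boolean complex line such as
# (hypothetical name "gpu"):
#
#   gpu   gpu   BOOL   ==   YES   NO   FALSE   0
#
# Attach it to each GPU-equipped queue:
qconf -mattr queue complex_values gpu=TRUE gpu.q

# Submit a job requesting the complex; SGE is then free to place it
# in any queue that offers "gpu":
qsub -l gpu=TRUE myjob.sh
```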
>
> Look at the link Gowtham posted for this.
>
> -- Reuti
>
>
>> On Mon, Apr 14, 2014 at 2:16 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>>> Am 14.04.2014 um 20:06 schrieb Feng Zhang:
>>>
>>>> Thanks, Ian!
>>>>
>>>> I haven't checked the GPU load sensor in detail, either. It sounds to
>>>> me like it only handles the number of GPUs allocated to a job; the job
>>>> doesn't know which GPUs it actually gets, so it cannot set
>>>> CUDA_VISIBLE_DEVICES (some programs need this env to be set). This can
>>>> be done by writing some scripts/programs, but to me it is not an
>>>> accurate solution, since some jobs may still happen to collide with
>>>> each other on the same GPU on a multi-GPU node. If GE could keep a
>>>> record of the GPUs allocated to a job, that would be perfect.
>>>
>>> Like the option to request sockets instead of cores which I posted in the 
>>> last couple of days, you can use a similar approach to get the number of 
>>> the granted GPU out of the queue name.
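[As a sketch of that approach: assuming hypothetical per-GPU queues named
gpu0.q through gpu3.q (these names are illustrative, not from this thread),
a job script could derive the device index from the queue instance it landed
in and export it before launching the program.]

```shell
#!/bin/sh
# Inside a running job, SGE sets $QUEUE to the granted queue name.
# We simulate that here; a real job script would not set it itself.
QUEUE="gpu2.q"

# Strip the assumed "gpu" prefix and ".q" suffix to get the device index.
gpu_id="${QUEUE#gpu}"    # -> "2.q"
gpu_id="${gpu_id%%.*}"   # -> "2"

# Point CUDA at the single granted device.
export CUDA_VISIBLE_DEVICES="$gpu_id"
echo "$CUDA_VISIBLE_DEVICES"
```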
>>>
>>> -- Reuti
>>>
>>>
>>>> On Mon, Apr 14, 2014 at 1:46 PM, Ian Kaufman <ikauf...@eng.ucsd.edu> wrote:
>>>>> I believe there already is support for GPUs - there is a GPU Load
>>>>> Sensor in Open Grid Engine. You may have to build it yourself, I
>>>>> haven't checked to see if it comes pre-packaged.
>>>>>
>>>>> Univa has Phi support, and I believe OGE/OGS has it as well, or at
>>>>> least has been working on it.
>>>>>
>>>>> Ian
>>>>>
>>>>> On Mon, Apr 14, 2014 at 10:35 AM, Feng Zhang <prod.f...@gmail.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Is there any plan to implement GPU resource management in SGE in
>>>>>> the near future, like Slurm or Torque? There are some ways to do
>>>>>> this using scripts/programs, but I wonder whether SGE itself can
>>>>>> recognize and manage GPUs (and Phi). It doesn't need to be
>>>>>> complicated or powerful, just do the basic work.
>>>>>>
>>>>>> Thanks,
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users@gridengine.org
>>>>>> https://gridengine.org/mailman/listinfo/users
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Ian Kaufman
>>>>> Research Systems Administrator
>>>>> UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
>>>
>

