On Mon, Apr 14, 2014 at 10:42 PM, Feng Zhang <[email protected]> wrote:
> On Mon, Apr 14, 2014 at 5:36 PM, Reuti <[email protected]> wrote:
>> On 14.04.2014 at 20:57, Feng Zhang wrote:
>>
>>> Thanks, Reuti,
>>>
>>> The socket solution looks like it only works for serial jobs, not PE
>>> jobs, right?
>>
>> You mean using more than one GPU at a time, or using parallel processes as 
>> usual in addition to the GPU?
>>
>
> Thanks, Reuti!
> Right, using parallel processes running on GPUs (on the same node, and
> also across multiple nodes).

For example, GE + Open MPI.
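Assuming the admin has configured a consumable GPU complex and a tightly integrated Open MPI parallel environment (the complex name "gpu", the PE name "openmpi", and the script name below are all made-up placeholders), such a job might be submitted like this:

```shell
# Hypothetical submission: 8 Open MPI slots, plus a request against a
# consumable complex named "gpu" so SGE accounts for GPU usage per host.
# The PE and complex names must match whatever the site actually configured.
qsub -pe openmpi 8 -l gpu=1 mpi_job.sh
```

Note that for consumable complexes, per-slot accounting in PE jobs depends on how the complex was defined, so the effective GPU count per node should be checked against the site's configuration.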

>
> Best
>
>>
>>> Our cluster has different nodes: some nodes have 2 GPUs each, others
>>> have 4 GPUs each. Most of the user jobs are PE jobs; some are
>>> serial.
>>>
>>> The socket solution can even work for PE jobs, but to my
>>> understanding it is not efficient? Since each node has, for example,
>>> 4 queues, if one user submits a PE job to a queue, he/she cannot use
>>> the other GPUs in the other queues?
>>
>> In SGE you don't submit to queues. You request resources. In case you want a 
>> GPU job going to a set of queues, the best way would be to attach a boolean 
>> complex to these queues and submit the job with a request for this complex. 
>> SGE is then free to select any of the queues with this feature.
>>
>> Look at the link Gowtham posted for this.
>>
>> -- Reuti
>>
>>
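For the record, the boolean-complex approach described above might look something like this (all names here are invented for illustration):

```shell
# Define a boolean complex via "qconf -mc" (opens the complex table in an
# editor); a hypothetical entry:
#   name      shortcut  type  relop  requestable  consumable  default  urgency
#   gpu_node  gpun      BOOL  ==     YES          NO          0        0
#
# Attach it to the GPU-capable queues, then request it at submit time
# and let SGE pick any queue carrying the complex:
qconf -mattr queue complex_values gpu_node=TRUE gpu.q
qsub -l gpu_node=TRUE -pe openmpi 16 job.sh
```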
>>> On Mon, Apr 14, 2014 at 2:16 PM, Reuti <[email protected]> wrote:
>>>> On 14.04.2014 at 20:06, Feng Zhang wrote:
>>>>
>>>>> Thanks, Ian!
>>>>>
>>>>> I haven't checked the GPU load sensor in detail, either. It sounds to
>>>>> me like it only handles the number of GPUs allocated to a job, but the
>>>>> job doesn't know which GPUs it actually got, so it cannot set
>>>>> CUDA_VISIBLE_DEVICES (some programs need this env variable to be set).
>>>>> This can be done by writing some scripts/programs, but to me it is not
>>>>> an accurate solution, since some jobs may still happen to collide with
>>>>> each other on the same GPU of a multi-GPU node. If GE could keep a
>>>>> record of the GPUs allocated to a job, that would be
>>>>> perfect.
>>>>
>>>> Like the option to request sockets instead of cores which I posted in the 
>>>> last couple of days, you can use a similar approach to get the number of 
>>>> the granted GPU out of the queue name.
>>>>
>>>> -- Reuti
>>>>
>>>>
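A minimal sketch of that queue-name approach: suppose each GPU gets its own single-slot queue named gpu0.q through gpu3.q (names invented here). In a real job, sge_execd exports the granted queue name in $QUEUE; below it is set by hand for demonstration.

```shell
# In a real SGE job, $QUEUE is set by sge_execd; hard-coded here as a demo.
QUEUE="gpu2.q"
gpu_id=${QUEUE#gpu}      # strip leading "gpu"  -> "2.q"
gpu_id=${gpu_id%.q}      # strip trailing ".q"  -> "2"
export CUDA_VISIBLE_DEVICES="$gpu_id"
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"   # prints CUDA_VISIBLE_DEVICES=2
```

With one slot per queue and one queue per GPU, two jobs can never be granted the same device, which addresses the collision concern above.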
>>>>> On Mon, Apr 14, 2014 at 1:46 PM, Ian Kaufman <[email protected]> 
>>>>> wrote:
>>>>>> I believe there already is support for GPUs - there is a GPU Load
>>>>>> Sensor in Open Grid Engine. You may have to build it yourself; I
>>>>>> haven't checked to see if it comes pre-packaged.
>>>>>>
>>>>>> Univa has Phi support, and I believe OGE/OGS has it as well, or at
>>>>>> least has been working on it.
>>>>>>
>>>>>> Ian
>>>>>>
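For context, an SGE load sensor is just a long-running script speaking a simple stdin/stdout protocol: the execd writes a line to request a report (and the literal "quit" to stop the sensor), and the sensor answers with "begin", one "host:metric:value" line per metric, then "end". A minimal sketch (the metric name "num_gpu" and the /proc path are assumptions, not the shipped sensor):

```shell
# Minimal load-sensor loop: read a request line from stdin, emit one report.
load_sensor() {
  host=$(hostname 2>/dev/null || echo localhost)
  while read -r line; do
    [ "$line" = "quit" ] && return 0
    # count NVIDIA GPUs exposed by the driver (0 if the path does not exist)
    ngpu=$(ls /proc/driver/nvidia/gpus 2>/dev/null | wc -l)
    echo "begin"
    echo "$host:num_gpu:$ngpu"
    echo "end"
  done
}

# demo: one report cycle, then quit
printf 'report\nquit\n' | load_sensor
```

As noted in the thread, a sensor like this only reports how many GPUs a host has; it does not record which devices were handed to which job.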
>>>>>> On Mon, Apr 14, 2014 at 10:35 AM, Feng Zhang <[email protected]> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Is there any plan to implement GPU resource management in SGE in
>>>>>>> the near future, like Slurm or Torque have? There are some ways to do
>>>>>>> this using scripts/programs, but I wonder whether SGE itself can
>>>>>>> recognize and manage GPUs (and Phi). It does not need to be complicated
>>>>>>> or powerful, just do the basic work.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> [email protected]
>>>>>>> https://gridengine.org/mailman/listinfo/users
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ian Kaufman
>>>>>> Research Systems Administrator
>>>>>> UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
>>>>
>>
