On 20.05.2012 at 10:21, Semi wrote:

> Hi Rayson!
>
> Can I use this method for GPU definition? It's clearer to me.
>
> https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2011-August/054479.html
Requesting a CUDA complex and a CUDA queue looks redundant. And it's also only for a single GPU per machine, AFAICS. What's wrong with Rayson's setup? It's just a different type of complex.

-- Reuti

> On 5/17/2012 7:09 PM, Rayson Ho wrote:
>> On Tue, May 15, 2012 at 6:11 AM, Semi <[email protected]> wrote:
>>> Can you give me a more detailed answer and correct my definitions.
>> Hi Semi,
>>
>> I was away for the past 2 days. Please always cc the list when you are
>> replying (I guess Reuti, Ron, and I always suggest that people do that -
>> there are many ways to configure Grid Engine, and others may see
>> something that we don't see, and it is usually better to get feedback
>> from more people).
>>
>> On the other hand, if you really need it, you might consider support
>> ( http://www.scalablelogic.com/scalable-grid-engine-support ). There is
>> always someone who can respond to your questions even when I am away.
>>
>>> qconf -sc | grep gpu
>>> gpu    gpu    INT    <=    YES    YES    0    0
>>>
>>> qconf -me sge135
>>> hostname        sge135
>>> load_scaling    NONE
>>> complex_values  gpu=2
>>>
>>> qconf -mconf sge135
>>> sge135:
>>> mailer          /bin/mail
>>> xterm           /usr/bin/X11/xterm
>>> qlogin_daemon   /usr/sbin/in.telnetd
>>> rlogin_daemon   /usr/sbin/in.rlogind
>>> load_sensor     /storage/SGE6U8/gpu-load-sensor/cuda_sensor
>> Note that if you statically define a host to have 2 GPUs, then you
>> don't need to use the cuda_sensor. The GPU load sensor distributed by
>> the Open Grid Scheduler project (which you can find in other Grid
>> Engine implementations) is very similar to Bright Computing's GPU
>> Management in the Bright Cluster Manager:
>>
>> http://www.brightcomputing.com/NVIDIA-GPU-Cluster-Management-Monitoring.php
>>
>> We both monitor temperature, fan speed, voltage, ECC, etc. When we
>> started the GPU load sensor development we didn't know that Bright had
>> something similar...
>>
>> From a scheduling point of view, you can ignore most of that.
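The quoted configuration boils down to one consumable complex plus a per-host value. As commands it looks roughly like this (host name sge135 and complex name gpu are taken from the quoted output; `qconf -mc` and `qconf -me` open an editor, so the commented lines show what you would set there):

```shell
# 1. Add the consumable to the complex list (qconf -mc opens an editor);
#    the entry matches the quoted "qconf -sc" line:
#      #name  shortcut  type  relop  requestable  consumable  default  urgency
#      gpu    gpu       INT   <=     YES          YES         0        0
qconf -mc

# 2. Attach the GPUs to the execution host (qconf -me opens an editor);
#    set:  complex_values  gpu=2
qconf -me sge135

# 3. Jobs then request GPUs like any other consumable resource -
#    no separate CUDA queue is needed:
qsub -l gpu=1 test.sh   # one GPU
qsub -l gpu=2 test.sh   # two GPUs on the same host
```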
>> Some sites like to bias node priority based on GPU temperature, and in
>> some cases, if the ECC error rate is really bad, then the GPU should
>> not be used for GPU jobs.
>>
>>> qsub -l gpu=1 test.sh
>>>
>>> And if I need a parallel run on GPU, what do I have to do? How do I define a PE for GPU?
>> You just use "qsub -l gpu=2" if you want to use 2 GPUs for that job.
>>
>> Rayson
>>
>>> On 5/14/2012 2:51 PM, Rayson Ho wrote:
>>>
>>> Just get the load sensor from:
>>>
>>> https://gridscheduler.svn.sourceforge.net/svnroot/gridscheduler/trunk/source/dist/gpu/gpu_sensor.c
>>>
>>> Compile it on your system - and make sure that it has the CUDA SDK &
>>> libraries installed (Google is your friend - look for the nvidia-ml
>>> library).
>>>
>>> % cc gpu_sensor.c -lnvidia-ml
>>>
>>> Before you use it as a load sensor, compile and run it interactively:
>>>
>>> % cc gpu_sensor.c -DSTANDALONE -lnvidia-ml
>>>
>>> Make sure that the code is reporting something meaningful on your system.
>>>
>>> Rayson
>>>
>>> On Mon, May 14, 2012 at 4:55 AM, Semi <[email protected]> wrote:
>>>
>>> Please help with GPU integration under SGE and parallel running of
>>> NAMD and GAMESS on GPU via SGE.

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
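For anyone curious what a load sensor actually does: it is just a program that loops, reading a request from sge_execd on stdin and printing a begin/end-delimited report on stdout. A minimal sketch in plain shell (illustration only - the real gpu_sensor.c queries the GPUs via the nvidia-ml library, whereas NGPUS here is a hard-coded assumption):

```shell
#!/bin/sh
# Minimal SGE load-sensor sketch. NGPUS is a hard-coded assumption;
# a real sensor would probe the hardware (e.g. via nvidia-ml).
NGPUS=2

# Print one load report in the load-sensor protocol:
# a begin/end pair around "host:complex:value" lines.
report_load() {
    echo "begin"
    echo "$(hostname):gpu:$NGPUS"
    echo "end"
}

# sge_execd writes a newline to stdin when it wants a fresh report,
# and "quit" when the sensor should exit.
sensor_loop() {
    while read -r line; do
        [ "$line" = "quit" ] && return 0
        report_load
    done
}

# To run it as a load sensor, call: sensor_loop
```

With a static `complex_values gpu=2` on the host this loop is unnecessary, as Rayson notes - it only matters if you want the reported value to track the real GPU state.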
