Please correct me if I have understood your proposal for the definitions and usage correctly:

qconf -sc|grep gpu
gpu                 gpu          INT<=    YES         YES 0        0

qconf -me sge135
hostname              sge135
load_scaling          NONE
complex_values        gpu=2

qsub -l gpu=2 test.sh

load_sensor is not needed.
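To verify the static consumable without a load sensor, one could check the host from the command line (commands assume the setup above; the exact output format varies by Grid Engine version):

```shell
# Verify the gpu consumable on the host. For a statically declared value
# no load sensor is needed; sge_execd reports the value from complex_values.
qconf -se sge135 | grep complex_values   # should show gpu=2
qhost -F gpu -h sge135                   # current availability on the host
qstat -F gpu                             # per-queue view while jobs run
```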


On 5/20/2012 12:44 PM, Reuti wrote:
Am 20.05.2012 um 10:21 schrieb Semi:

Hi Rayson!

Can I use this method for the GPU definition? It's clearer to me.

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2011-August/054479.html
Requesting a CUDA complex and a CUDA queue looks redundant. And it also only
works for a single GPU per machine AFAICS. What's wrong with Rayson's setup?
It's just a different type of complex.

-- Reuti


On 5/17/2012 7:09 PM, Rayson Ho wrote:
On Tue, May 15, 2012 at 6:11 AM, Semi <[email protected]> wrote:
Can you give me a more detailed answer and correct my definitions?
Hi Semi,

I was away for the past 2 days. Please always cc the list when you are
replying (I guess Reuti, Ron, and I always suggest that people do that -
there are many ways to configure Grid Engine, others may see
something that we don't, and it is usually better to get feedback
from more people).

On the other hand, if you really need it, you might consider support (
http://www.scalablelogic.com/scalable-grid-engine-support ). There is
always someone who can respond to your questions even when I am away.


qconf -sc|grep gpu
gpu                 gpu          INT<=    YES         YES 0        0

qconf -me sge135
hostname              sge135
load_scaling          NONE
complex_values        gpu=2

qconf -mconf sge135
sge135:
mailer                       /bin/mail
xterm                        /usr/bin/X11/xterm
qlogin_daemon                /usr/sbin/in.telnetd
rlogin_daemon                /usr/sbin/in.rlogind
load_sensor                  /storage/SGE6U8/gpu-load-sensor/cuda_sensor
Note that if you statically define a host to have 2 GPUs, then you
don't need to use the cuda_sensor. The GPU load sensor distributed by
the Open Grid Scheduler project (which you can find in other Grid
Engine implementations) is very similar to Bright Computing's GPU
Management in the Bright Cluster Manager:

http://www.brightcomputing.com/NVIDIA-GPU-Cluster-Management-Monitoring.php

We both monitor temperature, fan speed, voltage, ECC, etc. When we
started the GPU load sensor development we didn't know that Bright had
something similar...

From a scheduling point of view, you can ignore most of that. Some
sites like to bias node priority based on GPU temperature, and in some
cases if the ECC error rate is really bad then the GPU should not be
used for GPU jobs.


qsub -l gpu=1 test.sh

And what if I need a parallel run on the GPU? What do I have to do? How do I define a PE for the GPU?
You just use "qsub -l gpu=2" if you want to use 2 GPUs for that job.
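Note that the consumable only does the bookkeeping: "-l gpu=2" reserves two units of the gpu complex on the host, but SGE itself does not tell the job which physical devices it may use. A minimal job-script sketch follows; the device numbering and the CUDA_VISIBLE_DEVICES assignment are assumptions, and a real site needs its own per-job allocation scheme:

```shell
#!/bin/sh
#$ -l gpu=2
# The gpu=2 request reserves two units of the consumable on the host.
# SGE does not hand out device IDs, so we assume (hypothetically) that
# this job may use devices 0 and 1.
export CUDA_VISIBLE_DEVICES=0,1
echo "job may use GPUs: $CUDA_VISIBLE_DEVICES"
```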

Rayson



On 5/14/2012 2:51 PM, Rayson Ho wrote:

Just get the load sensor from:

https://gridscheduler.svn.sourceforge.net/svnroot/gridscheduler/trunk/source/dist/gpu/gpu_sensor.c

Compile it on your system - and make sure that the system has the CUDA SDK &
libraries installed (Google is your friend - look for the nvidia-ml
library).

% cc gpu_sensor.c -lnvidia-ml

Before you use it as a load sensor, compile and run it interactively:

% cc gpu_sensor.c -DSTANDALONE -lnvidia-ml

Make sure that the code is reporting something meaningful on your system.
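The interface between sge_execd and a load sensor is a simple line protocol: the sensor blocks on stdin, and each time it reads a line (the literal "quit" means terminate) it answers with a report framed by "begin" and "end", one "host:complex:value" line per measured value. A minimal sketch in shell (the gpu value here is a hypothetical constant; gpu_sensor.c obtains the real values via the nvidia-ml library):

```shell
#!/bin/sh
# Sketch of the load-sensor protocol that sge_execd speaks.

myhost=`uname -n`

# One load report: "begin", then host:complex:value lines, then "end".
report() {
    echo "begin"
    echo "$myhost:gpu:2"   # hypothetical static value for the gpu complex
    echo "end"
}

# sge_execd writes a line to request a report; the literal "quit" stops us.
while read input; do
    if [ "$input" = "quit" ]; then exit 0; fi
    report
done
```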

Rayson



On Mon, May 14, 2012 at 4:55 AM, Semi <[email protected]> wrote:

Please help with GPU integration under SGE, and with running NAMD and
GAMESS in parallel on the GPU via SGE.

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
