Hi Rayson!
Can I use this method for the GPU definition? It's clearer to me.
https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2011-August/054479.html
On 5/17/2012 7:09 PM, Rayson Ho wrote:
On Tue, May 15, 2012 at 6:11 AM, Semi <[email protected]> wrote:
Can you give me a more detailed answer and correct my definitions?
Hi Semi,
I was away for the past 2 days. Please always cc the list when you are
replying (I guess Reuti, Ron, and I always suggest that people do that -
there are many ways to configure Grid Engine, and others may see
something that we don't, and it is usually better to get feedback
from more people).
On the other hand, if you really need it, you might consider commercial
support ( http://www.scalablelogic.com/scalable-grid-engine-support ).
There is always someone who can respond to your questions even when I am away.
qconf -sc | grep gpu
gpu  gpu  INT  <=  YES  YES  0  0
qconf -me sge135
hostname sge135
load_scaling NONE
complex_values gpu=2
qconf -mconf sge135
sge135:
mailer /bin/mail
xterm /usr/bin/X11/xterm
qlogin_daemon /usr/sbin/in.telnetd
rlogin_daemon /usr/sbin/in.rlogind
load_sensor /storage/SGE6U8/gpu-load-sensor/cuda_sensor
Note that if you statically define a host to have 2 GPUs, then you
don't need to use the cuda_sensor. The GPU load sensor distributed by
the Open Grid Scheduler project (which you can find in other Grid
Engine implementations) is very similar to Bright Computing's GPU
Management in the Bright Cluster Manager:
http://www.brightcomputing.com/NVIDIA-GPU-Cluster-Management-Monitoring.php
We both monitor temperature, fan speed, voltage, ECC, etc. When we
started the GPU load sensor development we didn't know that Bright had
something similar...
From a scheduling point of view, you can ignore most of that. Some
sites like to bias node priority based on GPU temperature, and in some
cases, if the ECC error rate is really bad, then the GPU should not be
used for GPU jobs.
qsub -l gpu=1 test.sh
And if I need to run in parallel on GPUs, what do I have to do? How do I define a PE for GPUs?
You just use "qsub -l gpu=2" if you want to use 2 GPUs for that job.
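To make that concrete, a submit script combining a PE with the GPU consumable could look like the sketch below. The PE name "mpi", the slot count, and the application name are assumptions for illustration, not from this thread; only the "gpu" consumable comes from the setup above.

```shell
#!/bin/sh
# Hypothetical submit script sketch. The PE "mpi", 8 slots, and
# ./my_gpu_app are placeholders; "gpu" is the consumable defined earlier.
#$ -N gpu_test
#$ -pe mpi 8     # request slots from an (assumed) parallel environment
#$ -l gpu=2      # request 2 GPUs from the "gpu" consumable

# The job itself decides which GPU devices to use, e.g. via the
# application's own flags or the CUDA_VISIBLE_DEVICES variable.
./my_gpu_app
```

Submitted as usual with "qsub test.sh"; the scheduler then only dispatches the job to a host with 2 free units of the gpu consumable.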
Rayson
On 5/14/2012 2:51 PM, Rayson Ho wrote:
Just get the load sensor from:
https://gridscheduler.svn.sourceforge.net/svnroot/gridscheduler/trunk/source/dist/gpu/gpu_sensor.c
Compile it on your system - and make sure that the CUDA SDK &
libraries are installed (Google is your friend - look for the nvidia-ml
library).
% cc gpu_sensor.c -lnvidia-ml
Before you use it as a load sensor, compile and run it interactively:
% cc gpu_sensor.c -DSTANDALONE -lnvidia-ml
Make sure that the code is reporting something meaningful on your system.
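For a statically defined GPU count you don't even need the C sensor: the load sensor protocol is just line-based stdin/stdout. sge_execd writes a line to the sensor each load interval ("quit" to shut it down), and the sensor answers with a begin/end-delimited block of host:complex:value records. A minimal static sketch, assuming the "gpu" complex above and a fixed count of 2:

```shell
#!/bin/sh
# Minimal static load sensor sketch: reports a fixed GPU count instead
# of querying NVML. The complex name "gpu" and the value 2 match the
# setup above; the begin/end records follow the standard SGE load
# sensor protocol.
sensor_loop() {
    myhost=`hostname`
    # execd writes one line to stdin per load interval; "quit" means stop
    while read request; do
        [ "$request" = "quit" ] && return 0
        echo "begin"
        echo "$myhost:gpu:2"
        echo "end"
    done
}

sensor_loop
```

The script goes into the load_sensor path of the host configuration, just like the compiled cuda_sensor.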
Rayson
On Mon, May 14, 2012 at 4:55 AM, Semi <[email protected]> wrote:
Please help with GPU integration under SGE and with parallel runs of NAMD
and GAMESS on GPUs via SGE.
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users