On Tue, May 15, 2012 at 6:11 AM, Semi <[email protected]> wrote:
> Can you give me more detailed answer and correct my definitions.

Hi Semi,

I was away for the past 2 days. Please always cc the list when you are
replying (I guess Reuti, Ron, and I always suggest that people do that -
there are many ways to configure Grid Engine, and others may see
something that we don't, and it is usually better to get feedback
from more people).

On the other hand, if you really need it, you might consider support (
http://www.scalablelogic.com/scalable-grid-engine-support ). There is
always someone who can respond to your questions even when I am away.


> qconf -sc|grep gpu
> gpu                 gpu          INT         <=    YES         YES        0        0
>
> qconf -me sge135
> hostname              sge135
> load_scaling          NONE
> complex_values        gpu=2
>
> qconf -mconf sge135
> sge135:
> mailer                       /bin/mail
> xterm                        /usr/bin/X11/xterm
> qlogin_daemon                /usr/sbin/in.telnetd
> rlogin_daemon                /usr/sbin/in.rlogind
> load_sensor                  /storage/SGE6U8/gpu-load-sensor/cuda_sensor

Note that if you statically define a host to have 2 GPUs, then you
don't need to use the cuda_sensor. The GPU load sensor distributed by
the Open Grid Scheduler project (which you can find in other Grid
Engine implementations) is very similar to Bright Computing's GPU
Management in the Bright Cluster Manager:

http://www.brightcomputing.com/NVIDIA-GPU-Cluster-Management-Monitoring.php

Both monitor temperature, fan speed, voltage, ECC errors, etc. When we
started the GPU load sensor development we didn't know that Bright had
something similar...
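In case it helps to see what a load sensor actually is: it is just a
program that speaks a simple stdin/stdout protocol with the execd. Below
is a minimal shell sketch of that loop - the complex name
"gpu.temperature" and the value 65 are made-up placeholders; a real
sensor like gpu_sensor.c queries the driver via nvidia-ml instead:

```shell
#!/bin/sh
# Minimal Grid Engine load sensor sketch. The execd writes a line to the
# sensor's stdin at each load interval and reads back a report delimited
# by "begin"/"end"; the line "quit" shuts the sensor down.
# "gpu.temperature" and 65 are placeholder values for illustration only.
report() {
    echo "begin"
    echo "`hostname`:gpu.temperature:65"
    echo "end"
}
while read cmd; do
    [ "$cmd" = "quit" ] && exit 0
    report
done
```

Each report line has the form host:complex_name:value, so the complex
would also have to be defined via "qconf -mc" before the value shows up
in "qhost -F".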

From a scheduling point of view, you can ignore most of that. Some
sites like to bias node priority based on GPU temperature, and in some
cases, if the ECC error count is really bad, the GPU should not be used
for GPU jobs.
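For the temperature-biasing trick, one way to sketch it (assuming the
load sensor reports a load value named "gpu.temperature" - that name is
not part of the stock setup) is through the scheduler's load_formula:

```shell
# Sketch only - assumes a sensor-reported complex named "gpu.temperature".
# Hosts with hotter GPUs then sort later in the scheduler's host order.
qconf -ssconf | grep load_formula   # inspect the current formula
qconf -msconf                       # then edit, e.g. set:
#   load_formula   np_load_avg+0.01*gpu.temperature
```

The 0.01 weight is arbitrary - tune it so temperature only breaks ties
rather than dominating np_load_avg.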


>
> qsub -l gpu=1 test.sh
>
> And if I need parallel run on GPU. What I have to do? How define pe for GPU?

You just use "qsub -l gpu=2" if you want to use 2 GPUs for that job.
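For the PE part of your question: the gpu consumable simply combines
with whatever PE your MPI builds of NAMD or GAMESS already use - there
is no GPU-specific PE. The PE name "mpi" and the slot count below are
assumptions for illustration:

```shell
# Hypothetical PE named "mpi" with 8 slots; since gpu is a per-host
# consumable (complex_values gpu=2 above), the scheduler only places
# the job on hosts with enough free GPUs.
qsub -pe mpi 8 -l gpu=2 namd_job.sh
```

You can list the PEs actually defined on your cluster with "qconf -spl".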

Rayson



>
>
> On 5/14/2012 2:51 PM, Rayson Ho wrote:
>
> Just get the load sensor from:
>
> https://gridscheduler.svn.sourceforge.net/svnroot/gridscheduler/trunk/source/dist/gpu/gpu_sensor.c
>
> Compile it on your system - and make sure that the CUDA SDK &
> libraries are installed (Google is your friend - look for the nvidia-ml
> library).
>
> % cc gpu_sensor.c -lnvidia-ml
>
> Before you use it as a load sensor, compile and run it interactively:
>
> % cc gpu_sensor.c -DSTANDALONE -lnvidia-ml
>
> Make sure that the code is reporting something meaningful on your system.
>
> Rayson
>
>
>
> On Mon, May 14, 2012 at 4:55 AM, Semi <[email protected]> wrote:
>
> Please help in GPU integration under SGE and parallel running of NAMD and
> GAMESS on GPU via SGE.
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
