On 20.05.2012 at 10:21, Semi wrote:

> Hi Rayson!
>
> Can I use this method for GPU definition? It's clearer to me.
>
> https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2011-August/054479.html
Requesting a CUDA complex and a CUDA queue looks redundant. And it's also only for a single GPU per machine, AFAICS. What's wrong with Rayson's setup? It's just a different type of complex.

-- Reuti

> On 5/17/2012 7:09 PM, Rayson Ho wrote:
>> On Tue, May 15, 2012 at 6:11 AM, Semi <[email protected]> wrote:
>>> Can you give me a more detailed answer and correct my definitions.
>> Hi Semi,
>>
>> I was away for the past 2 days. Please always cc the list when you are
>> replying (I guess Reuti, Ron, and I always suggest that people do that -
>> there are many ways to configure Grid Engine, and others may see
>> something that we don't see, and it is usually better to get feedback
>> from more people).
>>
>> On the other hand, if you really need it, you might consider support
>> ( http://www.scalablelogic.com/scalable-grid-engine-support ). There is
>> always someone who can respond to your questions even when I am away.
>>
>>> qconf -sc | grep gpu
>>> gpu    gpu    INT    <=    YES    YES    0    0
>>>
>>> qconf -me sge135
>>> hostname        sge135
>>> load_scaling    NONE
>>> complex_values  gpu=2
>>>
>>> qconf -mconf sge135
>>> sge135:
>>> mailer          /bin/mail
>>> xterm           /usr/bin/X11/xterm
>>> qlogin_daemon   /usr/sbin/in.telnetd
>>> rlogin_daemon   /usr/sbin/in.rlogind
>>> load_sensor     /storage/SGE6U8/gpu-load-sensor/cuda_sensor
>> Note that if you statically define a host to have 2 GPUs, then you
>> don't need to use the cuda_sensor. The GPU load sensor distributed by
>> the Open Grid Scheduler project (which you can find in other Grid
>> Engine implementations) is very similar to Bright Computing's GPU
>> Management in the Bright Cluster Manager:
>>
>> http://www.brightcomputing.com/NVIDIA-GPU-Cluster-Management-Monitoring.php
>>
>> We both monitor temperature, fan speed, voltage, ECC, etc. When we
>> started the GPU load sensor development we didn't know that Bright had
>> something similar...
>>
>> From a scheduling point of view, you can ignore most of that.
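The quoted configuration boils down to one consumable complex plus a per-host value. As commands it looks roughly like this (host name sge135 and complex name gpu are taken from the quoted output; `qconf -mc` and `qconf -me` open an editor, so the commented lines show what you would set there):

```shell
# 1. Add the consumable to the complex list (qconf -mc opens an editor);
#    the entry matches the quoted "qconf -sc" line:
#      #name  shortcut  type  relop  requestable  consumable  default  urgency
#      gpu    gpu       INT   <=     YES          YES         0        0
qconf -mc

# 2. Attach the GPUs to the execution host (qconf -me opens an editor);
#    set:  complex_values  gpu=2
qconf -me sge135

# 3. Jobs then request GPUs like any other consumable resource -
#    no separate CUDA queue is needed:
qsub -l gpu=1 test.sh   # one GPU
qsub -l gpu=2 test.sh   # two GPUs on the same host
```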
>> Some sites like to bias node priority based on GPU temperature, and in
>> some cases, if the ECC error rate is really bad, then the GPU should
>> not be used for GPU jobs.
>>
>>> qsub -l gpu=1 test.sh
>>>
>>> And if I need a parallel run on GPU, what do I have to do? How do I define a PE for GPU?
>> You just use "qsub -l gpu=2" if you want to use 2 GPUs for that job.
>>
>> Rayson
>>
>>> On 5/14/2012 2:51 PM, Rayson Ho wrote:
>>>
>>> Just get the load sensor from:
>>>
>>> https://gridscheduler.svn.sourceforge.net/svnroot/gridscheduler/trunk/source/dist/gpu/gpu_sensor.c
>>>
>>> Compile it on your system - and make sure that it has the CUDA SDK &
>>> libraries installed (Google is your friend - look for the nvidia-ml
>>> library).
>>>
>>> % cc gpu_sensor.c -lnvidia-ml
>>>
>>> Before you use it as a load sensor, compile and run it interactively:
>>>
>>> % cc gpu_sensor.c -DSTANDALONE -lnvidia-ml
>>>
>>> Make sure that the code is reporting something meaningful on your system.
>>>
>>> Rayson
>>>
>>> On Mon, May 14, 2012 at 4:55 AM, Semi <[email protected]> wrote:
>>>
>>> Please help with GPU integration under SGE and parallel running of
>>> NAMD and GAMESS on GPU via SGE.

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
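For anyone curious what a load sensor actually does: it is just a program that loops, reading a request from sge_execd on stdin and printing a begin/end-delimited report on stdout. A minimal sketch in plain shell (illustration only - the real gpu_sensor.c queries the GPUs via the nvidia-ml library, whereas NGPUS here is a hard-coded assumption):

```shell
#!/bin/sh
# Minimal SGE load-sensor sketch. NGPUS is a hard-coded assumption;
# a real sensor would probe the hardware (e.g. via nvidia-ml).
NGPUS=2

# Print one load report in the load-sensor protocol:
# a begin/end pair around "host:complex:value" lines.
report_load() {
    echo "begin"
    echo "$(hostname):gpu:$NGPUS"
    echo "end"
}

# sge_execd writes a newline to stdin when it wants a fresh report,
# and "quit" when the sensor should exit.
sensor_loop() {
    while read -r line; do
        [ "$line" = "quit" ] && return 0
        report_load
    done
}

# To run it as a load sensor, call: sensor_loop
```

With a static `complex_values gpu=2` on the host this loop is unnecessary, as Rayson notes - it only matters if you want the reported value to track the real GPU state.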
