On 20.05.2012 at 13:24, Semi wrote:

> Please correct me if I have understood your proposal for definitions
> and usage correctly:
>
> qconf -sc | grep gpu
> gpu     gpu     INT     <=    YES    YES    0    0
>
> qconf -me sge135
> hostname        sge135
> load_scaling    NONE
> complex_values  gpu=2
>
> qsub -l gpu=2 test.sh
>
> load_sensor is not needed.
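[Editor's note: for reference, the setup quoted above can be sketched end-to-end as shell commands. This is a non-authoritative sketch: the host name sge135 and the complex definition come from the thread; the non-interactive "qconf -aattr" variant is an assumed alternative to the editor-based "qconf -me".]

```shell
# 1) Add the consumable to the complex list. "qconf -mc" opens an editor;
#    the new entry has this shape:
#    name  shortcut  type  relop  requestable  consumable  default  urgency
#    gpu   gpu       INT   <=     YES          YES         0        0

# 2) Declare how many GPUs the execution host sge135 offers
#    (a host-level consumable, so it is shared across all queues there):
qconf -me sge135                   # add the line: complex_values gpu=2
# or, non-interactively (assumed standard qconf syntax):
qconf -aattr exechost complex_values gpu=2 sge135

# 3) Submit; the scheduler decrements the gpu consumable per running job:
qsub -l gpu=2 test.sh
```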
Yes, this is fine and preferred IMO. The ROCKS link uses only a BOOL complex and puts the amount in the queue definition instead, i.e. it can't be shared across several queues.

-- Reuti

> On 5/20/2012 12:44 PM, Reuti wrote:
>> On 20.05.2012 at 10:21, Semi wrote:
>>
>>> Hi Rayson!
>>>
>>> Can I use this method for the GPU definition? It's clearer to me.
>>>
>>> https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2011-August/054479.html
>>
>> Requesting a CUDA complex and a CUDA queue looks redundant. And it's also
>> only for a single GPU per machine, AFAICS. What's wrong with Rayson's
>> setup? It's just a different type of complex.
>>
>> -- Reuti
>>
>>> On 5/17/2012 7:09 PM, Rayson Ho wrote:
>>>
>>>> On Tue, May 15, 2012 at 6:11 AM, Semi <[email protected]> wrote:
>>>>
>>>>> Can you give me a more detailed answer and correct my definitions?
>>>>
>>>> Hi Semi,
>>>>
>>>> I was away for the past 2 days. Please always cc the list when you are
>>>> replying (I guess Reuti, Ron, and I always suggest that people do that:
>>>> there are many ways to configure Grid Engine, others may see something
>>>> that we don't, and it is usually better to get feedback from more
>>>> people).
>>>>
>>>> On the other hand, if you really need it you might consider support
>>>> (http://www.scalablelogic.com/scalable-grid-engine-support). There is
>>>> always someone who can respond to your questions even when I am away.
>>>>
>>>>> qconf -sc | grep gpu
>>>>> gpu     gpu     INT     <=    YES    YES    0    0
>>>>>
>>>>> qconf -me sge135
>>>>> hostname        sge135
>>>>> load_scaling    NONE
>>>>> complex_values  gpu=2
>>>>>
>>>>> qconf -mconf sge135
>>>>> sge135:
>>>>> mailer          /bin/mail
>>>>> xterm           /usr/bin/X11/xterm
>>>>> qlogin_daemon   /usr/sbin/in.telnetd
>>>>> rlogin_daemon   /usr/sbin/in.rlogind
>>>>> load_sensor     /storage/SGE6U8/gpu-load-sensor/cuda_sensor
>>>>
>>>> Note that if you statically define a host to have 2 GPUs, then you
>>>> don't need to use the cuda_sensor.
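[Editor's note: one caveat worth adding (a general Grid Engine limitation, not from the thread): a plain INT consumable only counts GPUs per host; it does not tell a job which device it was granted. A hypothetical job script showing where sites typically handle this; the device id and binary name are placeholders.]

```shell
#!/bin/sh
#$ -l gpu=1
# Hypothetical job script: SGE guarantees a free GPU exists on this host,
# but not which one. Sites usually select a device in a queue prolog or in
# the job itself.
export CUDA_VISIBLE_DEVICES=0   # placeholder: a real setup picks a free device
./my_gpu_program                # hypothetical application binary
```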
>>>> The GPU load sensor distributed by the Open Grid Scheduler project
>>>> (which you can find in other Grid Engine implementations) is very
>>>> similar to Bright Computing's GPU Management in the Bright Cluster
>>>> Manager:
>>>>
>>>> http://www.brightcomputing.com/NVIDIA-GPU-Cluster-Management-Monitoring.php
>>>>
>>>> We both monitor temperature, fan speed, voltage, ECC errors, etc. When
>>>> we started the GPU load sensor development we didn't know that Bright
>>>> had something similar...
>>>>
>>>> From a scheduling point of view, you can ignore most of that. Some
>>>> sites like to bias node priority based on GPU temperature, and in some
>>>> cases, if the ECC error rate is really bad, the GPU should not be used
>>>> for GPU jobs.
>>>>
>>>>> qsub -l gpu=1 test.sh
>>>>>
>>>>> And if I need a parallel run on the GPUs, what do I have to do? How do
>>>>> I define a PE for the GPUs?
>>>>
>>>> Just use "qsub -l gpu=2" if you want to use 2 GPUs for that job.
>>>>
>>>> Rayson
>>>>
>>>>> On 5/14/2012 2:51 PM, Rayson Ho wrote:
>>>>>
>>>>> Just get the load sensor from:
>>>>>
>>>>> https://gridscheduler.svn.sourceforge.net/svnroot/gridscheduler/trunk/source/dist/gpu/gpu_sensor.c
>>>>>
>>>>> Compile it on your system, and make sure that it has the CUDA SDK and
>>>>> libraries installed (Google is your friend: look for the nvidia-ml
>>>>> library).
>>>>>
>>>>> % cc gpu_sensor.c -lnvidia-ml
>>>>>
>>>>> Before you use it as a load sensor, compile and run it interactively:
>>>>>
>>>>> % cc gpu_sensor.c -DSTANDALONE -lnvidia-ml
>>>>>
>>>>> Make sure that the code is reporting something meaningful on your
>>>>> system.
>>>>>
>>>>> Rayson
>>>>>
>>>>> On Mon, May 14, 2012 at 4:55 AM, Semi <[email protected]> wrote:
>>>>>
>>>>> Please help with GPU integration under SGE and parallel running of
>>>>> NAMD and GAMESS on GPUs via SGE.
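[Editor's note: for readers who want to see what a load sensor must emit without compiling gpu_sensor.c: a Grid Engine load sensor blocks on stdin, answers each newline sent by sge_execd with a "begin" ... "end" block of host:complex:value lines, and exits on "quit". A minimal sketch; the fixed GPU count is an assumption, where the real gpu_sensor.c queries the NVML library instead.]

```shell
#!/bin/sh
# Minimal SGE load-sensor sketch. Protocol: wait for a newline on stdin,
# answer with a begin/end block of host:complex:value lines; exit on "quit".

GPU_COUNT=2   # assumed static value; a real sensor queries NVML here

report_load() {
    echo "begin"
    echo "$(hostname):gpu:$GPU_COUNT"
    echo "end"
}

while read -r line; do
    [ "$line" = "quit" ] && exit 0
    report_load
done
```

With a static "complex_values gpu=2" on the host, such a sensor is unnecessary, as noted above; it only becomes useful when the reported value should vary at runtime.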
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> [email protected]
>>>>> https://gridengine.org/mailman/listinfo/users
