On 20.05.2012, at 13:24, Semi wrote:

> Please correct me if I've understood your proposal for the definitions and
> usage correctly:
> qconf -sc|grep gpu
> gpu                 gpu          INT         <=    YES         YES 0        0
> 
> qconf -me sge135
> hostname              sge135
> load_scaling          NONE
> complex_values        gpu=2
> 
> qsub -l gpu=2 test.sh
> 
> load_sensor is not needed.

Yes, this is fine and preferred IMO. The ROCKS link uses only a BOOL complex 
and puts the amount in the queue definition instead, i.e. the amount can't be 
shared across several queues.

-- Reuti


> 
> On 5/20/2012 12:44 PM, Reuti wrote:
>> On 20.05.2012, at 10:21, Semi wrote:
>> 
>> 
>>> Hi Rayson!
>>> 
>>> Can I use this method for the GPU definition? It's clearer to me.
>>> 
>>> 
>>> https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2011-August/054479.html
>> Requesting both a CUDA complex and a CUDA queue looks redundant. And it's 
>> also only for a single GPU per machine AFAICS. What's wrong with Rayson's 
>> setup? It's just a different type of complex.
>> 
>> -- Reuti
>> 
>> 
>> 
>>> On 5/17/2012 7:09 PM, Rayson Ho wrote:
>>> 
>>>> On Tue, May 15, 2012 at 6:11 AM, Semi <[email protected]> wrote:
>>>> 
>>>>> Can you give me a more detailed answer and correct my definitions?
>>>>> 
>>>> Hi Semi,
>>>> 
>>>> I was away for the past 2 days. Please always cc the list when you are
>>>> replying (I guess Reuti, Ron, and I always suggest people do that -
>>>> there are many ways to configure Grid Engine, and others may see
>>>> something that we don't see, and it is usually better to get feedback
>>>> from more people).
>>>> 
>>>> On the other hand, if you really need it, you might consider support (
>>>> http://www.scalablelogic.com/scalable-grid-engine-support ). There is
>>>> always someone who can respond to your questions even when I am away.
>>>> 
>>>> 
>>>> 
>>>>> qconf -sc|grep gpu
>>>>> gpu                 gpu          INT         <=    YES         YES 0        0
>>>>> 
>>>>> qconf -me sge135
>>>>> hostname              sge135
>>>>> load_scaling          NONE
>>>>> complex_values        gpu=2
>>>>> 
>>>>> qconf -mconf sge135
>>>>> sge135:
>>>>> mailer                       /bin/mail
>>>>> xterm                        /usr/bin/X11/xterm
>>>>> qlogin_daemon                /usr/sbin/in.telnetd
>>>>> rlogin_daemon                /usr/sbin/in.rlogind
>>>>> load_sensor                  /storage/SGE6U8/gpu-load-sensor/cuda_sensor
>>>>> 
>>>> Note that if you statically define a host to have 2 GPUs, then you
>>>> don't need to use the cuda_sensor. The GPU load sensor distributed by
>>>> the Open Grid Scheduler project (which you can find in other Grid
>>>> Engine implementations) is very similar to Bright Computing's GPU
>>>> Management in the Bright Cluster Manager:
>>>> 
>>>> 
>>>> http://www.brightcomputing.com/NVIDIA-GPU-Cluster-Management-Monitoring.php
>>>> 
>>>> 
>>>> We both monitor temperature, fan speed, voltage, ECC, etc. When we
>>>> started the GPU load sensor development we didn't know that Bright had
>>>> something similar...
>>>> 
>>>> From a scheduling point of view, you can ignore most of that. Some
>>>> sites like to bias node priority based on GPU temperature, and in some
>>>> cases, if the ECC error rate is really bad, the GPU should not be used
>>>> for GPU jobs.
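For anyone wiring up such a sensor by hand: the load-sensor protocol itself is simple. sge_execd writes a line to the sensor's stdin for each retrieval cycle ("quit" to stop it), and the sensor answers each cycle with a begin/end-delimited block of host:name:value lines. A minimal sketch of that loop follows; the metric name gpu.free and the static value 2 are placeholders of mine, where a real sensor such as gpu_sensor.c would query NVML instead:

```shell
# Emit one load report in SGE load-sensor format: a begin/end block of
# "host:name:value" lines. gpu.free=2 is a static placeholder value.
report() {
  echo "begin"
  echo "$(hostname):gpu.free:2"
  echo "end"
}

# Protocol loop: one report per line read from stdin; "quit" stops the sensor.
sensor_loop() {
  while read -r line; do
    [ "$line" = "quit" ] && return 0
    report
  done
}

# Demo: drive one retrieval cycle, then quit.
printf '\nquit\n' | sensor_loop
```

In production the script would be listed as load_sensor in the host configuration, exactly like the cuda_sensor path quoted above, and the loop would block on stdin rather than run a demo.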
>>>> 
>>>> 
>>>> 
>>>>> qsub -l gpu=1 test.sh
>>>>> 
>>>>> And if I need a parallel run on the GPUs, what do I have to do? How do
>>>>> I define a PE for the GPUs?
>>>>> 
>>>> You just use "qsub -l gpu=2" if you want to use 2 GPUs for that job.
>>>> 
>>>> Rayson
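One caveat worth noting beyond what the thread covers: a plain INT consumable only keeps the scheduler's count right; it does not tell a job *which* device IDs it was granted, so two jobs landing on the same host can both grab device 0. A hypothetical workaround is for the job script (or a prolog/epilog pair) to claim free device IDs itself, e.g. with atomic lock directories. claim_gpus, release_gpus, and the lock path below are all invented names for illustration:

```shell
# Hypothetical helper: claim "want" device IDs out of "total" GPUs on this
# host using lock directories (mkdir is atomic), then export
# CUDA_VISIBLE_DEVICES so CUDA applications only see the claimed devices.
LOCKDIR=${LOCKDIR:-/tmp/gpu-locks}

claim_gpus() {
  want=$1 total=$2 got=""
  mkdir -p "$LOCKDIR"
  i=0
  while [ "$i" -lt "$total" ]; do
    if [ "$(echo $got | wc -w)" -ge "$want" ]; then break; fi
    # mkdir succeeds for at most one process per device directory.
    if mkdir "$LOCKDIR/gpu$i" 2>/dev/null; then
      got="$got $i"
    fi
    i=$((i + 1))
  done
  CUDA_VISIBLE_DEVICES=$(echo $got | tr ' ' ',')
  export CUDA_VISIBLE_DEVICES
}

release_gpus() {
  # Drop our lock directories when the job ends (e.g. from an epilog).
  for d in $(echo "$CUDA_VISIBLE_DEVICES" | tr ',' ' '); do
    rmdir "$LOCKDIR/gpu$d" 2>/dev/null
  done
}
```

A job submitted with "qsub -l gpu=2" would call claim_gpus 2 <total> before launching the CUDA binary and release_gpus afterwards; the consumable guarantees the count is available, the locks decide which devices.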
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On 5/14/2012 2:51 PM, Rayson Ho wrote:
>>>>> 
>>>>> Just get the load sensor from:
>>>>> 
>>>>> 
>>>>> https://gridscheduler.svn.sourceforge.net/svnroot/gridscheduler/trunk/source/dist/gpu/gpu_sensor.c
>>>>> 
>>>>> 
>>>>> Compile it on your system - and make sure that the system has the CUDA
>>>>> SDK & libraries installed (Google is your friend - look for the
>>>>> nvidia-ml library).
>>>>> 
>>>>> % cc gpu_sensor.c -lnvidia-ml
>>>>> 
>>>>> Before you use it as a load sensor, compile and run it interactively:
>>>>> 
>>>>> % cc gpu_sensor.c -DSTANDALONE -lnvidia-ml
>>>>> 
>>>>> Make sure that the code is reporting something meaningful on your system.
>>>>> 
>>>>> Rayson
>>>>> 
>>>>> 
>>>>> 
>>>>> On Mon, May 14, 2012 at 4:55 AM, Semi <[email protected]> wrote:
>>>>> 
>>>>> Please help in GPU integration under SGE and parallel running of NAMD and
>>>>> GAMESS on GPU via SGE.
>>>>> 
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> 
>>>>> [email protected]
>>>>> https://gridengine.org/mailman/listinfo/users

