On 5 December 2015 at 09:27, Reuti <[email protected]> wrote:
>> #$ -q "gpu.q"
>
> No quotation marks are necessary here.
>
> But you also need to prevent other jobs from running in the gpu.q, as the 
> gpu complex is not set to FORCED. But you can't set it to FORCED, as then it 
> would also have to be set on every host where jobs are allowed to run.
>
> A JSV (job submission verifier) could help: gpu requested => set the queue 
> request to gpu.q, otherwise set it to all.q or whatever other queue names 
> you have in the system.
>
> But: why do you want a dedicated queue for gpu jobs at all - are the 
> limits therein different from those of all.q?
>

I have a separate queue (all.q) with only CPU hosts and this one (gpu.q)
with only GPU hosts. I am taking cues from this post:
http://serverfault.com/questions/322073/howto-set-up-sge-for-cuda-devices
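Reuti's JSV suggestion above could be sketched roughly as a client-side shell
JSV. This is only a hedged sketch: the jsv_* helper functions come from SGE's
$SGE_ROOT/util/resources/jsv/jsv_include.sh, and the stub fallback plus the
FAKE_* variables are invented here purely so the routing logic can be tried
outside a running SGE installation.

```shell
#!/bin/sh
# Sketch of a queue-routing JSV: jobs requesting the "gpu" complex go to
# gpu.q, everything else goes to all.q.

# Source SGE's JSV helper library when available; otherwise define tiny
# stand-ins so the routing logic below can be exercised on its own.
if [ -r "${SGE_ROOT}/util/resources/jsv/jsv_include.sh" ]; then
    . "${SGE_ROOT}/util/resources/jsv/jsv_include.sh"
    STUBBED=no
else
    STUBBED=yes
    jsv_get_param() { eval "printf '%s' \"\${FAKE_$1}\""; }  # stub: read FAKE_<name>
    jsv_set_param() { printf '%s=%s\n' "$1" "$2"; }          # stub: echo the change
    jsv_correct()   { :; }                                   # stub: no-op
fi

jsv_on_start() { :; }

jsv_on_verify() {
    # l_hard holds the job's hard resource request list, e.g. "gpu=1"
    case "$(jsv_get_param l_hard)" in
        *gpu=*) jsv_set_param q_hard gpu.q ;;  # gpu requested -> gpu.q
        *)      jsv_set_param q_hard all.q ;;  # cpu-only job  -> all.q
    esac
    jsv_correct "queue request set by JSV"
}

if [ "$STUBBED" = yes ]; then
    # Outside SGE: exercise both branches of the routing logic.
    FAKE_l_hard="gpu=1" jsv_on_verify   # prints: q_hard=gpu.q
    FAKE_l_hard=""      jsv_on_verify   # prints: q_hard=all.q
else
    jsv_main   # inside SGE: enter the normal JSV protocol loop
fi
```

A script like this would be enabled either per-user via `qsub -jsv` or
cluster-wide via the jsv_url setting in the global configuration.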

>
>> #$ -l gpu=1
>> #$ -m beas
>> #$ -j y -o /home/rajil/tmp/tst/j1.qlog
>> #$ -pe mpi 8
>> abaqus python /share/apps/abaqus/6.14-2/../abaJobHandler.py j1
>> /home/rajil/tmp/tst j1.fs.131566 0 j1.com model.inp
>>
>>
>> PE mpi is defined as
>>
>> #qconf -sp mpi
>> pe_name mpi
>> slots 9999
>> user_lists NONE
>> xuser_lists NONE
>> start_proc_args /opt/gridengine/mpi/startmpi.sh $pe_hostfile
>> stop_proc_args /opt/gridengine/mpi/stopmpi.sh
>> allocation_rule $fill_up
>> control_slaves FALSE
>
> The above should be set to TRUE to achieve a tight integration.

OK, I was using the default Rocks template
/opt/gridengine/mpi/rocks-mpi.template, which defines it like this.
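For reference, the change Reuti suggests could be applied non-interactively
along these lines (a sketch, not runnable outside a cluster: it assumes qconf
is in PATH, manager rights on the qmaster, and /tmp/mpi.pe is an arbitrary
scratch file):

```shell
# Export the current PE definition, flip control_slaves, and load it back.
qconf -sp mpi > /tmp/mpi.pe
sed -i 's/^control_slaves.*/control_slaves     TRUE/' /tmp/mpi.pe
qconf -Mp /tmp/mpi.pe      # qconf -Mp: modify a PE from a file
```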
>
>> job_is_first_task TRUE
>> urgency_slots min
>> accounting_summary TRUE
>>
>>
>> Each node has 32 CPUs and 1 GPU. Why is pe_slots being limited to 2
>> even when 64 CPUs are available?
>
> They could be reserved for another waiting job. Is the PE mpi attached to 
> both queues?
>
> -- Reuti

Yes, PE mpi is also used in the other queue, 'all.q'. However, as I
mentioned, both queues have separate hosts defined. Could this still be an
issue? There is no other waiting job in the queue, although there are some
running in 'all.q'.

-Rajil
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
