On 05.12.2015 at 15:55, Rajil Saraswat wrote:
> On 4 December 2015 at 21:07, Feng Zhang <[email protected]> wrote:
>> Did you use "-l gpu=1" in your job script?
>>
>> --
>> Best,
>>
>> Feng
>
> I am not having much luck with this. The job is getting stuck:
>
> # qalter -w p 412
> Job 412 cannot run in queue "all.q" because it is not contained in its
> hard queue list (-q)
> Job 412 cannot run in PE "mpi" because it only offers 2 slots
> verification: no suitable queues
>
> The job script looks like this:
>
> #!/bin/csh
> #$ -V
> #$ -S /bin/csh
> #$ -N j1
> #$ -q "gpu.q"
No quotation marks are necessary here. But you also need to prevent
other jobs from running in gpu.q, as the gpu complex is not forced. You
can't set it to forced either, as then it would also be necessary to
define it on every host on which jobs should be allowed to run. A JSV
(job submission verifier) could help: if gpu is requested, set the
queue request to gpu.q; otherwise set it to all.q or whatever other
queues you have in the system (a minimal sketch follows at the end of
this message). But: why do you want a dedicated queue for gpu jobs at
all - are the limits therein different from those in all.q?

> #$ -l gpu=1
> #$ -m beas
> #$ -j y -o /home/rajil/tmp/tst/j1.qlog
> #$ -pe mpi 8
> abaqus python /share/apps/abaqus/6.14-2/../abaJobHandler.py j1
> /home/rajil/tmp/tst j1.fs.131566 0 j1.com model.inp
>
> PE mpi is defined as:
>
> # qconf -sp mpi
> pe_name            mpi
> slots              9999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    /opt/gridengine/mpi/startmpi.sh $pe_hostfile
> stop_proc_args     /opt/gridengine/mpi/stopmpi.sh
> allocation_rule    $fill_up
> control_slaves     FALSE

The above should be set to TRUE to achieve a tight integration.

> job_is_first_task  TRUE
> urgency_slots      min
> accounting_summary TRUE
>
> Each node has 32 cpus and 1 gpu. Why is pe_slots being limited to 2
> even when 64 cpus are available?

They could be reserved for another waiting job. Is the PE mpi attached
to both queues? (The qconf sketch at the very end shows how to check
this and how to change control_slaves.)

-- Reuti
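A minimal client-side JSV sketch for the routing described above. It
assumes the consumable is named "gpu" and the fallback queue is "all.q"
(both taken from this thread); adjust the names to your setup:

#!/bin/sh
# Route jobs that request the gpu complex to gpu.q, everything else
# to all.q (queue names are assumptions taken from this thread).

jsv_on_start()
{
   return
}

jsv_on_verify()
{
   # l_hard holds the hard resource requests given with -l
   if [ -n "$(jsv_sub_get_param l_hard gpu)" ]; then
      jsv_set_param q_hard "gpu.q"
      jsv_correct "gpu requested: routed to gpu.q"
   else
      jsv_set_param q_hard "all.q"
      jsv_correct "no gpu requested: routed to all.q"
   fi
   return
}

. ${SGE_ROOT}/util/resources/jsv/jsv_include.sh
jsv_main

It can be activated per submission with "-jsv /path/to/script" or
cluster-wide via the jsv_url entry in "qconf -mconf".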

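And the qconf sketch referenced above - how to verify that both queues
offer the PE and how to change the settings discussed (queue and PE
names as used in this thread):

# Both queues must list mpi in their pe_list:
qconf -sq gpu.q | grep pe_list
qconf -sq all.q | grep pe_list

# Inspect the gpu complex definition (the "requestable" and
# "consumable" columns):
qconf -sc | grep gpu

# Open the PE definition in $EDITOR and change
# "control_slaves FALSE" to "control_slaves TRUE":
qconf -mp mpi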