> On 04.05.2015, at 16:04, Stefano Bridi <[email protected]> wrote:
>
> Ok, today is not a busy day for that queue, so I had to recreate the
> problem: by doing this I saw that while the queue is empty all works
> as expected (for the seconds between the submit and the start of the
> job, "qw" is displayed by 'qstat -q E5m' as expected).
> The E5m queue is built with 5 nodes: n010[4-8]. At the moment only one
> is under real use, so I need to submit 5 jobs to have one "qw".
>
> $ qsub sleeper.sh
> Your job 876766 ("sleeper.sh") has been submitted
> $ qsub sleeper.sh
> Your job 876767 ("sleeper.sh") has been submitted
> $ qsub sleeper.sh
> Your job 876768 ("sleeper.sh") has been submitted
> $ qsub sleeper.sh
> Your job 876769 ("sleeper.sh") has been submitted
> $ qsub sleeper.sh
> Your job 876770 ("sleeper.sh") has been submitted
> $ qalter -w v 876770
> Job 876770 cannot run in queue "opteron" because it is not contained
> in its hard queue list (-q)
> Job 876770 cannot run in queue "x5355" because it is not contained
> in its hard queue list (-q)
> Job 876770 cannot run in queue "e5645" because it is not contained
> in its hard queue list (-q)
> Job 876770 cannot run in queue "x5560" because it is not contained
> in its hard queue list (-q)
> Job 876770 cannot run in queue "x5670" because it is not contained
> in its hard queue list (-q)
> Job 876770 cannot run in queue "E5" because it is not contained
> in its hard queue list (-q)
Although it's not the issue per se: did you create one queue for each
type of machine? Maybe it's possible to combine them and instead
request a STRING complex carrying the name of the architecture,
attached to the queue instances by hostgroups or attached to the
exechosts [a sketch of this appears below the thread].

-- Reuti

> Job 876770 (-l exclusive=true) cannot run at host "n0104" because
> exclusive resource (exclusive) is already in use
> Job 876770 (-l exclusive=true) cannot run at host "n0105" because
> exclusive resource (exclusive) is already in use
> Job 876770 (-l exclusive=true) cannot run at host "n0106" because
> exclusive resource (exclusive) is already in use
> Job 876770 (-l exclusive=true) cannot run at host "n0107" because
> exclusive resource (exclusive) is already in use
> Job 876770 (-l exclusive=true) cannot run at host "n0108" because
> exclusive resource (exclusive) is already in use
> verification: no suitable queues
> $
>
> Does this mean that the "exclusive" complex requested via "qsub
> -l excl=true" is evaluated on the node before the check on the hard
> queue list? If I am correct, is there another way to have both
> 'qstat -q' and exclusive use of nodes working?
>
> thanks
> stefano
>
> On Mon, May 4, 2015 at 1:45 PM, Reuti <[email protected]> wrote:
>> Hi,
>>
>>> On 04.05.2015, at 13:25, Stefano Bridi <[email protected]> wrote:
>>>
>>> Hi all,
>>> I need to give users the possibility to reserve one or more nodes
>>> for exclusive use for their runs.
>>> It is a mixed environment, and if they don't reserve the nodes for
>>> exclusive use, the serial and low-core-count jobs will fragment the
>>> availability of cores across many nodes.
>>> The problem is that now the "exclusive" jobs are not listed anymore
>>> in the "per queue" qstat.
>>>
>>> We solved the exclusive request by setting up a new complex:
>>>
>>> # qconf -sc excl
>>> #name       shortcut  type  relop  requestable  consumable  default  urgency
>>> #----------------------------------------------------------------------------
>>> exclusive   excl      BOOL  EXCL   YES          YES         0        1000
>>>
>>> and setting the corresponding complex on every node usable in this
>>> way (is there a way to set this system-wide?):
>>>
>>> # qconf -se n0108
>>> hostname              n0108
>>> load_scaling          NONE
>>> complex_values        exclusive=true
>>> load_values           arch=linux-x64,num_proc=20,....[snip]
>>> processors            20
>>> user_lists            NONE
>>> xuser_lists           NONE
>>> projects              NONE
>>> xprojects             NONE
>>> usage_scaling         NONE
>>> report_variables      NONE
>>>
>>> Now if I submit a job like:
>>>
>>> $ cat sleeper.sh
>>> #!/bin/bash
>>>
>>> #
>>> #$ -cwd
>>> #$ -j y
>>> #$ -q E5m
>>> #$ -S /bin/bash
>>> #$ -l excl=true
>>> #
>>> date
>>> sleep 20
>>> date
>>>
>>> $
>>>
>>> all works as expected except qstat.
>>> A generic 'qstat' reports:
>>>
>>> job-ID  prior    name        user     state  submit/start at      queue  slots  ja-task-ID
>>> -------------------------------------------------------------------------------------------
>>> 876735  0.50601  sleeper.sh  s.bridi  qw     05/04/2015 12:20:45         1
>>>
>>> and the 'qstat -j 876735' report:
>>> ==============================================================
>>> job_number:        876735
>>> exec_file:         job_scripts/876735
>>> submission_time:   Mon May  4 12:20:45 2015
>>> owner:             s.bridi
>>> uid:               65535
>>> group:             domusers
>>> gid:               15000
>>> sge_o_home:        /home/s.bridi
>>> sge_o_log_name:    s.bridi
>>> sge_o_path:        /sw/openmpi/142/bin:.:/ge/bin/linux-x64:/usr/lib64/qt-3.3/bin:/ge/bin/linux-x64:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/s.bridi/bin
>>> sge_o_shell:       /bin/bash
>>> sge_o_workdir:     /home/s.bridi/testexcl
>>> sge_o_host:        login0
>>> account:           sge
>>> cwd:               /home/s.bridi/testexcl
>>> merge:             y
>>> hard resource_list: exclusive=true
>>> mail_list:         s.bridi@login0
>>> notify:            FALSE
>>> job_name:          sleeper.sh
>>> jobshare:          0
>>> hard_queue_list:   E5m
>>> shell_list:        NONE:/bin/bash
>>> env_list:
>>> script_file:       sleeper.sh
>>> scheduling info:   [snip]
>>>
>>> while 'qstat -q E5m' doesn't list the job!
>>
>> Usually this means that the job is not allowed to run in this queue.
>>
>> What does:
>>
>> $ qalter -w v 876735
>>
>> output?
>>
>> -- Reuti
>>
>>> Thanks
>>> Stefano
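A minimal sketch of the STRING-complex approach Reuti suggests above:
one queue spanning all hosts, with a requestable string complex
identifying the machine type. The complex name "node_type", its
shortcut "ntype", the value "e5m", and the host n0104 are illustrative,
not taken from the thread:

# 1. Define a requestable, non-consumable STRING complex by adding a
#    line like this via 'qconf -mc' (which opens an editor):
#
#    node_type   ntype   STRING   ==   YES   NO   NONE   0
#
# 2. Attach a value to each exechost (repeat or script for the rest):
qconf -aattr exechost complex_values node_type=e5m n0104

# 3. Users then request the machine type instead of a dedicated queue:
qsub -l node_type=e5m sleeper.sh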

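On the side question in the thread ("is there a way to set this
system-wide?"): to my knowledge a per-host consumable cannot be given a
cluster-wide default, because complex_values placed on the "global"
pseudo-host create a single cluster-wide pool rather than one pool per
node. The per-host setting can be scripted, though. A sketch, assuming
the five node names from the thread:

# Append exclusive=true to complex_values on each node of the E5m queue
# ('qconf -aattr' adds the entry; use -mattr to modify an existing one).
for h in n0104 n0105 n0106 n0107 n0108; do
    qconf -aattr exechost complex_values exclusive=true "$h"
done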