OK, sorry, yesterday I missed replying to the list.
Today is not a busy day for that queue, so I had to recreate the
problem. In doing so I saw that while the queue is empty everything
works as expected (for the seconds between the submit and the start of
the job, 'qstat -q E5m' displays "qw" as expected).
The E5m queue is built from 5 nodes: n010[4-8]. At the moment only one
is in real use, so I need to submit 5 jobs to get one in "qw".
$ qsub sleeper.sh
Your job 876766 ("sleeper.sh") has been submitted
$ qsub sleeper.sh
Your job 876767 ("sleeper.sh") has been submitted
$ qsub sleeper.sh
Your job 876768 ("sleeper.sh") has been submitted
$ qsub sleeper.sh
Your job 876769 ("sleeper.sh") has been submitted
$ qsub sleeper.sh
Your job 876770 ("sleeper.sh") has been submitted
$ qalter -w v 876770
Job 876770 cannot run in queue "opteron" because it is not contained
in its hard queue list (-q)
Job 876770 cannot run in queue "x5355" because it is not contained in
its hard queue list (-q)
Job 876770 cannot run in queue "e5645" because it is not contained in
its hard queue list (-q)
Job 876770 cannot run in queue "x5560" because it is not contained in
its hard queue list (-q)
Job 876770 cannot run in queue "x5670" because it is not contained in
its hard queue list (-q)
Job 876770 cannot run in queue "E5" because it is not contained in its
hard queue list (-q)
Job 876770 (-l exclusive=true) cannot run at host "n0104" because
exclusive resource (exclusive) is already in use
Job 876770 (-l exclusive=true) cannot run at host "n0105" because
exclusive resource (exclusive) is already in use
Job 876770 (-l exclusive=true) cannot run at host "n0106" because
exclusive resource (exclusive) is already in use
Job 876770 (-l exclusive=true) cannot run at host "n0107" because
exclusive resource (exclusive) is already in use
Job 876770 (-l exclusive=true) cannot run at host "n0108" because
exclusive resource (exclusive) is already in use
verification: no suitable queues
$
Does this mean that the "exclusive" complex requested via "qsub
-l excl=true" is evaluated per node before the check against the hard
queue list? If I am correct, is there another way to have both 'qstat
-q' and exclusive use of nodes working?
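As an aside, regarding my question below about setting the complex on
every node: for now I script it with a loop. A sketch, assuming the
`qconf -mattr exechost` syntax of recent Grid Engine releases; the
`echo` makes it a dry run, drop it to actually apply the change:

```shell
# Dry run: print one qconf command per E5m node; remove "echo" to apply.
for h in n0104 n0105 n0106 n0107 n0108; do
    echo qconf -mattr exechost complex_values exclusive=true "$h"
done
```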
thanks
stefano
On 04 May 2015 at 13:46, "Reuti" <[email protected]> wrote:
> Hi,
>
> > On 04.05.2015 at 13:25, Stefano Bridi <[email protected]> wrote:
> >
> > Hi all,
> > I need to give users the possibility to reserve one or more nodes
> > for exclusive use for their runs.
> > It is a mixed environment, and if they don't reserve nodes for
> > exclusive use, serial and low-core-count jobs will fragment the
> > availability of cores across many nodes.
> > The problem is that now the "exclusive" jobs are no longer listed
> > in the "per queue" qstat:
> >
> > We solved the exclusive request by setting up a new complex:
> >
> > # qconf -sc excl
> > #name        shortcut   type   relop   requestable   consumable   default   urgency
> > #----------------------------------------------------------------------------------
> > exclusive    excl       BOOL   EXCL    YES           YES          0         1000
> >
> > and setting the corresponding complex on every node that should be
> > usable this way (is there a way to set this system-wide?):
> >
> > #qconf -se n0108
> > hostname n0108
> > load_scaling NONE
> > complex_values exclusive=true
> > load_values arch=linux-x64,num_proc=20,....[snip]
> > processors 20
> > user_lists NONE
> > xuser_lists NONE
> > projects NONE
> > xprojects NONE
> > usage_scaling NONE
> > report_variables NONE
> >
> > Now if I submit a job like:
> > $ cat sleeper.sh
> > #!/bin/bash
> >
> > #
> > #$ -cwd
> > #$ -j y
> > #$ -q E5m
> > #$ -S /bin/bash
> > #$ -l excl=true
> > #
> > date
> > sleep 20
> > date
> >
> > $
> > All works as expected except qstat.
> > A generic 'qstat' reports:
> > job-ID  prior    name        user     state  submit/start at       queue  slots  ja-task-ID
> > --------------------------------------------------------------------------------------------
> > 876735  0.50601  sleeper.sh  s.bridi  qw     05/04/2015 12:20:45              1
> >
> > and the 'qstat -j 876735' report:
> > ==============================================================
> > job_number: 876735
> > exec_file: job_scripts/876735
> > submission_time: Mon May 4 12:20:45 2015
> > owner: s.bridi
> > uid: 65535
> > group: domusers
> > gid: 15000
> > sge_o_home: /home/s.bridi
> > sge_o_log_name: s.bridi
> > sge_o_path:                 /sw/openmpi/142/bin:.:/ge/bin/linux-x64:/usr/lib64/qt-3.3/bin:/ge/bin/linux-x64:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/s.bridi/bin
> > sge_o_shell: /bin/bash
> > sge_o_workdir: /home/s.bridi/testexcl
> > sge_o_host: login0
> > account: sge
> > cwd: /home/s.bridi/testexcl
> > merge: y
> > hard resource_list: exclusive=true
> > mail_list: s.bridi@login0
> > notify: FALSE
> > job_name: sleeper.sh
> > jobshare: 0
> > hard_queue_list: E5m
> > shell_list: NONE:/bin/bash
> > env_list:
> > script_file: sleeper.sh
> > scheduling info: [snip]
> >
> > while
> > 'qstat -q E5m' doesn't list the job!
>
> Usually this means that the job is not allowed to run in this queue.
>
> What does:
>
> $ qalter -w v 876735
>
> output?
>
> -- Reuti
>
>
> > Thanks
> > Stefano
> > _______________________________________________
> > users mailing list
> > [email protected]
> > https://gridengine.org/mailman/listinfo/users
>
>