I tried to set the limit in the RQS as you said, but it is not working.
The only RQS related to matlabhosts is slotcap, shown below:
{
name slotcap
description maximum usable slots per node
enabled TRUE
limit hosts {*} to slots=$num_proc
limit queues !matlab.q hosts {*} to slots=$num_proc
}
The RQS above does not produce the subordinate-queue behavior.
When I try to limit as below,
limit queues {*} hosts {*} to slots=$num_proc
a matlab job is dispatched to hosts that are already in use by normal.q jobs,
but the normal.q job is not suspended. I.e., on the same host, a normal.q job
and a matlab.q job run together, which degrades performance because they share
the computing resources.
To guarantee HPC performance, I made the slotcap limit, meaning that no one
can use more slots than the number of cores a host has. Our hosts have 12
cores per node; that is why I made this limit.
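As a sketch of the slot-wise subordination Reuti mentions below (this assumes
a Grid Engine version with slot-wise preemption, 6.2u5 or later; the sequence
numbers and the "sr" action here are my guesses, not a tested setup), the
matlab.q definition might carry the limit itself instead of the RQS:

```
qname            matlab.q
slots            12
subordinate_list slots=12(short.q:1:sr,normal.q:2:sr,long.q:3:sr)
```

With a subordinate_list like this, at most 12 slots per host would be used
across matlab.q and the listed queues together, and when a matlab.q task
needs a slot, a job from one of the subordinated queues is suspended ("sr"
should pick the shortest-running job first).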
--Sangmin
On Mon, Oct 28, 2013 at 10:29 PM, Reuti <[email protected]> wrote:
> On 28.10.2013 at 14:05, Reuti wrote:
>
> > On 28.10.2013 at 13:59, Sangmin Park wrote:
> >
> >> Yes, suspending the job when all 12 slots are used on a particular
> host is what I want.
> >> So, I tried to submit a job using 12 slots, but it did not work.
> >
> > Aha, it might be necessary to change the order of the rules in your RQS.
> The first matching rule will allow or deny the job to be started. I.e., if
> all slots are used, the (current) first rule matches and the job is rejected.
>
> In fact, you can try just this:
>
> limit queues !matlab.q hosts {*} to slots=$num_proc
>
> matlab.q needs no limit, as its queue definition already has an upper
> limit for slots.
>
> -- Reuti
>
>
> > -- Reuti
> >
> >
> >> Still not working.
> >>
> >> --Sangmin
> >>
> >>
> >> On Mon, Oct 28, 2013 at 9:47 PM, Reuti <[email protected]>
> wrote:
> >> On 28.10.2013 at 13:45, Sangmin Park wrote:
> >>
> >>> This is the RQS
> >>>
> >>> limit hosts {@parallelhosts} to slots=$num_proc
> >>> limit queues !matlab.q hosts {@matlabhosts} to slots=$num_proc
> >>> parallelhosts includes matlabhosts.
> >>>
> >>> The slots value in matlab.q is the number of cores per node.
> >>>
> >>> All hosts, node1 ~ node30, are included in parallelhosts.
> >>> matlabhosts includes node1 ~ node7.
> >>> short.q, normal.q and long.q can be used on node1 ~ node7.
> >>>
> >>> I want to set it up so that when jobs in short.q, normal.q and long.q
> are running and a matlab job is submitted, a running job not in matlab.q on
> node1 ~ node7 is suspended and the matlab job runs.
> >>> This is what I want to set up.
> >>>
> >>> I don't understand why this does not happen when I set the slots value
> to 12.
> >>
> >> It will suspend the job when all 12 slots are used on a particular
> host. You may want to try with 1 instead. As a refinement, you could also
> look into slot-wise subordination.
> >>
> >> -- Reuti
> >>
> >>
> >>> --Sangmin
> >>>
> >>>
> >>> On Mon, Oct 28, 2013 at 8:58 PM, Reuti <[email protected]>
> wrote:
> >>> On 28.10.2013 at 12:30, Sangmin Park wrote:
> >>>
> >>>> I've changed the negative values in the priority section: short.q is
> 4, normal.q is 6, and long.q is 8, respectively.
> >>>> And I configured 72 cores for each queue.
> >>>
> >>> But you didn't answer the question: how do you limit the overall slot
> count? An RQS or a definition on the exechost?
> >>>
> >>>> Below is matlab.q instance details.
> >>>> qname matlab.q
> >>>> hostlist @matlabhosts
> >>>> seq_no 0
> >>>> load_thresholds np_load_avg=1.75
> >>>> suspend_thresholds NONE
> >>>> nsuspend 1
> >>>> suspend_interval 00:05:00
> >>>> priority 2
> >>>> min_cpu_interval 00:05:00
> >>>> processors UNDEFINED
> >>>> qtype BATCH INTERACTIVE
> >>>> ckpt_list NONE
> >>>> pe_list fill_up make matlab
> >>>> rerun FALSE
> >>>> slots 12
> >>>> tmpdir /tmp
> >>>> shell /bin/bash
> >>>> prolog NONE
> >>>> epilog NONE
> >>>> shell_start_mode posix_compliant
> >>>> starter_method NONE
> >>>> suspend_method NONE
> >>>> resume_method NONE
> >>>> terminate_method NONE
> >>>> notify 00:00:60
> >>>> owner_list NONE
> >>>> user_lists octausers onsiteusers
> >>>> xuser_lists NONE
> >>>> subordinate_list short.q=72, normal.q=72, long.q=72
> >>>
> >>> This will suspend these three queues when 72 slots per queue instance
> in matlab.q are used. As you have only 12 slots defined above, this will
> never happen.
> >>>
> >>> What behavior would you like to set up?
> >>>
> >>> -- Reuti
> >>>
> >>>
> >>>> complex_values NONE
> >>>> projects NONE
> >>>> xprojects NONE
> >>>> calendar NONE
> >>>> initial_state default
> >>>> s_rt INFINITY
> >>>> h_rt 168:00:00
> >>>> s_cpu INFINITY
> >>>> h_cpu INFINITY
> >>>> s_fsize INFINITY
> >>>> h_fsize INFINITY
> >>>> s_data INFINITY
> >>>> h_data INFINITY
> >>>> s_stack INFINITY
> >>>> h_stack INFINITY
> >>>> s_core INFINITY
> >>>> h_core INFINITY
> >>>> s_rss INFINITY
> >>>> h_rss INFINITY
> >>>> s_vmem INFINITY
> >>>> h_vmem INFINITY
> >>>>
> >>>> thanks,
> >>>>
> >>>> --Sangmin
> >>>>
> >>>>
> >>>> On Mon, Oct 28, 2013 at 3:51 PM, Reuti <[email protected]>
> wrote:
> >>>> Hi,
> >>>>
> >>>> On 28.10.2013 at 06:40, Sangmin Park wrote:
> >>>>
> >>>>> Thanks, Adam.
> >>>>>
> >>>>> I configured the SGE queues following the second link you gave,
> but it does not work.
> >>>>>
> >>>>> I made 4 queues: short.q, normal.q, long.q and matlab.q.
> >>>>> The short.q, normal.q and long.q instances run on all computing
> nodes, node1 ~ node30.
> >>>>> The matlab.q instance is configured only on a few nodes, node1 ~
> node7, called matlabhosts.
> >>>>>
> >>>>> The priority of each queue is below.
> >>>>> [short.q]
> >>>>> priority -5
> >>>>
> >>>> Don't use negative values here. This number is the "nice value" under
> which the Linux kernel will run the process (i.e., it affects the kernel's
> scheduler; it doesn't influence SGE's scheduling). User processes should be
> in the range 0..19 [20 on Solaris]. The negative ones are reserved for
> kernel processes.
> >>>>
> >>>>
> >>>>> subordinate_list NONE
> >>>>> [normal.q]
> >>>>> priority 0
> >>>>> subordinate_list NONE
> >>>>> [long.q]
> >>>>> priority 5
> >>>>> subordinate_list NONE
> >>>>>
> >>>>> and matlab.q is
> >>>>> priority -10
> >>>>> subordinate_list short.q normal.q long.q
> >>>>
> >>>> Same here. It's also worth noting that these values are relative.
> I.e., with the same number of user processes as cores, it doesn't matter
> which nice values are used, as each process gets its own core anyway. Only
> when there are more processes than cores does it have an effect. But as
> these are relative values, it's the same whether (cores+1) processes all
> have 0 or all have 19 as their nice value.
> >>>>
> >>>>
> >>>>> I submitted several jobs using normal.q to the matlabhosts,
> >>>>> and then I submitted a job using matlab.q, which has the
> subordinate_list.
> >>>>> I expected one of the normal.q jobs to be suspended and the matlab.q
> job to run.
> >>>>> But the matlab.q job waits in the queue with status qw; it is never
> dispatched.
> >>>>>
> >>>>> What is the matter with this?
> >>>>> Please help!
> >>>>
> >>>> http://gridengine.org/pipermail/users/2013-October/006820.html
> >>>>
> >>>> How do you limit the overall slot count?
> >>>>
> >>>> -- Reuti
> >>>>
> >>>>
> >>>>> Sangmin
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Tue, Oct 15, 2013 at 3:50 PM, Adam Brenner <[email protected]>
> wrote:
> >>>>> Sangmin,
> >>>>>
> >>>>> I believe the phrase / term you are looking for is Subordinate
> >>>>> Queues[1][2]. This should handle what you are looking for.
> >>>>>
> >>>>> If not ... I am sure Reuti (or someone else) will correct me on this.
> >>>>>
> >>>>> Enjoy,
> >>>>> -Adam
> >>>>>
> >>>>> [1]: http://docs.oracle.com/cd/E19957-01/820-0698/i998889/index.html
> >>>>> [2]:
> http://grid-gurus.blogspot.com/2011/03/using-grid-engine-subordinate-queues.html
> >>>>>
> >>>>> --
> >>>>> Adam Brenner
> >>>>> Computer Science, Undergraduate Student
> >>>>> Donald Bren School of Information and Computer Sciences
> >>>>>
> >>>>> Research Computing Support
> >>>>> Office of Information Technology
> >>>>> http://www.oit.uci.edu/rcs/
> >>>>>
> >>>>> University of California, Irvine
> >>>>> www.ics.uci.edu/~aebrenne/
> >>>>> [email protected]
> >>>>>
> >>>>>
> >>>>> On Mon, Oct 14, 2013 at 11:18 PM, Sangmin Park <
> [email protected]> wrote:
> >>>>>> Howdy,
> >>>>>>
> >>>>>> For a specific purpose in my organization, I want to configure
> something in the SGE scheduler.
> >>>>>>
> >>>>>> Imagine: a job, called A-job, is running.
> >>>>>> If B-job is submitted while A-job is running,
> >>>>>> I want to hold A-job and run B-job first,
> >>>>>> and after B-job finishes, restart A-job.
> >>>>>>
> >>>>>> What should I do to achieve this?
> >>>>>>
> >>>>>> Sangmin
> >>>>>>
> >>>>>> --
> >>>>>> ===========================
> >>>>>> Sangmin Park
> >>>>>> Supercomputing Center
> >>>>>> Ulsan National Institute of Science and Technology(UNIST)
> >>>>>> Ulsan, 689-798, Korea
> >>>>>>
> >>>>>> phone : +82-52-217-4201
> >>>>>> mobile : +82-10-5094-0405
> >>>>>> fax : +82-52-217-4209
> >>>>>> ===========================
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> users mailing list
> >>>>>> [email protected]
> >>>>>> https://gridengine.org/mailman/listinfo/users
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
> >
>
>
--
===========================
Sangmin Park
Supercomputing Center
Ulsan National Institute of Science and Technology(UNIST)
Ulsan, 689-798, Korea
phone : +82-52-217-4201
mobile : +82-10-5094-0405
fax : +82-52-217-4209
===========================