Hi,
On 29.10.2013 at 01:21, Sangmin Park wrote:
> I tried to set the limit in an RQS as you said, but it's not working.
>
> The only RQS related to matlabhosts is slotcap, as below:
>
> {
> name slotcap
> description maximum usable slots per node
> enabled TRUE
> limit hosts {*} to slots=$num_proc
> limit queues !matlab.q hosts {*} to slots=$num_proc
> }
>
> The above RQS does not work together with the subordinate queue.
> When I try to limit as below,
>
> limit queues {*} hosts {*} to slots=$num_proc
Only this line in one and only one RQS? But it would allow each queue to have
$num_proc slots filled on each and every host - which is what you don't want. As I
posted: one RQS with only one line should do it:
limit queues !matlab.q hosts {*} to slots=$num_proc
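
Wrapped in a complete rule set this would look like the following (a sketch;
keep your existing name or pick a new one, and edit it via `qconf -mrqs`):

{
name slotcap
description cap slots per host for every queue except matlab.q
enabled TRUE
limit queues !matlab.q hosts {*} to slots=$num_proc
}
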
> The matlab job is submitted to hosts which are used by normal.q jobs.
> But the normal.q jobs did not suspend.
I.e. the output of `qstat` doesn't show state "s"? Note that although such jobs
are suspended, they still use resources like memory (they only got a SIGSTOP signal).
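
To check, something along these lines (assuming a stock `qstat`; normal.q is
from your setup):

qstat -s s              # list only jobs in a suspended state
qstat -f -q normal.q    # full view; suspended jobs show state "s"
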
> I.e. on the same host, a normal.q job and a matlab.q job are running together.
> This causes a performance decline, as they share computing resources.
>
> To guarantee HPC performance I made the slotcap limit; it means that no one
> can use more slots than the number of cores a host has.
> Our hosts have 12 cores per node. That's why I made this limit.
The question was aimed more at the understanding of RQSes, as it's worth
noting that all enabled RQSes will be honored, and that the first rule inside
an RQS which matches the job being scheduled will allow or deny this particular
job according to this rule; no further lines inside this RQS will be scanned.
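
As an illustration only (a made-up RQS, not your configuration):

{
name ordering_demo
enabled TRUE
limit hosts {*} to slots=12
limit queues matlab.q hosts {*} to slots=24
}

Here the first rule matches every job on every host, so the second line is
never consulted and matlab.q is capped at 12 slots per host as well.
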
-- Reuti
> --Sangmin
>
>
>
> On Mon, Oct 28, 2013 at 10:29 PM, Reuti <[email protected]> wrote:
> On 28.10.2013 at 14:05, Reuti wrote:
>
> > On 28.10.2013 at 13:59, Sangmin Park wrote:
> >
> >> Yes, suspending the job when all 12 slots are used on a particular host is
> >> what I want.
> >> So I tried to submit a job using 12 slots, but it did not work.
> >
> > Aha, it might be necessary to change the order of the rules in your RQS. The
> > first matching one will allow or deny the job to be started. I.e. if all
> > slots are used, the (current) first rule matches and the job is rejected.
>
> In fact, you can try just this:
>
> limit queues !matlab.q hosts {*} to slots=$num_proc
>
> matlab.q needs no limit, as its queue definition already sets an upper
> limit for slots.
>
> -- Reuti
>
>
> > -- Reuti
> >
> >
> >> Still not working.
> >>
> >> --Sangmin
> >>
> >>
> >> On Mon, Oct 28, 2013 at 9:47 PM, Reuti <[email protected]> wrote:
> >> On 28.10.2013 at 13:45, Sangmin Park wrote:
> >>
> >>> This is the RQS
> >>>
> >>> limit hosts {@parallelhosts} to slots=$num_proc
> >>> limit queues !matlab.q hosts {@matlabhosts} to slots=$num_proc
> >>> parallelhosts includes matlabhosts.
> >>>
> >>> slots value in the matlab.q means the number of cores per node.
> >>>
> >>> All hosts, node1 ~ node30, are included in parallelhosts.
> >>> matlabhosts includes node1 ~ node7.
> >>> short.q, normal.q and long.q can be used on node1 ~ node7.
> >>>
> >>> I want to set it up so that, while jobs in short.q, normal.q and long.q
> >>> are running, if a matlab job is submitted, a running job not in matlab.q
> >>> on node1 ~ node7 is suspended and the matlab job runs.
> >>> This is what I want to set up.
> >>>
> >>> I don't understand why this cannot happen if I set the slots value to 12.
> >>
> >> It will suspend the job when all 12 slots are used on a particular host.
> >> You may want to try 1 instead. As a refinement, you could also look
> >> into slot-wise subordination.
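> >>
> >> A rough, untested sketch of the slot-wise variant (the threshold, sequence
> >> numbers and actions are placeholders to adjust) would be to set in matlab.q
> >> via `qconf -mq matlab.q` something like:
> >>
> >> subordinate_list slots=12(short.q:1:sr,normal.q:2:sr,long.q:3:sr)
> >>
> >> Once more than 12 slots are in use on a host in total, jobs in the listed
> >> queues are suspended one by one in order of their sequence number ("sr"
> >> should suspend the shortest running job first) until the threshold holds.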
> >>
> >> -- Reuti
> >>
> >>
> >>> --Sangmin
> >>>
> >>>
> >>> On Mon, Oct 28, 2013 at 8:58 PM, Reuti <[email protected]> wrote:
> >>> On 28.10.2013 at 12:30, Sangmin Park wrote:
> >>>
> >>>> I've changed the negative values in the priority section: short.q is 4,
> >>>> normal.q is 6 and long.q is 8, respectively.
> >>>> And I configured 72 cores for each queue.
> >>>
> >>> But you didn't answer the question: How do you limit the overall slot
> >>> count? An RQS or a definition in the exechost?
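> >>>
> >>> (The exechost way would be something like `qconf -me node1` with e.g.
> >>> complex_values slots=12 set there - a sketch; node1 and the value 12 are
> >>> only taken from your description.)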
> >>>
> >>>> Below are the matlab.q instance details.
> >>>> qname matlab.q
> >>>> hostlist @matlabhosts
> >>>> seq_no 0
> >>>> load_thresholds np_load_avg=1.75
> >>>> suspend_thresholds NONE
> >>>> nsuspend 1
> >>>> suspend_interval 00:05:00
> >>>> priority 2
> >>>> min_cpu_interval 00:05:00
> >>>> processors UNDEFINED
> >>>> qtype BATCH INTERACTIVE
> >>>> ckpt_list NONE
> >>>> pe_list fill_up make matlab
> >>>> rerun FALSE
> >>>> slots 12
> >>>> tmpdir /tmp
> >>>> shell /bin/bash
> >>>> prolog NONE
> >>>> epilog NONE
> >>>> shell_start_mode posix_compliant
> >>>> starter_method NONE
> >>>> suspend_method NONE
> >>>> resume_method NONE
> >>>> terminate_method NONE
> >>>> notify 00:00:60
> >>>> owner_list NONE
> >>>> user_lists octausers onsiteusers
> >>>> xuser_lists NONE
> >>>> subordinate_list short.q=72, normal.q=72, long.q=72
> >>>
> >>> This will suspend these three queues only when 72 slots are occupied in a
> >>> matlab.q queue instance. As you have only 12 slots defined above, this
> >>> will never happen.
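> >>>
> >>> For comparison, a sketch of a threshold which can actually be reached:
> >>>
> >>> subordinate_list short.q=1,normal.q=1,long.q=1
> >>>
> >>> This would suspend those three queue instances as soon as a single slot in
> >>> the matlab.q instance on the host is occupied (the value 1 is only an
> >>> example).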
> >>>
> >>> What behavior would you like to set up?
> >>>
> >>> -- Reuti
> >>>
> >>>
> >>>> complex_values NONE
> >>>> projects NONE
> >>>> xprojects NONE
> >>>> calendar NONE
> >>>> initial_state default
> >>>> s_rt INFINITY
> >>>> h_rt 168:00:00
> >>>> s_cpu INFINITY
> >>>> h_cpu INFINITY
> >>>> s_fsize INFINITY
> >>>> h_fsize INFINITY
> >>>> s_data INFINITY
> >>>> h_data INFINITY
> >>>> s_stack INFINITY
> >>>> h_stack INFINITY
> >>>> s_core INFINITY
> >>>> h_core INFINITY
> >>>> s_rss INFINITY
> >>>> h_rss INFINITY
> >>>> s_vmem INFINITY
> >>>> h_vmem INFINITY
> >>>>
> >>>> thanks,
> >>>>
> >>>> --Sangmin
> >>>>
> >>>>
> >>>> On Mon, Oct 28, 2013 at 3:51 PM, Reuti <[email protected]>
> >>>> wrote:
> >>>> Hi,
> >>>>
> >>>> On 28.10.2013 at 06:40, Sangmin Park wrote:
> >>>>
> >>>>> Thanks, Adam
> >>>>>
> >>>>> I configured the SGE queues following the second link you mentioned.
> >>>>> But it does not work.
> >>>>>
> >>>>> I made 4 queues: short.q, normal.q, long.q and matlab.q.
> >>>>> The short.q, normal.q and long.q queue instances run on all computing
> >>>>> nodes, node1 ~ node30.
> >>>>> The matlab.q instance is configured only for a few nodes, node1 ~ node7,
> >>>>> called matlabhosts.
> >>>>>
> >>>>> The priorities of each queue are below.
> >>>>> [short.q]
> >>>>> priority -5
> >>>>
> >>>> Don't use negative values here. This number is the "nice value" under
> >>>> which the Linux kernel will run the process (i.e. it affects the scheduler
> >>>> in the kernel; for SGE it doesn't influence the scheduling). User processes
> >>>> should be in the range 0..19 [20 on Solaris]. The negative ones are
> >>>> reserved for kernel processes.
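> >>>>
> >>>> A sketch with non-negative values keeping your ordering (the numbers are
> >>>> only examples):
> >>>>
> >>>> priority 0      (matlab.q)
> >>>> priority 5      (short.q)
> >>>> priority 10     (normal.q)
> >>>> priority 15     (long.q)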
> >>>>
> >>>>
> >>>>> subordinate_list NONE
> >>>>> [normal.q]
> >>>>> priority 0
> >>>>> subordinate_list NONE
> >>>>> [long.q]
> >>>>> priority 5
> >>>>> subordinate_list NONE
> >>>>>
> >>>>> and matlab.q is
> >>>>> priority -10
> >>>>> subordinate_list short.q normal.q long.q
> >>>>
> >>>> Same here. It's also worth noting that these values are relative. I.e.
> >>>> with the same number of user processes and cores, it doesn't matter
> >>>> which values are used as nice values, as each process gets its own core
> >>>> anyway. Only when there are more processes than cores will it have an
> >>>> effect. But as these are relative values, it's the same whether all of
> >>>> the (cores+1) processes have 0 or all have 19 as their nice value.
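> >>>>
> >>>> (A worked example: with 12 cores and 13 CPU-bound processes, each process
> >>>> gets roughly 12/13 of a core whether all 13 run at nice 0 or all at nice
> >>>> 19; only a difference in nice values between the competing processes
> >>>> shifts the shares.)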
> >>>>
> >>>>
> >>>>> I submitted several jobs using normal.q to the matlabhosts,
> >>>>> and I submitted a job using matlab.q, which has the subordinate_list.
> >>>>> I expected one of the normal.q jobs to be suspended and the matlab.q job
> >>>>> to run.
> >>>>> But the matlab.q job waits in the queue with status qw; it is not started.
> >>>>>
> >>>>> What's the matter with this?
> >>>>> Please help!!
> >>>>
> >>>> http://gridengine.org/pipermail/users/2013-October/006820.html
> >>>>
> >>>> How do you limit the overall slot count?
> >>>>
> >>>> -- Reuti
> >>>>
> >>>>
> >>>>> Sangmin
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Tue, Oct 15, 2013 at 3:50 PM, Adam Brenner <[email protected]> wrote:
> >>>>> Sangmin,
> >>>>>
> >>>>> I believe the term you are looking for is Subordinate
> >>>>> Queues[1][2]. This should handle what you need.
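> >>>>>
> >>>>> A minimal sketch (high.q and low.q are made-up names; edit with
> >>>>> `qconf -mq high.q`):
> >>>>>
> >>>>> subordinate_list low.q
> >>>>>
> >>>>> With no threshold given, the low.q instance on a host should be suspended
> >>>>> once all slots of the high.q instance on that host are filled.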
> >>>>>
> >>>>> If not ... I am sure Reuti (or someone else) will correct me on this.
> >>>>>
> >>>>> Enjoy,
> >>>>> -Adam
> >>>>>
> >>>>> [1]: http://docs.oracle.com/cd/E19957-01/820-0698/i998889/index.html
> >>>>> [2]:
> >>>>> http://grid-gurus.blogspot.com/2011/03/using-grid-engine-subordinate-queues.html
> >>>>>
> >>>>> --
> >>>>> Adam Brenner
> >>>>> Computer Science, Undergraduate Student
> >>>>> Donald Bren School of Information and Computer Sciences
> >>>>>
> >>>>> Research Computing Support
> >>>>> Office of Information Technology
> >>>>> http://www.oit.uci.edu/rcs/
> >>>>>
> >>>>> University of California, Irvine
> >>>>> www.ics.uci.edu/~aebrenne/
> >>>>> [email protected]
> >>>>>
> >>>>>
> >>>>> On Mon, Oct 14, 2013 at 11:18 PM, Sangmin Park <[email protected]>
> >>>>> wrote:
> >>>>>> Howdy,
> >>>>>>
> >>>>>> For specific purpose in my organization,
> >>>>>> I want to configure something to SGE scheduler.
> >>>>>>
> >>>>>> Imagine:
> >>>>>> a job is running, called A-job.
> >>>>>> If B-job is submitted while A-job is running,
> >>>>>> I want to hold A-job and run B-job first.
> >>>>>> And after B-job is finished, restart A-job.
> >>>>>>
> >>>>>> What do I do for this?
> >>>>>>
> >>>>>> Sangmin
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
> --
> ===========================
> Sangmin Park
> Supercomputing Center
> Ulsan National Institute of Science and Technology(UNIST)
> Ulsan, 689-798, Korea
>
> phone : +82-52-217-4201
> mobile : +82-10-5094-0405
> fax : +82-52-217-4209
> ===========================
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users