Hi,

On 29.10.2013 at 01:21, Sangmin Park wrote:

> I tried the limit in an RQS as you said, but it is not working.
> 
> The only RQS related to the matlabhosts is slotcap, as below:
> 
> {
>    name         slotcap
>    description  maximum usable slots per node
>    enabled      TRUE
>    limit        hosts {*} to slots=$num_proc
>    limit        queues !matlab.q hosts {*} to slots=$num_proc
> }
> 
> The above RQS does not make the subordinate queue work.
> When I try the limit as below,
> 
>    limit         queues {*} hosts {*} to slots=$num_proc

Only this line in one and only one RQS? But it would allow each queue to have 
$num_proc slots filled on each and every host - which is what you don't want. 
As I posted: one RQS with only one line should do it:

limit        queues !matlab.q hosts {*} to slots=$num_proc
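
For reference, the complete one-rule RQS could then look like this (a sketch 
reusing the header of your slotcap set from above):

{
   name         slotcap
   description  maximum usable slots per node, except for matlab.q
   enabled      TRUE
   limit        queues !matlab.q hosts {*} to slots=$num_proc
}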



> The matlab job is submitted to hosts which are used by normal.q jobs.
> But the normal.q jobs were not suspended.

I.e. the output of `qstat` doesn't show state "s"? Note that even when jobs are 
suspended, they still use resources like memory (they only got a SIGSTOP signal).
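
As a quick check you could list only the suspended jobs of all users, e.g.:

qstat -s s -u '*'

A job suspended via subordination typically shows state "S" there, a manually 
suspended one "s".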


> i.e. on the same host, a normal.q job and a matlab.q job are running together.
> This causes a performance decline, as they share the computing resources.
> 
> To guarantee HPC performance, I made the slotcap limit;
> it means that no one can use more slots than the number of cores a host has.
> Our hosts have 12 cores per node. That's why I made this limit.

The question was aimed more at the understanding of RQSes, as it's worth noting 
that all enabled RQSes will be honored, and that the first rule inside an RQS 
which matches the job being scheduled will allow or deny this particular job 
according to this rule; no further lines inside this RQS will be scanned.
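
To illustrate with the two-line slotcap set you started with:

{
   name    slotcap
   enabled TRUE
   limit   hosts {*} to slots=$num_proc
   limit   queues !matlab.q hosts {*} to slots=$num_proc
}

the first rule already matches every job on every host, so the second line is 
never consulted and matlab.q gets capped as well - hence the one-line version 
suggested above.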

-- Reuti


> --Sangmin
> 
> 
> 
> On Mon, Oct 28, 2013 at 10:29 PM, Reuti <[email protected]> wrote:
> On 28.10.2013 at 14:05, Reuti wrote:
> 
> > On 28.10.2013 at 13:59, Sangmin Park wrote:
> >
> >> Yes, suspending the job when all 12 slots are used on a particular host. 
> >> This is what I want.
> >> So I tried to submit a job using 12 slots, but it did not work.
> >
> > Aha, it might be necessary to change the order of the rules in your RQS. The 
> > first matching one will allow or deny the job to be started. I.e. if all 
> > slots are used, the (currently) first rule matches and the job is rejected.
> 
> In fact, you can try just this:
> 
> limit        queues !matlab.q hosts {*} to slots=$num_proc
> 
> matlab.q needs no limit, as an upper limit for slots is already set in its 
> queue definition.
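> 
> You can modify the existing set in place (qconf opens it in an editor):
> 
> qconf -mrqs slotcap
> 
> and verify the result afterwards with `qconf -srqs slotcap`.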
> 
> -- Reuti
> 
> 
> > -- Reuti
> >
> >
> >> It is still not working.
> >>
> >> --Sangmin
> >>
> >>
> >> On Mon, Oct 28, 2013 at 9:47 PM, Reuti <[email protected]> wrote:
> >> On 28.10.2013 at 13:45, Sangmin Park wrote:
> >>
> >>> This is the RQS
> >>>
> >>>   limit        hosts {@parallelhosts} to slots=$num_proc
> >>>   limit        queues !matlab.q hosts {@matlabhosts} to slots=$num_proc
> >>> The parallelhosts group includes the matlabhosts.
> >>>
> >>> The slots value in matlab.q is the number of cores per node.
> >>>
> >>> All hosts are included in parallelhosts: node1 ~ node30.
> >>> matlabhosts includes node1 ~ node7.
> >>> short.q, normal.q and long.q can be used on node1 ~ node7.
> >>>
> >>> I want to set it up so that when jobs in short.q, normal.q and long.q are 
> >>> running and a matlab job is submitted, a running job not using matlab.q 
> >>> on node1 ~ node7 is suspended and the matlab job runs.
> >>> This is what I want to set up.
> >>>
> >>> I don't understand why this cannot happen if I set the slots value to 12.
> >>
> >> It will suspend the job when all 12 slots are used on a particular host. 
> >> You may want to try with 1 instead. As a refinement, you could also look 
> >> into slotwise subordination.
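> >> 
> >> (A hypothetical sketch of the slotwise syntax from queue_conf(5), available 
> >> from SGE 6.2u5 on - in matlab.q's configuration it could read:
> >> 
> >> subordinate_list slots=12(short.q:1:sr,normal.q:2:sr,long.q:3:sr)
> >> 
> >> i.e. once more than 12 slots are in use on a host across this subordination 
> >> tree, single jobs in the listed queues are suspended - "sr" picks the job 
> >> with the shortest run time - instead of whole queue instances.)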
> >>
> >> -- Reuti
> >>
> >>
> >>> --Sangmin
> >>>
> >>>
> >>> On Mon, Oct 28, 2013 at 8:58 PM, Reuti <[email protected]> wrote:
> >>> On 28.10.2013 at 12:30, Sangmin Park wrote:
> >>>
> >>>> I've edited the negative values in the priority section: short.q is 4, 
> >>>> normal.q is 6 and long.q is 8, respectively.
> >>>> And I configured 72 cores for each queue.
> >>>
> >>> But you didn't answer the question: How do you limit the overall slot 
> >>> count? An RQS or a definition in the exechost?
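> >>> 
> >>> (You can inspect both with e.g. `qconf -srqs` for all resource quota sets 
> >>> and `qconf -se node1` for the complex_values of one execution host - 
> >>> node1 here just as an example from your cluster.)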
> >>>
> >>>> Below are the matlab.q instance details.
> >>>> qname                 matlab.q
> >>>> hostlist              @matlabhosts
> >>>> seq_no                0
> >>>> load_thresholds       np_load_avg=1.75
> >>>> suspend_thresholds    NONE
> >>>> nsuspend              1
> >>>> suspend_interval      00:05:00
> >>>> priority              2
> >>>> min_cpu_interval      00:05:00
> >>>> processors            UNDEFINED
> >>>> qtype                 BATCH INTERACTIVE
> >>>> ckpt_list             NONE
> >>>> pe_list               fill_up make matlab
> >>>> rerun                 FALSE
> >>>> slots                 12
> >>>> tmpdir                /tmp
> >>>> shell                 /bin/bash
> >>>> prolog                NONE
> >>>> epilog                NONE
> >>>> shell_start_mode      posix_compliant
> >>>> starter_method        NONE
> >>>> suspend_method        NONE
> >>>> resume_method         NONE
> >>>> terminate_method      NONE
> >>>> notify                00:00:60
> >>>> owner_list            NONE
> >>>> user_lists            octausers onsiteusers
> >>>> xuser_lists           NONE
> >>>> subordinate_list      short.q=72, normal.q=72, long.q=72
> >>>
> >>> This will suspend these three queues when 72 slots per queue instance of 
> >>> matlab.q are in use. As you have only 12 slots defined above, this will 
> >>> never happen.
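> >>> 
> >>> With only 12 slots per host in matlab.q, a threshold that can actually 
> >>> trigger would be e.g.:
> >>> 
> >>> subordinate_list short.q=1, normal.q=1, long.q=1
> >>> 
> >>> (suspend these queues as soon as one matlab.q slot on the host is in use), 
> >>> or plain `subordinate_list short.q normal.q long.q` to suspend them only 
> >>> once all 12 matlab.q slots on that host are filled.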
> >>>
> >>> What behavior would you like to set up?
> >>>
> >>> -- Reuti
> >>>
> >>>
> >>>> complex_values        NONE
> >>>> projects              NONE
> >>>> xprojects             NONE
> >>>> calendar              NONE
> >>>> initial_state         default
> >>>> s_rt                  INFINITY
> >>>> h_rt                  168:00:00
> >>>> s_cpu                 INFINITY
> >>>> h_cpu                 INFINITY
> >>>> s_fsize               INFINITY
> >>>> h_fsize               INFINITY
> >>>> s_data                INFINITY
> >>>> h_data                INFINITY
> >>>> s_stack               INFINITY
> >>>> h_stack               INFINITY
> >>>> s_core                INFINITY
> >>>> h_core                INFINITY
> >>>> s_rss                 INFINITY
> >>>> h_rss                 INFINITY
> >>>> s_vmem                INFINITY
> >>>> h_vmem                INFINITY
> >>>>
> >>>> thanks,
> >>>>
> >>>> --Sangmin
> >>>>
> >>>>
> >>>> On Mon, Oct 28, 2013 at 3:51 PM, Reuti <[email protected]> 
> >>>> wrote:
> >>>> Hi,
> >>>>
> >>>> On 28.10.2013 at 06:40, Sangmin Park wrote:
> >>>>
> >>>>> Thanks, Adam
> >>>>>
> >>>>> I configured the SGE queues following the second link you mentioned.
> >>>>> But it does not work.
> >>>>>
> >>>>> I made 4 queues: short.q, normal.q, long.q and matlab.q.
> >>>>> The short.q, normal.q and long.q queue instances run on all computing 
> >>>>> nodes, node1 ~ node30.
> >>>>> The matlab.q instance is configured only for a few nodes, node1 ~ node7, 
> >>>>> called matlabhosts.
> >>>>>
> >>>>> The priority of each queue is shown below.
> >>>>> [short.q]
> >>>>> priority              -5
> >>>>
> >>>> Don't use negative values here. This number is the "nice value" under 
> >>>> which the Linux kernel will run the process (i.e. it's used by the 
> >>>> scheduler in the kernel; for SGE itself it doesn't influence the 
> >>>> scheduling). User processes should be in the range 0..19 [20 on Solaris]. 
> >>>> The negative ones are reserved for kernel processes.
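> >>>> 
> >>>> To adjust it you could edit each queue in turn, e.g.:
> >>>> 
> >>>> qconf -mq short.q
> >>>> 
> >>>> and set its priority line to a value in the range 0..19, e.g.:
> >>>> 
> >>>> priority              0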
> >>>>
> >>>>
> >>>>> subordinate_list      NONE
> >>>>> [normal.q]
> >>>>> priority              0
> >>>>> subordinate_list      NONE
> >>>>> [long.q]
> >>>>> priority              5
> >>>>> subordinate_list      NONE
> >>>>>
> >>>>> and matlab.q is
> >>>>> priority              -10
> >>>>> subordinate_list      short.q normal.q long.q
> >>>>
> >>>> Same here. It's also worth noting that these values are relative. I.e. 
> >>>> with the same number of user processes as cores, it doesn't matter which 
> >>>> values are used as nice values, as each process gets its own core anyway. 
> >>>> Only when there are more processes than cores will it have an effect. But 
> >>>> as these are relative values, it's the same whether (cores+1) processes 
> >>>> all have 0 or all have 19 as their nice value.
> >>>>
> >>>>
> >>>>> I submitted several jobs using normal.q to the matlabhosts,
> >>>>> and I submitted a job using matlab.q, which has the subordinate_list.
> >>>>> I expected one of the normal.q jobs to be suspended and the matlab.q 
> >>>>> job to run.
> >>>>> But the matlab.q job waits in the queue with status qw; it is not started.
> >>>>>
> >>>>> What's the matter with this?
> >>>>> Please help!!
> >>>>
> >>>> http://gridengine.org/pipermail/users/2013-October/006820.html
> >>>>
> >>>> How do you limit the overall slot count?
> >>>>
> >>>> -- Reuti
> >>>>
> >>>>
> >>>>> Sangmin
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Tue, Oct 15, 2013 at 3:50 PM, Adam Brenner <[email protected]> wrote:
> >>>>> Sangmin,
> >>>>>
> >>>>> I believe the term you are looking for is Subordinate
> >>>>> Queues[1][2]. This should handle what you are after.
> >>>>>
> >>>>> If not ... I am sure Reuti (or someone else) will correct me on this.
> >>>>>
> >>>>> Enjoy,
> >>>>> -Adam
> >>>>>
> >>>>> [1]: http://docs.oracle.com/cd/E19957-01/820-0698/i998889/index.html
> >>>>> [2]: 
> >>>>> http://grid-gurus.blogspot.com/2011/03/using-grid-engine-subordinate-queues.html
> >>>>>
> >>>>> --
> >>>>> Adam Brenner
> >>>>> Computer Science, Undergraduate Student
> >>>>> Donald Bren School of Information and Computer Sciences
> >>>>>
> >>>>> Research Computing Support
> >>>>> Office of Information Technology
> >>>>> http://www.oit.uci.edu/rcs/
> >>>>>
> >>>>> University of California, Irvine
> >>>>> www.ics.uci.edu/~aebrenne/
> >>>>> [email protected]
> >>>>>
> >>>>>
> >>>>> On Mon, Oct 14, 2013 at 11:18 PM, Sangmin Park <[email protected]> 
> >>>>> wrote:
> >>>>>> Howdy,
> >>>>>>
> >>>>>> For a specific purpose in my organization,
> >>>>>> I want to configure something in the SGE scheduler.
> >>>>>>
> >>>>>> Imagine:
> >>>>>> a job is running, called A-job.
> >>>>>> If B-job is submitted while A-job is running,
> >>>>>> I want to hold A-job and run B-job first.
> >>>>>> And after B-job is finished, resume A-job.
> >>>>>>
> >>>>>> What should I do to achieve this?
> >>>>>>
> >>>>>> Sangmin
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >>
> >
> >
> 
> 
> 
> 
> -- 
> ===========================
> Sangmin Park 
> Supercomputing Center
> Ulsan National Institute of Science and Technology(UNIST)
> Ulsan, 689-798, Korea 
> 
> phone : +82-52-217-4201
> mobile : +82-10-5094-0405
> fax : +82-52-217-4209
> ===========================


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
