Re: [gridengine users] Help with PE tshooting / config

Reuti Fri, 11 Apr 2014 08:44:27 -0700

On 11.04.2014 at 17:28, Michael Coffman wrote:

> On Fri, Apr 11, 2014 at 8:26 AM, Reuti <[email protected]> wrote:
> On 11.04.2014 at 16:14, Michael Coffman wrote:
> 
> > Thanks for the response!
> >
> > $ qstat -f
> > queuename                      qtype resv/used/tot. load_avg arch          states
> > ---------------------------------------------------------------------------------
> > fast.q@gridtst1 BIP   0/0/8          0.00     lx24-amd64
> > ---------------------------------------------------------------------------------
> > fast.q@gridtst2 BIP   0/0/8          0.00     lx24-amd64
> > ---------------------------------------------------------------------------------
> > fast.q@gridtst3 BIP   0/0/8          0.00     lx24-amd64
> > ---------------------------------------------------------------------------------
> > all.q@gridtst1 BIP   0/0/10         0.00     lx24-amd64
> > ---------------------------------------------------------------------------------
> > all.q@gridtst2 BIP   0/0/10         0.00     lx24-amd64
> > ---------------------------------------------------------------------------------
> > all.q@gridtst3 BIP   0/0/10         0.00     lx24-amd64
> >
> >
> > I have some custom complexes, but for testing I set their defaults to 0.
> > This command seemed like the best way to see what was being submitted
> > with default values:
> >
> > $ qconf -sc | awk '$7~/1/ {print}'
>   
> qconf -sc | awk '$7!~/(0|NONE)/ {print}'
> 
> 128G                  128G              BOOL        ==    FORCED      NO      FALSE    0
> 64G                   64G               BOOL        ==    FORCED      NO      FALSE    0
> QUALIFICATION         qualify           BOOL        ==    FORCED      NO      FALSE    0
> critical              troll             BOOL        ==    YES         NO      FALSE    1000000
> dedicate              d                 BOOL        ==    FORCED      NO      FALSE    0
> hspice                spc               INT         <=    YES         YES     FALSE    5000


The one above looks strange, but it seems to work.
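
Assuming the remark refers to the hspice row (an INT consumable whose default is FALSE rather than a number), a small check like the following could flag such entries. This is only a sketch: the sample rows are copied from the output quoted in this thread, and the variable names `complexes` and `odd` are made up for illustration.

```shell
# Sample rows copied from the `qconf -sc` output quoted in this thread.
# Columns: name shortcut type relop requestable consumable default urgency
complexes='hspice spc INT <= YES YES FALSE 5000
slots s INT <= YES YES 1 1000
short short BOOL == FORCED NO FALSE 100000'

# Print consumables (column 6 is YES) whose default (column 7) is not a plain number.
odd=$(printf '%s\n' "$complexes" | awk '$6 == "YES" && $7 !~ /^[0-9.]+$/ {print $1}')
echo "$odd"
```

On a live system the same awk program could be fed from `qconf -sc` instead of the sample variable.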


> redhawk               red               BOOL        ==    FORCED      NO      FALSE    0
> short                 short             BOOL        ==    FORCED      NO      FALSE    100000
> slots                 s                 INT         <=    YES         YES     1        1000
> top                   top_resvd         BOOL        ==    FORCED      NO      FALSE    0
> unthrottled           djw               BOOL        ==    YES         NO      FALSE    0
> 
> 
> 
> So, there is nothing running right now and no reservation made. Can you 
> please post the queue configuration? Is there any limit on slots in an RQS 
> or on a global level (`qconf -se global`)?
> 
> Nothing running and nothing queued besides the PE job I submitted:
> 
> $ qstat -u '*'
> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
> -----------------------------------------------------------------------------------------------------------------
>    9724 0.65000 QRLOGIN    coffman      qw    04/11/2014 09:22:07                                24

The queue configuration?


> $ qconf -srqsl
> no resource quota set list defined

Good.
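
With no resource quota sets in play, a quick sanity check is to add up the slot counts per cluster queue from the `qstat -f` output quoted earlier. The sketch below does that with awk on lines copied from this thread. Treating both reserved and used slots as unavailable is a simplifying assumption, and the variable names `qstat_f` and `free` are made up for illustration.

```shell
# Sample lines copied from the `qstat -f` output quoted in this thread.
qstat_f='fast.q@gridtst1 BIP   0/0/8          0.00     lx24-amd64
fast.q@gridtst2 BIP   0/0/8          0.00     lx24-amd64
fast.q@gridtst3 BIP   0/0/8          0.00     lx24-amd64
all.q@gridtst1 BIP   0/0/10         0.00     lx24-amd64
all.q@gridtst2 BIP   0/0/10         0.00     lx24-amd64
all.q@gridtst3 BIP   0/0/10         0.00     lx24-amd64'

# Column 3 is resv/used/tot.; count tot - used - resv as free, summed per cluster queue.
free=$(printf '%s\n' "$qstat_f" | awk '
  $1 ~ /@/ {
    split($1, q, "@"); split($3, s, "/")
    sum[q[1]] += s[3] - s[2] - s[1]
  }
  END { printf "all.q=%d fast.q=%d\n", sum["all.q"], sum["fast.q"] }')
echo "$free"
```

For the sample lines this gives 30 free slots in all.q and 24 in fast.q, so the queue instances themselves do not explain a limit of 12.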


> > slots                 s                 INT         <=    YES         YES     1        1000
> >
> >
> >
> > On Thu, Apr 10, 2014 at 4:08 PM, Reuti <[email protected]> wrote:
> > On 10.04.2014 at 23:51, Michael Coffman wrote:
> >
> > > I am trying to set up a PE and am struggling to understand how grid 
> > > determines how many slots are available for the PE.   I have set up 3 
> > > test machines in a queue.  I set the default slots to 10.  Each system is 
> > > actually a virtual machine that has one cpu and ~2G of memory.    PE 
> > > definition is:
> > >
> > > pe_name            dp
> > > slots              999
> > > user_lists         NONE
> > > xuser_lists        NONE
> > > start_proc_args    /bin/true
> > > stop_proc_args     /bin/true
> > > allocation_rule    $fill_up
> > > control_slaves     FALSE
> > > job_is_first_task  TRUE
> > > urgency_slots      min
> > > accounting_summary FALSE
> > >
> > > Since I have 10 slots per host, I assumed I would have 30 slots.   And 
> > > when testing I get:
> > >
> > > $ qrsh -w v -q all.q  -now no -pe dp 30
> > > verification: found possible assignment with 30 slots
> > >
> > > $ qrsh -w p -q all.q  -now no -pe dp 30
> > > verification: found possible assignment with 30 slots
> > >
> > > But when I actually try to run the job, I get the following from qstat:
> > >
> > > cannot run in PE "dp" because it only offers 12 slots
> > >
> > > I get that other resources can impact the availability of slots, but I'm 
> > > having a hard time figuring out why I'm only getting 12 slots and what 
> > > resources are impacting this...
> > >
> > > When I request -pe dp 12, it works fine and distributes the jobs across 
> > > all three systems...
> > >
> > > 717 0.65000 QRLOGIN    user      r    04/10/2014 14:40:14 all.q@gridtst1 SLAVE
> > >                                                            all.q@gridtst1 SLAVE
> > >                                                            all.q@gridtst1 SLAVE
> > >                                                            all.q@gridtst1 SLAVE
> > > 9717 0.65000 QRLOGIN    user      r    04/10/2014 14:40:14 all.q@gridtst2 SLAVE
> > >                                                            all.q@gridtst2 SLAVE
> > >                                                            all.q@gridtst2 SLAVE
> > >                                                            all.q@gridtst2 SLAVE
> > > 9717 0.65000 QRLOGIN    user      r    04/10/2014 14:40:14 all.q@gridtst3 MASTER
> > >                                                            all.q@gridtst3 SLAVE
> > >                                                            all.q@gridtst3 SLAVE
> > >                                                            all.q@gridtst3 SLAVE
> >
> > What's the output of: qstat -f
> >
> > Did you set up any consumable like memory on the nodes with a default 
> > consumption?
> >
> > - Reuti
> >
> >
> > > I'm assuming I am missing something simple :(    What should I be looking 
> > > at to help me better understand what's going on?    I do notice that 
> > > hl:cpu jumps significantly between idle, dp 12 and dp 24, but I did not find 
> > > anything in the docs describing what cpu represents...

/usr/sge/doc/load_parameters.asc

It's % load.

-- Reuti