Re: [gridengine users] Help with PE tshooting / config

Reuti Fri, 11 Apr 2014 07:29:33 -0700

On 11.04.2014 at 16:14, Michael Coffman wrote:

> Thanks for the response!
> 
> $ qstat -f
> queuename                      qtype resv/used/tot. load_avg arch          states
> ---------------------------------------------------------------------------------
> fast.q@gridtst1 BIP   0/0/8          0.00     lx24-amd64
> ---------------------------------------------------------------------------------
> fast.q@gridtst2 BIP   0/0/8          0.00     lx24-amd64
> ---------------------------------------------------------------------------------
> fast.q@gridtst3 BIP   0/0/8          0.00     lx24-amd64
> ---------------------------------------------------------------------------------
> all.q@gridtst1 BIP   0/0/10         0.00     lx24-amd64
> ---------------------------------------------------------------------------------
> all.q@gridtst2 BIP   0/0/10         0.00     lx24-amd64
> ---------------------------------------------------------------------------------
> all.q@gridtst3 BIP   0/0/10         0.00     lx24-amd64
> 
> 
> I have some custom complexes, but set their defaults to 0 for testing. This
> command seemed like the best way to see what was being submitted with default
> values:
> 
> $ qconf -sc | awk '$7~/1/ {print}'


qconf -sc | awk '$7!~/(0|NONE)/ {print}'
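
A stricter variant, as a sketch only: the pattern above also hides any default
that merely contains a 0 (for example 10). Assuming the default value is the 7th
column of `qconf -sc` and that "no default" shows up as NONE, 0, 0.000 or 0:0:0,
a filter along these lines may be closer to what is wanted. The function name
`nonzero_defaults` is made up for illustration.

```shell
# Hypothetical helper: reads `qconf -sc` style output on stdin and prints only
# the complexes whose default (assumed to be column 7) is neither NONE nor a
# zero written as 0, 0.000 or 0:0:0. Comment lines starting with # are skipped.
nonzero_defaults() {
    awk '!/^#/ && NF >= 8 && $7 !~ /^(NONE|[0:.]+)$/ { print }'
}

# Intended use on a live cluster:
#   qconf -sc | nonzero_defaults
```

With this filter a complex with default 10 is still listed, while NONE and
0:0:0 are dropped.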

So, there is nothing running right now and no reservation has been made. Can you
please post the queue configuration? Is there any limit on slots in an RQS or at
the global level (`qconf -se global`)?
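
To double-check the slot counts per cluster queue, here is a small sketch that
sums the resv/used/tot. column of `qstat -f`. It assumes the column layout shown
in the output quoted above (third field of the form resv/used/tot);
`slot_totals` is a made-up name.

```shell
# Hypothetical helper: reads `qstat -f` output on stdin and prints, per cluster
# queue, the summed reserved, used and total slots. Only lines whose third
# field looks like N/N/N are counted, so the header and separators are ignored.
slot_totals() {
    awk '$3 ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ {
        split($1, q, "@")            # q[1] is the cluster queue name
        split($3, s, "/")            # s[1]=resv, s[2]=used, s[3]=tot
        resv[q[1]] += s[1]; used[q[1]] += s[2]; tot[q[1]] += s[3]
    }
    END {
        for (name in tot)
            printf "%s resv=%d used=%d tot=%d\n", name, resv[name], used[name], tot[name]
    }' | sort
}

# Intended use on a live cluster:
#   qstat -f | slot_totals
```

For the output quoted above this gives tot=30 for all.q and tot=24 for fast.q
with nothing used or reserved, so the 12-slot limit has to come from somewhere
else.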

-- Reuti


> slots                 s                 INT         <=    YES         YES        1        1000
> 
> 
> 
> On Thu, Apr 10, 2014 at 4:08 PM, Reuti <[email protected]> wrote:
> On 10.04.2014 at 23:51, Michael Coffman wrote:
> 
> > I am trying to set up a PE and am struggling to understand how grid 
> > determines how many slots are available for the PE.   I have set up 3 test 
> > machines in a queue.  I set the default slots to 10.  Each system is 
> > actually a virtual machine that has one cpu and ~2G of memory.    PE 
> > definition is:
> >
> > pe_name            dp
> > slots              999
> > user_lists         NONE
> > xuser_lists        NONE
> > start_proc_args    /bin/true
> > stop_proc_args     /bin/true
> > allocation_rule    $fill_up
> > control_slaves     FALSE
> > job_is_first_task  TRUE
> > urgency_slots      min
> > accounting_summary FALSE
> >
> > Since I have 10 slots per host, I assumed I would have 30 slots.   And when 
> > testing I get:
> >
> > $qrsh -w v -q all.q  -now no -pe dp 30
> > verification: found possible assignment with 30 slots
> >
> > $qrsh -w p -q all.q  -now no -pe dp 30
> > verification: found possible assignment with 30 slots
> >
> > But when I actually try to run the job, I get the following from qstat:
> >
> > cannot run in PE "dp" because it only offers 12 slots
> >
> > I get that other resources can impact the availability of slots, but I'm 
> > having a hard time figuring out why I'm only getting 12 slots and which 
> > resources are impacting this...
> >
> > When I request -pe dp 12, it works fine and distributes the job across all 
> > three systems...
> >
> > 9717 0.65000 QRLOGIN    user      r    04/10/2014 14:40:14 all.q@gridtst1 SLAVE
> >                                                            all.q@gridtst1 SLAVE
> >                                                            all.q@gridtst1 SLAVE
> >                                                            all.q@gridtst1 SLAVE
> > 9717 0.65000 QRLOGIN    user      r    04/10/2014 14:40:14 all.q@gridtst2 SLAVE
> >                                                            all.q@gridtst2 SLAVE
> >                                                            all.q@gridtst2 SLAVE
> >                                                            all.q@gridtst2 SLAVE
> > 9717 0.65000 QRLOGIN    user      r    04/10/2014 14:40:14 all.q@gridtst3 MASTER
> >                                                            all.q@gridtst3 SLAVE
> >                                                            all.q@gridtst3 SLAVE
> >                                                            all.q@gridtst3 SLAVE
> 
> What's the output of: qstat -f
> 
> Did you set up any consumable like memory on the nodes with a default 
> consumption?
> 
> - Reuti
> 
> 
> > I'm assuming I am missing something simple :(    What should I be looking 
> > at to help me better understand what's going on?    I do notice that hl:cpu 
> > jumps significantly between idle, dp 12 and dp 24, but I didn't find anything 
> > in the docs describing what cpu represents...
> >
> > Any help or pointers would be greatly appreciated...
> >
> > I'm running a very old version of grid, but assume that shouldn't matter 
> > (SGE 6.2u5)
> > --
> > -MichaelC
> > _______________________________________________
> > users mailing list
> > [email protected]
> > https://gridengine.org/mailman/listinfo/users
> 
> 
> 
> 
> -- 
> -MichaelC


