Re: [gridengine users] Help with PE tshooting / config

Reuti Sat, 12 Apr 2014 15:48:23 -0700

On 11.04.2014 at 19:49, Michael Coffman wrote:

> On Fri, Apr 11, 2014 at 9:41 AM, Reuti <[email protected]> wrote:
> On 11.04.2014 at 17:28, Michael Coffman wrote:
> <snip>
> The queue configuration?
> 
> Whoops... Sorry.
> 
>  qname                 all.q
> <snip>


Ok. Is there a "job_load_adjustments" in the scheduler configuration?
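
The scheduler configuration can be printed with `qconf -ssconf`. As a rough illustration of why this setting matters, here is a toy model, not the scheduler's actual algorithm, of how a per-slot load adjustment combined with a queue load threshold could cap the usable slots per host below the configured number. The function name, the threshold of 1.75, the adjustment of 0.50, and the idea that the adjustment is counted once per slot are all assumptions made for the sketch; none of them is taken from this thread, and the result is not a diagnosis of this cluster.

```python
def max_slots_per_host(slots_configured, threshold, adjustment_per_slot, base_load=0.0):
    """Toy model (hypothetical): grant slots while the adjusted load is below the threshold."""
    granted = 0
    load = base_load
    # Each granted slot adds an artificial load correction to the host.
    while granted < slots_configured and load < threshold:
        granted += 1
        load += adjustment_per_slot
    return granted


hosts = 3  # three test machines, as described in the thread

# Assumed values: threshold 1.75, adjustment 0.50 per slot.
per_host = max_slots_per_host(slots_configured=10, threshold=1.75, adjustment_per_slot=0.50)
print(per_host, hosts * per_host)

# With no adjustment at all, every configured slot is usable in this model.
print(max_slots_per_host(slots_configured=10, threshold=1.75, adjustment_per_slot=0.0))
```

Under these assumed numbers the model grants 4 slots per host; with the adjustment set to zero it grants all 10. Comparing the real `job_load_adjustments` value against the queue's load thresholds is the actual check.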

-- Reuti


> > $ qconf -srqsl
> > no resource quota set list defined
> 
> Good.
> 
> 
> > > slots                 s                 INT         <=    YES         YES        1        1000
> > >
> > >
> > >
> > > On Thu, Apr 10, 2014 at 4:08 PM, Reuti <[email protected]> wrote:
> > > On 10.04.2014 at 23:51, Michael Coffman wrote:
> > >
> > > > I am trying to set up a PE and am struggling to understand how Grid 
> > > > Engine determines how many slots are available for the PE. I have 
> > > > set up 3 test machines in a queue and set the default slots to 10. 
> > > > Each system is actually a virtual machine that has one CPU and ~2G of 
> > > > memory. The PE definition is:
> > > >
> > > > pe_name            dp
> > > > slots              999
> > > > user_lists         NONE
> > > > xuser_lists        NONE
> > > > start_proc_args    /bin/true
> > > > stop_proc_args     /bin/true
> > > > allocation_rule    $fill_up
> > > > control_slaves     FALSE
> > > > job_is_first_task  TRUE
> > > > urgency_slots      min
> > > > accounting_summary FALSE
> > > >
> > > > Since I have 10 slots per host, I assumed I would have 30 slots.   And 
> > > > when testing I get:
> > > >
> > > > $qrsh -w v -q all.q  -now no -pe dp 30
> > > > verification: found possible assignment with 30 slots
> > > >
> > > > $qrsh -w p -q all.q  -now no -pe dp 30
> > > > verification: found possible assignment with 30 slots
> > > >
> > > > But when I actually try to run the job, I get the following from qstat:
> > > >
> > > > cannot run in PE "dp" because it only offers 12 slots
> > > >
> > > > I get that other resources can impact the availability of slots, but I'm 
> > > > having a hard time figuring out why I'm only getting 12 slots and which 
> > > > resources are causing this...
> > > >
> > > > When I request -pe dp 12, it works fine and distributes the job's slots 
> > > > across all three systems...
> > > >
> > > > 9717 0.65000 QRLOGIN    user      r    04/10/2014 14:40:14 all.q@gridtst1 SLAVE
> > > >                                                            all.q@gridtst1 SLAVE
> > > >                                                            all.q@gridtst1 SLAVE
> > > >                                                            all.q@gridtst1 SLAVE
> > > > 9717 0.65000 QRLOGIN    user      r    04/10/2014 14:40:14 all.q@gridtst2 SLAVE
> > > >                                                            all.q@gridtst2 SLAVE
> > > >                                                            all.q@gridtst2 SLAVE
> > > >                                                            all.q@gridtst2 SLAVE
> > > > 9717 0.65000 QRLOGIN    user      r    04/10/2014 14:40:14 all.q@gridtst3 MASTER
> > > >                                                            all.q@gridtst3 SLAVE
> > > >                                                            all.q@gridtst3 SLAVE
> > > >                                                            all.q@gridtst3 SLAVE
> > >
> > > What's the output of: qstat -f
> > >
> > > Did you set up any consumable, such as memory, on the nodes with a default 
> > > consumption?
> > >
> > > - Reuti
> > >
> > >
> > > > I'm assuming I am missing something simple :(    What should I be 
> > > > looking at to help me better understand what's going on?    I do notice 
> > > > that hl:cpu jumps significantly between idle, dp 12 and dp 24, but I 
> > > > didn't find anything in the docs describing what cpu represents...
> 
> /usr/sge/doc/load_parameters.asc
> 
> It's % load.
> 
> 
> Ahh.  Thanks for the pointer to the file.   Very useful.
>  
> -- Reuti
> 
> 
> 
> -- 
> -MichaelC


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
