I am trying to set up a PE and am struggling to understand how Grid Engine
determines how many slots are available for the PE. I have set up 3 test
machines in a queue and set the default slots to 10. Each system is
actually a virtual machine that has one CPU and ~2G of memory. The PE
definition is:

pe_name            dp
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE

Since I have 10 slots per host, I assumed I would have 30 slots in total.
And when testing I get:

$ qrsh -w v -q all.q -now no -pe dp 30
verification: found possible assignment with 30 slots

$ qrsh -w p -q all.q -now no -pe dp 30
verification: found possible assignment with 30 slots
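
To spell out the arithmetic behind my assumption, here is a minimal sketch.
The numbers come from this post; the function and variable names are made
up for illustration and are not anything Grid Engine itself provides:

```python
# Illustrative only: values are from this post, names are hypothetical.
PE_SLOTS = 999            # "slots" in the dp PE definition
SLOTS_PER_HOST = 10       # slots I configured on the queue for each host
HOSTS = ["gridtst1", "gridtst2", "gridtst3"]


def expected_pe_slots(pe_slots, slots_per_host, hosts):
    """Naive upper bound: queue slots summed over hosts, capped by the PE."""
    queue_total = slots_per_host * len(hosts)
    return min(pe_slots, queue_total)


print(expected_pe_slots(PE_SLOTS, SLOTS_PER_HOST, HOSTS))  # 30
```

This only models the queue and PE slot counts, which is all I was taking
into account when I expected 30.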

But when I actually try to run the job, I get the following from qstat:

cannot run in PE "dp" because it only offers 12 slots

I get that other resources can impact the availability of slots, but I'm
having a hard time figuring out why I'm only getting 12 slots and which
resources are causing this.

When I request -pe dp 12, it works fine and distributes the tasks across all
three systems:

9717 0.65000 QRLOGIN    user      r    04/10/2014 14:40:14 all.q@gridtst1 SLAVE
                                                           all.q@gridtst1 SLAVE
                                                           all.q@gridtst1 SLAVE
                                                           all.q@gridtst1 SLAVE
9717 0.65000 QRLOGIN    user      r    04/10/2014 14:40:14 all.q@gridtst2 SLAVE
                                                           all.q@gridtst2 SLAVE
                                                           all.q@gridtst2 SLAVE
                                                           all.q@gridtst2 SLAVE
9717 0.65000 QRLOGIN    user      r    04/10/2014 14:40:14 all.q@gridtst3 MASTER
                                                           all.q@gridtst3 SLAVE
                                                           all.q@gridtst3 SLAVE
                                                           all.q@gridtst3 SLAVE
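
Tallying that listing by host shows 4 slots granted on each of the three
machines, which adds up to the 12 slots the scheduler says the PE offers.
A quick sketch of the tally follows; the list is copied by hand from the
qstat output above and the variable names are just for illustration:

```python
from collections import Counter

# One entry per line of the qstat listing (queue instance of each task).
queue_instances = (
    ["all.q@gridtst1"] * 4
    + ["all.q@gridtst2"] * 4
    + ["all.q@gridtst3"] * 4
)

# Keep only the host part after the "@" and count granted slots per host.
per_host = Counter(qi.split("@", 1)[1] for qi in queue_instances)
total = sum(per_host.values())

print(dict(per_host))
print(total)
```

So whatever is limiting me seems to apply evenly per host, not to the PE
as a whole, though that is only my reading of the output.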

I'm assuming I am missing something simple :( What should I be looking
at to help me better understand what's going on? I do notice that hl:cpu
jumps significantly between idle, dp 12 and dp 24, but I did not find
anything in the docs describing what cpu represents.

Any help or pointers would be greatly appreciated...

I'm running a very old version of Grid Engine (SGE 6.2u5), but I assume
that shouldn't matter.
-- 
-MichaelC