On Fri, Apr 11, 2014 at 9:41 AM, Reuti <[email protected]> wrote:

> On 11.04.2014 at 17:28, Michael Coffman wrote:
>
> > On Fri, Apr 11, 2014 at 8:26 AM, Reuti <[email protected]> wrote:
> > On 11.04.2014 at 16:14, Michael Coffman wrote:
> >
> > > Thanks for the response!
> > >
> > > $ qstat -f
> > > queuename                      qtype resv/used/tot. load_avg arch          states
> > > ---------------------------------------------------------------------------------
> > > fast.q@gridtst1                BIP   0/0/8          0.00     lx24-amd64
> > > ---------------------------------------------------------------------------------
> > > fast.q@gridtst2                BIP   0/0/8          0.00     lx24-amd64
> > > ---------------------------------------------------------------------------------
> > > fast.q@gridtst3                BIP   0/0/8          0.00     lx24-amd64
> > > ---------------------------------------------------------------------------------
> > > all.q@gridtst1                 BIP   0/0/10         0.00     lx24-amd64
> > > ---------------------------------------------------------------------------------
> > > all.q@gridtst2                 BIP   0/0/10         0.00     lx24-amd64
> > > ---------------------------------------------------------------------------------
> > > all.q@gridtst3                 BIP   0/0/10         0.00     lx24-amd64
> > >
> > >
> > > I have some custom complexes, but I set their defaults to 0 for testing. This command seemed like the best way to see what was being submitted with default values.
> > >
> > > $ qconf -sc | awk '$7~/1/ {print}'
> >
> > qconf -sc | awk '$7!~/(0|NONE)/ {print}'
> >
> > 128G                  128G              BOOL        ==    FORCED      NO         FALSE    0
> > 64G                   64G               BOOL        ==    FORCED      NO         FALSE    0
> > QUALIFICATION         qualify           BOOL        ==    FORCED      NO         FALSE    0
> > critical              troll             BOOL        ==    YES         NO         FALSE    1000000
> > dedicate              d                 BOOL        ==    FORCED      NO         FALSE    0
> > hspice                spc               INT         <=    YES         YES        FALSE    5000
>
> The above one looks strange, but it seems to be working.
>
>
> > redhawk               red               BOOL        ==    FORCED      NO         FALSE    0
> > short                 short             BOOL        ==    FORCED      NO         FALSE    100000
> > slots                 s                 INT         <=    YES         YES        1        1000
> > top                   top_resvd         BOOL        ==    FORCED      NO         FALSE    0
> > unthrottled           djw               BOOL        ==    YES         NO         FALSE    0
> >
> >
> >
> > So, there is nothing running right now and no reservation made. Can you please post the queue configuration? Any limit of slots in an RQS or on a global level (`qconf -se global`)?
> >
> > Nothing running and nothing queued besides the PE job I submitted.
> >
> > $ qstat -u '*'
> > job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
> > -----------------------------------------------------------------------------------------------------------------
> >    9724 0.65000 QRLOGIN    coffman      qw    04/11/2014 09:22:07                                   24
>
> The queue configuration?
>
> Whoops... Sorry.

qname                 all.q
hostlist              @allhosts
seq_no                100
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make dp
rerun                 FALSE
slots                 10
tmpdir                /tmp
shell                 /bin/bash
prolog                /opt/grid/ftcrnd/common/apd-prolog
epilog                NONE
shell_start_mode      posix_compliant
starter_method        /opt/grid/ftcrnd/common/apd-starter_method
suspend_method        SIGKILL
resume_method         NONE
terminate_method      SIGTERM
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY
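Since the slot total here depends on how many hosts @allhosts resolves to, two quick checks summarize it (a sketch, assuming the queue name above):

  $ qconf -shgrp @allhosts                            # host group membership
  $ qconf -sq all.q | egrep 'hostlist|pe_list|slots'  # slot-relevant queue fields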


> > $ qconf -srqsl
> > no resource quota set list defined
>
> Good.
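The global-host half of that earlier question can be checked the same way (a sketch; it looks for a slots=... entry in complex_values on the global pseudo-host):

  $ qconf -se global | grep complex_values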
>
>
> > > slots                 s                 INT         <=    YES         YES        1        1000
> > >
> > >
> > >
> > > On Thu, Apr 10, 2014 at 4:08 PM, Reuti <[email protected]> wrote:
> > > On 10.04.2014 at 23:51, Michael Coffman wrote:
> > >
> > > > I am trying to set up a PE and am struggling to understand how grid engine determines how many slots are available for the PE. I have set up 3 test machines in a queue. I set the default slots to 10. Each system is actually a virtual machine that has one CPU and ~2G of memory. The PE definition is:
> > > >
> > > > pe_name            dp
> > > > slots              999
> > > > user_lists         NONE
> > > > xuser_lists        NONE
> > > > start_proc_args    /bin/true
> > > > stop_proc_args     /bin/true
> > > > allocation_rule    $fill_up
> > > > control_slaves     FALSE
> > > > job_is_first_task  TRUE
> > > > urgency_slots      min
> > > > accounting_summary FALSE
> > > >
> > > > Since I have 10 slots per host, I assumed I would have 30 slots. And when testing I get:
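(As a sanity check on that arithmetic, ignoring any other consumables: a PE can offer at most the smaller of its own slot cap and the slots of the matching queue instances, i.e. min(999, 3 hosts x 10 slots) = 30, which matches the verification below.)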
> > > >
> > > > $ qrsh -w v -q all.q -now no -pe dp 30
> > > > verification: found possible assignment with 30 slots
> > > >
> > > > $ qrsh -w p -q all.q -now no -pe dp 30
> > > > verification: found possible assignment with 30 slots
> > > >
> > > > But when I actually try to run the job, I see the following from qstat...
> > > >
> > > > cannot run in PE "dp" because it only offers 12 slots
> > > >
> > > > I get that other resources can impact the availability of slots, but I'm having a hard time figuring out why I'm only getting 12 slots and what resources are impacting this...
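One way to dig into a mismatch like this (a sketch, assuming schedd_job_info is enabled in the scheduler configuration, qconf -msconf) is to ask the scheduler directly:

  $ qstat -j <job-ID>   # the "scheduling info" section lists per-queue-instance reasons
  $ qstat -F            # current resource availability per queue instance
  $ qhost -F            # current complex values per host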
> > > >
> > > > When I request -pe dp 12, it works fine and distributes the jobs across all three systems...
> > > >
> > > > 9717 0.65000 QRLOGIN    user      r    04/10/2014 14:40:14 all.q@gridtst1 SLAVE
> > > >                                                            all.q@gridtst1 SLAVE
> > > >                                                            all.q@gridtst1 SLAVE
> > > >                                                            all.q@gridtst1 SLAVE
> > > > 9717 0.65000 QRLOGIN    user      r    04/10/2014 14:40:14 all.q@gridtst2 SLAVE
> > > >                                                            all.q@gridtst2 SLAVE
> > > >                                                            all.q@gridtst2 SLAVE
> > > >                                                            all.q@gridtst2 SLAVE
> > > > 9717 0.65000 QRLOGIN    user      r    04/10/2014 14:40:14 all.q@gridtst3 MASTER
> > > >                                                            all.q@gridtst3 SLAVE
> > > >                                                            all.q@gridtst3 SLAVE
> > > >                                                            all.q@gridtst3 SLAVE
> > >
> > > What's the output of: qstat -f
> > >
> > > Did you set up any consumable, like memory, on the nodes with a default consumption?
> > >
> > > - Reuti
> > >
> > >
> > > > I'm assuming I am missing something simple :(  What should I be looking at to help me better understand what's going on? I do notice that hl:cpu jumps significantly between idle, dp 12, and dp 24, but I didn't find anything in the docs describing what cpu represents...
>
> /usr/sge/doc/load_parameters.asc
>
> It's % load.
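To watch that value per host without a full qstat -f (a sketch; cpu is the load sensor value documented in that file):

  $ qhost -F cpu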
>
>
Ahh.  Thanks for the pointer to the file.   Very useful.


> -- Reuti




-- 
-MichaelC
