On Fri, Apr 11, 2014 at 9:41 AM, Reuti <[email protected]> wrote:
> On 11.04.2014 at 17:28, Michael Coffman wrote:
>
>> On Fri, Apr 11, 2014 at 8:26 AM, Reuti <[email protected]> wrote:
>>> On 11.04.2014 at 16:14, Michael Coffman wrote:
>>>
>>>> Thanks for the response!
>>>>
>>>> $ qstat -f
>>>> queuename                      qtype resv/used/tot. load_avg arch       states
>>>> ---------------------------------------------------------------------------------
>>>> fast.q@gridtst1                BIP   0/0/8          0.00     lx24-amd64
>>>> ---------------------------------------------------------------------------------
>>>> fast.q@gridtst2                BIP   0/0/8          0.00     lx24-amd64
>>>> ---------------------------------------------------------------------------------
>>>> fast.q@gridtst3                BIP   0/0/8          0.00     lx24-amd64
>>>> ---------------------------------------------------------------------------------
>>>> all.q@gridtst1                 BIP   0/0/10         0.00     lx24-amd64
>>>> ---------------------------------------------------------------------------------
>>>> all.q@gridtst2                 BIP   0/0/10         0.00     lx24-amd64
>>>> ---------------------------------------------------------------------------------
>>>> all.q@gridtst3                 BIP   0/0/10         0.00     lx24-amd64
>>>>
>>>> I have some custom complexes, but I set them to 0 for the defaults for
>>>> testing. This command seemed like the best way to see what was being
>>>> submitted with default values:
>>>>
>>>> $ qconf -sc | awk '$7~/1/ {print}'
>>>
>>> qconf -sc | awk '$7!~/(0|NONE)/ {print}'
>>>
>>>> 128G          128G       BOOL ==  FORCED NO   FALSE  0
>>>> 64G           64G        BOOL ==  FORCED NO   FALSE  0
>>>> QUALIFICATION qualify    BOOL ==  FORCED NO   FALSE  0
>>>> critical      troll      BOOL ==  YES    NO   FALSE  1000000
>>>> dedicate      d          BOOL ==  FORCED NO   FALSE  0
>>>> hspice        spc        INT  <=  YES    YES  FALSE  5000
>>>
>>> The above one looks strange, but it seems to work.
>>>
>>>> redhawk       red        BOOL ==  FORCED NO   FALSE  0
>>>> short         short      BOOL ==  FORCED NO   FALSE  100000
>>>> slots         s          INT  <=  YES    YES  1      1000
>>>> top           top_resvd  BOOL ==  FORCED NO   FALSE  0
>>>> unthrottled   djw        BOOL ==  YES    NO   FALSE  0
>>>
>>> So, there is nothing running right now and no reservation made. Can you
>>> please post the queue configuration? Any limit on slots in an RQS or at
>>> the global level (`qconf -se global`)?
>>
>> Nothing running and nothing queued besides the PE job I submitted.
>>
>> $ qstat -u '*'
>> job-ID  prior    name     user     state submit/start at     queue  slots ja-task-ID
>> -----------------------------------------------------------------------------------------------------------------
>>    9724 0.65000  QRLOGIN  coffman  qw    04/11/2014 09:22:07           24
>
> The queue configuration?

Woops... Sorry.
qname                 all.q
hostlist              @allhosts
seq_no                100
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make dp
rerun                 FALSE
slots                 10
tmpdir                /tmp
shell                 /bin/bash
prolog                /opt/grid/ftcrnd/common/apd-prolog
epilog                NONE
shell_start_mode      posix_compliant
starter_method        /opt/grid/ftcrnd/common/apd-starter_method
suspend_method        SIGKILL
resume_method         NONE
terminate_method      SIGTERM
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY

>> $ qconf -srqsl
>> no resource quota set list defined
>
> Good.

>>>> On Thu, Apr 10, 2014 at 4:08 PM, Reuti <[email protected]> wrote:
>>>>> On 10.04.2014 at 23:51, Michael Coffman wrote:
>>>>>
>>>>>> I am trying to set up a PE and am struggling to understand how grid
>>>>>> determines how many slots are available for the PE. I have set up 3
>>>>>> test machines in a queue. I set the default slots to 10. Each system
>>>>>> is actually a virtual machine that has one CPU and ~2G of memory.
>>>>>> The PE definition is:
>>>>>>
>>>>>> pe_name            dp
>>>>>> slots              999
>>>>>> user_lists         NONE
>>>>>> xuser_lists        NONE
>>>>>> start_proc_args    /bin/true
>>>>>> stop_proc_args     /bin/true
>>>>>> allocation_rule    $fill_up
>>>>>> control_slaves     FALSE
>>>>>> job_is_first_task  TRUE
>>>>>> urgency_slots      min
>>>>>> accounting_summary FALSE
>>>>>>
>>>>>> Since I have 10 slots per host, I assumed I would have 30 slots. And
>>>>>> when testing I get:
>>>>>>
>>>>>> $ qrsh -w v -q all.q -now no -pe dp 30
>>>>>> verification: found possible assignment with 30 slots
>>>>>>
>>>>>> $ qrsh -w p -q all.q -now no -pe dp 30
>>>>>> verification: found possible assignment with 30 slots
>>>>>>
>>>>>> But when I actually try to run the job, I get the following from
>>>>>> qstat...
>>>>>>
>>>>>> cannot run in PE "dp" because it only offers 12 slots
>>>>>>
>>>>>> I get that other resources can impact the availability of slots, but
>>>>>> I'm having a hard time figuring out why I'm only getting 12 slots and
>>>>>> what resources are impacting this...
>>>>>>
>>>>>> When I request -pe dp 12, it works fine and distributes the jobs
>>>>>> across all three systems...
>>>>>>
>>>>>>    9717 0.65000 QRLOGIN user r 04/10/2014 14:40:14 all.q@gridtst1 SLAVE
>>>>>>                                                    all.q@gridtst1 SLAVE
>>>>>>                                                    all.q@gridtst1 SLAVE
>>>>>>                                                    all.q@gridtst1 SLAVE
>>>>>>    9717 0.65000 QRLOGIN user r 04/10/2014 14:40:14 all.q@gridtst2 SLAVE
>>>>>>                                                    all.q@gridtst2 SLAVE
>>>>>>                                                    all.q@gridtst2 SLAVE
>>>>>>                                                    all.q@gridtst2 SLAVE
>>>>>>    9717 0.65000 QRLOGIN user r 04/10/2014 14:40:14 all.q@gridtst3 MASTER
>>>>>>                                                    all.q@gridtst3 SLAVE
>>>>>>                                                    all.q@gridtst3 SLAVE
>>>>>>                                                    all.q@gridtst3 SLAVE
>>>>>
>>>>> What's the output of: qstat -f
>>>>>
>>>>> Did you set up any consumable like memory on the nodes with a default
>>>>> consumption?
>>>>>
>>>>> - Reuti
>>>>>
>>>>>> I'm assuming I am missing something simple :( What should I be
>>>>>> looking at to help me better understand what's going on?
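[Reuti's questions so far amount to a checklist for a "PE offers fewer
slots than expected" problem. Below is only a sketch of that checklist,
not output from the thread: it reuses the queue, host names, and job ID
shown above, and the last command reports scheduling details only when
schedd_job_info is enabled in the scheduler configuration.

  # Expected capacity: slots per queue instance times the hosts in @allhosts.
  $ qconf -sq all.q | grep -E '^(hostlist|slots)'

  # What the scheduler actually sees per queue instance, consumables included.
  $ qstat -F -q all.q

  # Consumables with a non-zero/non-NONE default: the default amount is
  # charged to every job that does not request the resource explicitly,
  # which can silently eat queue capacity.
  $ qconf -sc | awk '$6 == "YES" && $7 !~ /^(0|NONE)$/'

  # Host-level and global complex_values that cap those consumables.
  $ qconf -se gridtst1    # likewise gridtst2 and gridtst3
  $ qconf -se global

  # Ask the scheduler why the pending job is still waiting.
  $ qstat -j 9724]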
>> I do notice that hl:cpu jumps significantly between idle, dp 12, and
>> dp 24, but I didn't find anything in the docs describing what cpu
>> represents...
>
> /usr/sge/doc/load_parameters.asc
>
> It's % load.

Ahh. Thanks for the pointer to the file. Very useful.

> --
> Reuti

--
-MichaelC
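[As a footnote to the load_parameters.asc pointer: the cpu load value
(percent CPU utilization) can be sampled per host while a PE job ramps up.
A minimal sketch, reusing the three test hosts from this thread:

  # Show only the cpu and np_load_avg load values for the named hosts.
  $ qhost -F cpu,np_load_avg -h gridtst1,gridtst2,gridtst3]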
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
