Re: [gridengine users] load_thresholds, load_scaling, and hyperthreading

Joshua Baker-LePain Thu, 03 Nov 2016 12:05:26 -0700

On Wed, 2 Nov 2016 at 5:05pm, Reuti wrote

Just for the record: to investigate this, I defined a load_thresholds which is always putting the queue in alarm state besides the one under test. I used our tmpfree complex for it and entered a value which is beyond the installed disk. This way, `qstat -explain a` will always give an output, even the values of other complexes which aren't bypassed are displayed. I got:


$ qstat -explain a -q serial@node29 -s r
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
serial@node29                  B     0/0/16         15.75    lx24-em64t    a
        alarm hl:tmpfree=1842222120k load-threshold=2T
        alarm hl:np_load_avg=0.492188 load-threshold=0.5

$ qstat -explain a -q serial@node29 -s r
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
serial@node29                  B     0/0/16         15.75    lx24-em64t    a
        alarm hl:tmpfree=1842222120k load-threshold=2T
        alarm hl:np_load_avg=   9.844 load-threshold=0.5

$ qstat -explain a -q serial@node29 -s r
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
serial@node29                  B     0/0/16         15.76    lx24-em64t    a
        alarm hl:tmpfree=1842221988k load-threshold=2T
        alarm hl:np_load_avg=   0.246 load-threshold=0.5

for settings of NONE or 20 and 0.5 in the load_scaling of np_load_avg of the exechost. Looks fine. Hence your np_load_avg=2 should have worked.
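
As a rough cross-check of the three outputs above, here is a minimal Python sketch. It assumes that load_scaling simply multiplies the raw np_load_avg before the comparison against load_thresholds, which is what the numbers suggest; the function name `scaled` is made up for illustration and is not part of Grid Engine.

```python
# Hypothetical helper: models load_scaling as a plain multiplication.
# This is an assumption inferred from the qstat output, not Grid Engine code.
def scaled(raw, factor=None):
    """Return the value shown by `qstat -explain a`; None models NONE."""
    return raw if factor is None else raw * factor

# Raw np_load_avg as reported with load_scaling NONE.
raw_np_load_avg = 0.492188

for factor in (None, 20, 0.5):
    print(factor, round(scaled(raw_np_load_avg, factor), 3))
```

Under that assumption, the factors 20 and 0.5 reproduce the 9.844 and 0.246 shown by qstat.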

The plot thickens. Doing similar testing to yours, it looks like this is a display bug with qhost. Here are 2 configurations that both create an alarm state, but in one the alarm doesn't show up in the output of 'qhost':


Config 1:
$ qconf -sq long.q
load_thresholds       np_load_avg=0.5
$ qconf -se msg-id1
load_scaling          NONE
$ qhost -q -h msg-id1
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
msg-id1                 lx-amd64       48    2   24   48 24.63  251.6G    2.2G    4.0G     0.0
   member.q             BP    0/24/24
   short.q              BP    0/0/24
   long.q               BP    0/0/24        a
$ qstat -explain a -q long.q@msg-id1 -s r
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
lon...@msg-id1.ic.ucsf.edu     BP    0/0/24         24.62    lx-amd64      a
        alarm hl:np_load_avg=0.512917 load-threshold=0.5

Config 2:
$ qconf -sq long.q
load_thresholds       np_load_avg=0.9
$ qconf -se msg-id1
load_scaling          np_load_avg=2.000000
$ qhost -q -h msg-id1
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
msg-id1                 lx-amd64       48    2   24   48 24.58  251.6G    2.2G    4.0G     0.0
   member.q             BP    0/24/24
   short.q              BP    0/0/24
   long.q               BP    0/0/24
$ qstat -explain a -q long.q@msg-id1 -s r
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
lon...@msg-id1.ic.ucsf.edu     BP    0/0/24         24.58    lx-amd64      a
        alarm hl:np_load_avg=   1.024 load-threshold=0.9

In both configs, long.q correctly refuses to accept jobs. But the qhost display error is sure to confuse users as to why that is. I'm going to stick with the previous solution, but I'll file a bug to try to get things fixed up.
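
For reference, the arithmetic behind both alarm states can be sketched in Python as below. This assumes np_load_avg is load_avg divided by NCPU (48 here), then multiplied by the load_scaling factor, and that the alarm trips once the result reaches the threshold; the function names and the use of `>=` are illustrative guesses, not taken from the Grid Engine source.

```python
# Hypothetical model of the alarm check; names and the >= comparison are
# assumptions for illustration only.
def np_load_avg(load_avg, ncpu, scaling=1.0):
    """Per-CPU load, scaled by the host's load_scaling factor."""
    return load_avg / ncpu * scaling

def in_alarm(value, threshold):
    """True when the scaled value reaches the queue's load_thresholds entry."""
    return value >= threshold

# Config 1: load_scaling NONE, threshold 0.5, load_avg 24.62 from qstat.
config1 = np_load_avg(24.62, 48)
# Config 2: load_scaling np_load_avg=2, threshold 0.9, load_avg 24.58.
config2 = np_load_avg(24.58, 48, scaling=2.0)

print(round(config1, 6), in_alarm(config1, 0.5))
print(round(config2, 3), in_alarm(config2, 0.9))
```

Both values come out above their thresholds, consistent with the `qstat -explain a` output for each config.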


Thanks again for all your help, Reuti.

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
