On Fri, 13 Apr 2018 at 1:48am, William Hay wrote:

This looks more like the scheduler thread and the worker threads of the qmaster disagreeing about the number of gpus left. This shouldn't persist, but bouncing the qmaster might get them to agree.


That is indeed exactly what seems to be going on. However, I've tried bouncing the qmaster, and the problem persists after the restart.
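
(For completeness, by "bouncing" I mean a full stop/start of the qmaster daemon via the stock wrapper script -- roughly the following, though the exact path depends on the install:

$ $SGE_ROOT/$SGE_CELL/common/sgemaster stop
$ $SGE_ROOT/$SGE_CELL/common/sgemaster start

so the scheduler and worker threads should both have come back up from the same on-disk state afterwards.)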

It looks like you are defining the gpu as a host consumable. Is there anything else that defines it: Queue consumable, global consumable, resource quota or load sensor?

AFAIK, no. The only place the "gpu" variable occurs is in the host definition's "complex_values", e.g.:

$ qconf -se msg-iogpu9
hostname              msg-iogpu9
load_scaling          NONE
complex_values        mem_free=128000M,gpu=2
load_values           arch=lx-amd64,num_proc=32,mem_total=128739.226562M, \
                      swap_total=4095.996094M,virtual_total=132835.222656M, \
                      m_topology=SCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTT, \
                      m_socket=2,m_core=16,m_thread=32,load_avg=5.570000, \
                      load_short=5.560000,load_medium=5.570000, \
                      load_long=5.530000,mem_free=124723.488281M, \
                      swap_free=4095.996094M,virtual_free=128819.484375M, \
                      mem_used=4015.738281M,swap_used=0.000000M, \
                      virtual_used=4015.738281M,cpu=21.100000, \
                      m_topology_inuse=SCTTCTTCTTCTTCTTCTTCTTCTTSCTTCTTCTTCTTCTTCTTCTTCTT, \
                      gpu.ncuda=2,gpu.ndev=2,gpu.cuda.0.mem_free=752222208, \
                      gpu.cuda.0.procs=1,gpu.cuda.0.clock=1911, \
                      gpu.cuda.0.util=91,gpu.cuda.1.mem_free=752222208, \
                      gpu.cuda.1.procs=1,gpu.cuda.1.clock=1911, \
                      gpu.cuda.1.util=90,gpu.names=GeForce GTX 1080 Ti;GeForce \
                      GTX 1080 Ti;,np_load_avg=0.174063, \
                      np_load_short=0.173750,np_load_medium=0.174063, \
                      np_load_long=0.172813
processors            32
user_lists            NONE
xuser_lists           NONE
projects              NONE
xprojects             NONE
usage_scaling         NONE
report_variables      NONE

and the complex definition I sent previously. I am running the bundled load sensor, as can be seen above.
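
For readers who missed the earlier mail: the gpu complex is defined as a plain host consumable. Purely as an illustration (the actual entry is the one posted previously, and its defaults may differ), a qconf -sc line for such a consumable looks roughly like:

#name   shortcut   type   relop   requestable   consumable   default   urgency
gpu     gpu        INT    <=      YES           YES          0         0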

What do you get if you use
qstat -F gpu -q 'gpu-q@msg-iogpu[29]'

$ qstat -F gpu -q gpu.q@msg-iogpu9
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
gpu.q@msg-iogpu9               BP    0/3/16         5.51     lx-amd64
        hc:gpu=0
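
(The hc: prefix there marks the host-level consumable as the qmaster currently accounts for it, i.e. zero gpus left on that host. The same counter can be cross-checked from the host side with something like:

$ qhost -F gpu -h msg-iogpu9

which reports the hc:gpu consumable from qhost's point of view.)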

Any ideas?  I'm a bit flummoxed on this one...

Setting MONITOR=1 in the scheduler's params and having a look at the schedule file should tell you what the scheduler is doing.
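
(For anyone following along, that's MONITOR=1 on the params line of the scheduler configuration, e.g. via qconf -msconf; the scheduler then logs its per-cycle decisions to $SGE_ROOT/$SGE_CELL/common/schedule.)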

I've had that set for a while now. Job ID 373163 is currently stuck in the queue with appropriate slots available:

$ qstat -u "*" -q gpu.q
.
.
 373163 0.50000 refine_joi user1       qw    04/13/2018 09:25:08                                    3

$ qalter -w p 373163
verification: found possible assignment with 3 slots
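
(If I'm reading the man page right, -w p validates against the cluster in its current state, rather than the empty-cluster check that -w v does, so even that check thinks the job should be able to start.)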

The qmaster "messages" file has this to say about that job ID (repeatedly -- this is the first mention and coincides with the submit time above):
04/13/2018 09:25:32|worker|wynq1|E|debiting 2.000000 of gpu on host msg-iogpu9 for 1 slots would exceed remaining capacity of 0.000000
04/13/2018 09:25:32|worker|wynq1|E|resources no longer available for start of job 373163.1

And this is that job's first mention in the schedule file:
373163:1:STARTING:1523638696:82860:P:mpi_onehost:slots:3.000000
373163:1:STARTING:1523638696:82860:H:msg-iogpu9:mem_free:51539607552.000000
373163:1:STARTING:1523638696:82860:H:msg-iogpu9:gpu:2.000000
373163:1:STARTING:1523638696:82860:Q:gpu.q@msg-iogpu9:slots:3.000000

That block is repeated over and over in that file.
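
For anyone unfamiliar with the MONITOR output, my reading of the colon-separated fields is roughly:

job_id:task_id:state:start_time:duration:level:object:resource:amount

where the level is P for a parallel environment, H for a host and Q for a queue instance. In other words, the scheduler plans to debit gpu=2.0 on msg-iogpu9 -- exactly the debit the worker thread then refuses in the messages file above.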

Also, enabling schedd_job_info for the job in question and then running qstat -j on it after the next scheduling cycle might provide some clues.

Unfortunately that seems a bust as well. It just details all the queue instances the job can't run in (all legitimate). It doesn't mention the queue instances it *can* run in at all.
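
(Concretely, that was the schedd_job_info setting in qconf -msconf; once it's enabled, the output lands in the "scheduling info:" section of qstat -j 373163, and that section only lists the reasons queue instances were excluded.)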

So, the short version again: one part of the scheduler thinks there are available "gpu" complex slots on hosts where there aren't any, while another part realizes this and keeps the jobs requesting those slots from starting. But it also won't try different hosts.

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF
