On Thu, Apr 12, 2018 at 10:15:34AM -0700, Joshua Baker-LePain wrote:
> We're running SoGE 8.1.9 on a smallish (but growing) cluster.  We've
> recently added GPU nodes to the cluster.  On each GPU node, a consumable
> complex named 'gpu' is defined with the number of GPUs in the node.  The
> complex definition looks like this:
> 
> #name               shortcut   type      relop requestable consumable default  urgency
> #--------------------------------------------------------------------------------------
> gpu                 gpu        INT       <=    YES         JOB        0        0
> 
> We're frequently seeing GPU jobs stuck in 'qw' even when slots and resources
> on GPU nodes are available.  What appears to be happening is that SGE is
> choosing a node that's full and then waiting for that node to become
> available rather than switching to another node.  For example:
> 
> $ qstat -u "*" -q gpu.q
>  370002 0.05778 C3D1000b2_ user1        r     04/11/2018 00:18:17 gpu.q@msg-iogpu10                  5
>  369728 0.05778 C3D4000b2_ user1        r     04/10/2018 18:00:24 gpu.q@msg-iogpu11                  5
>  371490 0.06613 class3d    user2        r     04/11/2018 20:50:02 gpu.q@msg-iogpu12                  3
>  367554 0.05778 C3D3000b2_ user1        r     04/08/2018 16:07:24 gpu.q@msg-iogpu3                   3
>  367553 0.05778 C3D2000b2_ user1        r     04/08/2018 17:56:54 gpu.q@msg-iogpu4                   3
>  367909 0.05778 C3D11k_b2Y user1        r     04/09/2018 00:04:24 gpu.q@msg-iogpu8                   3
>  371511 0.06613 class3d    user2        r     04/11/2018 21:45:02 gpu.q@msg-iogpu9                   3
>  371593 0.95000 refine_joi user3        qw    04/11/2018 23:05:57                                    5
> 
> Job 371593 has requested '-l gpu=2'.  Nodes msg-iogpu2, 5, 6, and 7 have no
> jobs in gpu.q on them and available gpu resources, e.g.:
> 
> $ qhost -F -h msg-iogpu2
> .
> .
>    hc:gpu=2.000000
> 
> However, SGE seems to insist on running this job on msg-iogpu9, as seen by
> these lines in the messages file for each scheduling run:
> 
> 04/12/2018 09:59:47|worker|wynq1|E|debiting 2.000000 of gpu on host msg-iogpu9 for 1 slots would exceed remaining capacity of 0.000000
> 04/12/2018 09:59:47|worker|wynq1|E|resources no longer available for start of job 371593.1

This looks more like the scheduler and qmaster threads of the qmaster disagreeing 
about how much of the gpu consumable is left.  That shouldn't persist, but bouncing 
the qmaster might get them to agree.
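
If you do bounce it, the exact commands depend on how your SoGE install starts the 
qmaster (init script, systemd unit, or the cell's sgemaster script), so treat this 
as a sketch only:

$ sudo $SGE_ROOT/$SGE_CELL/common/sgemaster stop     # stop sge_qmaster cleanly
$ sudo $SGE_ROOT/$SGE_CELL/common/sgemaster start    # start it again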

It looks like you are defining gpu as a host-level consumable.  Is there anything 
else that defines it: a queue consumable, a global consumable, a resource quota, 
or a load sensor?
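
If it helps, each of those is quick to check with qconf (the hostnames below are 
just the ones from your output; adjust to taste):

$ qconf -sc | grep gpu                        # the complex definition itself
$ qconf -se msg-iogpu9 | grep complex         # per-host complex_values
$ qconf -se global | grep complex             # global consumable, if one exists
$ qconf -sq gpu.q | grep complex_values       # queue-level consumable
$ qconf -srqsl                                # list resource quota sets (then qconf -srqs <name>)
$ qconf -sconf msg-iogpu9 | grep load_sensor  # load sensor configured for the host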

What do you get if you use

qstat -F gpu -q 'gpu.q@msg-iogpu[29]'

adjusting the queue/host pattern as necessary?

> 
> From past experience, job 371593 will indeed wait until msg-iogpu9 becomes
> available and run there.  We do advise our users to set "-R y" for these
> jobs -- is this a reservation issue?  Where else should I look for clues?

Unless some other job has reserved the gpu nodes that appear free, reservation 
shouldn't come into it, as the scheduler won't worry about reservations if it can 
start the job now.

> Any ideas?  I'm a bit flummoxed on this one...

Setting MONITOR=1 in the scheduler's params and having a look at the schedule file 
should tell you what the scheduler is doing.
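
Roughly (assuming the default cell name; the file only appears once MONITOR is on):

$ qconf -msconf
    # in the editor, change the params line to:
    params    MONITOR=1
$ tail -f $SGE_ROOT/default/common/schedule   # one record per dispatch/reservation decision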

Also, enabling schedd_job_info for the job in question and then running qstat -j 
on it after the next scheduling cycle might provide some clues.
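
Something like (schedd_job_info true is global; it also accepts a job_list if you 
want to limit the overhead to just this job):

$ qconf -msconf
    # set:  schedd_job_info    true
$ qstat -j 371593       # look for the "scheduling info:" lines after the next cycle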

William

