On Thu, Apr 12, 2018 at 10:15:34AM -0700, Joshua Baker-LePain wrote:

> We're running SoGE 8.1.9 on a smallish (but growing) cluster. We've
> recently added GPU nodes to the cluster. On each GPU node, a consumable
> complex named 'gpu' is defined with the number of GPUs in the node. The
> complex definition looks like this:
>
> #name  shortcut  type  relop  requestable  consumable  default  urgency
> #-----------------------------------------------------------------------
> gpu    gpu       INT   <=     YES          JOB         0        0
>
> We're frequently seeing GPU jobs stuck in 'qw' even when slots and
> resources on GPU nodes are available. What appears to be happening is
> that SGE is choosing a node that's full and then waiting for that node
> to become available rather than switching to another node. For example:
>
> $ qstat -u "*" -q gpu.q
> 370002 0.05778 C3D1000b2_ user1 r  04/11/2018 00:18:17 gpu.q@msg-iogpu10 5
> 369728 0.05778 C3D4000b2_ user1 r  04/10/2018 18:00:24 gpu.q@msg-iogpu11 5
> 371490 0.06613 class3d    user2 r  04/11/2018 20:50:02 gpu.q@msg-iogpu12 3
> 367554 0.05778 C3D3000b2_ user1 r  04/08/2018 16:07:24 gpu.q@msg-iogpu3  3
> 367553 0.05778 C3D2000b2_ user1 r  04/08/2018 17:56:54 gpu.q@msg-iogpu4  3
> 367909 0.05778 C3D11k_b2Y user1 r  04/09/2018 00:04:24 gpu.q@msg-iogpu8  3
> 371511 0.06613 class3d    user2 r  04/11/2018 21:45:02 gpu.q@msg-iogpu9  3
> 371593 0.95000 refine_joi user3 qw 04/11/2018 23:05:57                   5
>
> Job 371593 has requested '-l gpu=2'. Nodes msg-iogpu2, 5, 6, and 7 have
> no jobs in gpu.q on them and available gpu resources, e.g.:
>
> $ qhost -F -h msg-iogpu2
> .
> .
>    hc:gpu=2.000000
>
> However, SGE seems to insist on running this job on msg-iogpu9, as seen
> by these lines in the messages file for each scheduling run:
>
> 04/12/2018 09:59:47|worker|wynq1|E|debiting 2.000000 of gpu on host msg-iogpu9 for 1 slots would exceed remaining capacity of 0.000000
> 04/12/2018 09:59:47|worker|wynq1|E|resources no longer available for start of job 371593.1
This looks more like the scheduler and qmaster threads of the qmaster
disagreeing about the number of gpus left. That shouldn't persist, but
bouncing the qmaster might get them to agree.

It looks like you are defining gpu as a host consumable. Is there
anything else that defines it: a queue consumable, a global consumable,
a resource quota, or a load sensor? What do you get if you run the
following (adjusted as necessary)?

$ qstat -F gpu -q 'gpu.q@msg-iogpu[29]'

> From past experience, job 371593 will indeed wait until msg-iogpu9
> becomes available and run there. We do advise our users to set "-R y"
> for these jobs -- is this a reservation issue? Where else should I
> look for clues?

Unless some other job has reserved the gpu nodes that appear free,
reservation shouldn't come into it: the scheduler won't worry about
reservations if it can start the job now.

> Any ideas? I'm a bit flummoxed on this one...

Setting MONITOR=1 in the scheduler's params and having a look at the
schedule file should tell you what the scheduler is doing. Also,
enabling schedd_job_info for the job in question and then running
qstat -j on it after the next scheduling cycle might provide some
clues. Rough sketches of these commands follow below.

William
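On bouncing the qmaster: the exact invocation depends on how the daemons
were installed, so the names below are guesses -- check your own init
scripts or systemd units:

# source/courtesy install (the script name often includes the cell or
# cluster name):
$ sudo /etc/init.d/sgemaster restart

# or on a systemd-managed install, something like:
$ sudo systemctl restart sgemaster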
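To hunt for other definitions of the gpu complex, something along these
lines covers the usual places (gpu.q and msg-iogpu9 are taken from the
output above; substitute other hosts and queues as needed):

$ qconf -sc | grep gpu                        # the complex definition itself
$ qconf -se msg-iogpu9 | grep complex_values  # host consumable
$ qconf -sq gpu.q | grep complex_values       # queue consumable
$ qconf -se global | grep complex_values      # global consumable
$ qconf -srqsl                                # any resource quota sets defined?
$ qconf -sconf msg-iogpu9 | grep load_sensor  # per-host load sensor
$ qconf -sconf | grep load_sensor             # global load sensor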
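When reading the qstat -F output, the two-letter prefix on each line
says where the value comes from: the first letter is the source
(g=global, h=host, q=queue) and the second the kind of value (l=load,
c=consumable, f=fixed). So the hc:gpu=2.000000 above is the remaining
host consumable capacity; a qc:gpu or gc:gpu line alongside it would
mean the complex is also being debited at the queue or global level.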
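MONITOR=1 goes on the params line of the scheduler configuration; the
schedule file path below is the default location for a standard install:

$ qconf -msconf
# in the editor, set the params line to:
#   params    MONITOR=1

# then watch what the scheduler does on each cycle:
$ tail -f $SGE_ROOT/$SGE_CELL/common/schedule

Each schedule-file line records a job/task, the action taken (STARTING,
RESERVING, ...) and the resource debited, so you can see which queue
instance the scheduler picks for 371593 on every cycle.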
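schedd_job_info (note the spelling) lives in the same scheduler
configuration; true enables it for all jobs, and there is also a
job_list form to restrict it to specific job ids:

$ qconf -msconf
# set:
#   schedd_job_info    true

# then, after the next scheduling cycle:
$ qstat -j 371593

and look at the "scheduling info:" section at the bottom of the output,
which should say, per queue instance, why the job could not be started
there.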