Bonnie Xu created YARN-11171:
--------------------------------

             Summary: Help figuring out why high priority jobs are starving and 
low priority jobs not being preempted
                 Key: YARN-11171
                 URL: https://issues.apache.org/jira/browse/YARN-11171
             Project: Hadoop YARN
          Issue Type: Bug
          Components: fairscheduler
    Affects Versions: 3.2.1
            Reporter: Bonnie Xu


Hi! Recently we've been running into an issue in our production systems where 
a high priority job is starved by a lower priority job and preemption never 
kicks in to rebalance resources, even over the course of an hour. Our 
understanding is that when a higher priority job shows up, resources should be 
preempted from the lower priority queue based on fair share allocations 
relatively quickly, once the fair share preemption timeout expires.
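To spell out the behaviour we expect, here is a minimal sketch of our mental model of the fair-share starvation rule. This is not the actual FairScheduler code; the constants simply mirror the preemption config further down (threshold 1, 300s timeout for the "high" queue).
{code:java}
// Sketch of the fair-share starvation rule as we understand it, not YARN code.
public class FairShareStarvationSketch {
    static final double THRESHOLD = 1.0;        // fairSharePreemptionThreshold
    static final long TIMEOUT_MS = 300_000L;    // fairSharePreemptionTimeout for "high"

    static boolean starvedForFairShare(long usedMb, long fairShareMb,
                                       long belowFairShareSinceMs, long nowMs) {
        boolean belowThreshold = usedMb < THRESHOLD * fairShareMb;
        boolean timedOut = (nowMs - belowFairShareSinceMs) >= TIMEOUT_MS;
        // Once both hold, we expect the scheduler to start preempting containers
        // from queues that are running over their fair share.
        return belowThreshold && timedOut;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Example: "high" has held ~10 GB against a ~300 GB fair share for 20 minutes.
        System.out.println(starvedForFairShare(10_240L, 307_200L,
                now - 20 * 60_000L, now)); // true -> preemption should have fired
    }
}{code}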

 

+*This is for the higher priority queue (high):*+

!https://paper.dropbox.com/ep/redirect/image?url=https%3A%2F%2Fpaper-attachments.dropbox.com%2Fs_2C3F7CB2982B9542EDF25C7829E4FC9F52683EF3F37B8BF0F033955DB8D447D3_1654287958159_file.png&hmac=Km%2B2JsKoHiuN9ymq2Pz4bcexI%2FdsWDWIkmLdFCfufIg%3D&width=1490|width=738,height=298!

Between 23:30 and 00:45, notice that the higher priority queue consistently 
demands a large amount of memory and, given the queue weights, should be 
allocated at least half of the parent's resources, but it never gets its fair 
share.

 

+*This is for the lower priority queue (medium):*+

!https://paper.dropbox.com/ep/redirect/image?url=https%3A%2F%2Fpaper-attachments.dropbox.com%2Fs_2C3F7CB2982B9542EDF25C7829E4FC9F52683EF3F37B8BF0F033955DB8D447D3_1654287958172_file.png&hmac=%2F6oGoh1smD9OdcmNXlwrIEudFjTVaofHMetUXfKb2KY%3D&width=1490|width=725,height=250!

Notice that during the same time window, the medium queue is using far more 
than its fair share.

One interesting detail that may be related: when this happens, the queue is at 
max resources and we see many of these diagnostics:
{code:java}
diagnostics: [Mon May 16 06:29:28 +0000 2022] Application is added to the 
scheduler and is not yet activated.  (Resource request: <memory:27136, 
vCores:4> exceeds current queue or its parents maximum resource allowed). Max 
share of queue: <memory:9223372036854775807, vCores:2147483647> {code}
This particular application stays in that state for roughly an hour and only 
gets the resources it needs once the low priority job finishes. Note that the 
"Max share of queue" values in the diagnostic look strange.
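If we read those numbers right, the max share is not actually corrupted: they match the unbounded sentinel values (Long.MAX_VALUE for memory, Integer.MAX_VALUE for vCores), i.e. no explicit maxResources on the leaf queue itself, which would suggest the limit being exceeded is a parent's rather than the leaf's. A trivial check:
{code:java}
// Quick check on the "strange" max share values from the diagnostic above.
public class MaxShareSentinelCheck {
    public static void main(String[] args) {
        System.out.println(Long.MAX_VALUE);    // 9223372036854775807 (memory)
        System.out.println(Integer.MAX_VALUE); // 2147483647 (vCores)
    }
}{code}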

Our current preemption config for this cluster:
{code:java}
<fairSharePreemptionThreshold>1</fairSharePreemptionThreshold>    
<fairSharePreemptionTimeout>900</fairSharePreemptionTimeout>    
<minSharePreemptionTimeout>180</minSharePreemptionTimeout>{code}
{code:java}
  <queue name="low">
    <weight>1</weight>
  </queue>
  <queue name="medium">
    <weight>2</weight>
  </queue>
  <queue name="high">
    <fairSharePreemptionTimeout>300</fairSharePreemptionTimeout>
    <weight>3</weight>
  </queue>
</queue>{code}
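For reference, a quick back-of-the-envelope of the steady-state fair shares those weights imply when all three queues have pending demand under the same parent. The parent capacity used in this sketch is a made-up placeholder, not our real cluster size:
{code:java}
// Fair shares implied by weights 1:2:3 when every queue has unmet demand.
public class FairShareFromWeights {
    public static void main(String[] args) {
        long parentMemoryMb = 600_000L;                 // hypothetical parent capacity
        String[] names   = {"low", "medium", "high"};
        double[] weights = {1, 2, 3};

        double totalWeight = 0;
        for (double w : weights) {
            totalWeight += w;
        }
        for (int i = 0; i < names.length; i++) {
            long shareMb = Math.round(parentMemoryMb * weights[i] / totalWeight);
            System.out.printf("%-6s fair share ~ %,d MB%n", names[i], shareMb);
        }
        // With weights 1:2:3, "high" should converge on roughly half of the
        // parent's memory, which is what we expected to see in the graphs above.
    }
}{code}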
We've taken a heap dump and enabled debug logging. One theory is that the 
preemption thread checks whether the starved request could actually be 
accommodated before preempting, and since the queue is already at max 
resources, that check fails and preemption never proceeds (sketched below).
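To make that theory concrete, this is the shape of the gate we suspect, written out as a purely hypothetical sketch; none of the names or values below are actual FairScheduler identifiers:
{code:java}
// Purely illustrative: the kind of pre-check we suspect short-circuits preemption.
public class SuspectedPreemptionGate {
    static boolean preemptionAllowed(long requestMb, long queueUsedMb, long queueMaxMb) {
        // If preemption is preceded by a "would this request still fit under the
        // queue's max resources?" check, a queue pinned at its cap would always
        // fail it and no victim containers would ever be selected.
        return queueUsedMb + requestMb <= queueMaxMb;
    }

    public static void main(String[] args) {
        // The pending request from the diagnostic above against a queue at its cap
        // (made-up numbers).
        System.out.println(preemptionAllowed(27_136L, 500_000L, 500_000L)); // false
    }
}{code}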

However, nothing conclusive yet. We'd appreciate any assistance or insight you 
can provide, and we're happy to share more details.


