[ https://issues.apache.org/jira/browse/YARN-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492405#comment-14492405 ]
Peng Zhang commented on YARN-3414:
----------------------------------

It has the same root cause as YARN-3405.

> FairScheduler's preemption may cause livelock
> ---------------------------------------------
>
>                 Key: YARN-3414
>                 URL: https://issues.apache.org/jira/browse/YARN-3414
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.6.0
>            Reporter: Peng Zhang
>
> I hit this problem in our cluster; it causes a livelock between preemption
> and scheduling.
> The queue hierarchy is as follows:
> {noformat}
>               root
>            /    |    \
>     queue-1  queue-2  queue-3
>      /    \
> queue-1-1  queue-1-2
> {noformat}
> # Assume the cluster resource is 100G of memory.
> # Assume queue-1 has a max resource limit of 20G.
> # queue-1-1 is active and gets at most 20G of memory (equal to its fair
> share).
> # queue-2 then becomes active and requires 30G of memory (less than its fair
> share).
> # queue-3 is active and can be assigned all remaining resources, 50G of
> memory (more than its fair share). At this point the three queues' fair
> shares are (20, 40, 40) and their usage is (20, 30, 50).
> # queue-1-2 becomes active and triggers a new preemption request (10G of
> memory; intuitively it can only preempt from its sibling queue-1-1).
> # In fact preemption starts from root, finds that queue-3 is the most over
> its fair share, and preempts some resources from queue-3.
> # But during scheduling, the scheduler finds that queue-1 has already reached
> its max fair share and cannot assign resources to it. The freed resources
> are then assigned to queue-3 again.
> The last two steps then repeat indefinitely.
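
The preempt/reschedule cycle in the last two steps can be reproduced with a toy model. This is only a sketch under stated assumptions, not FairScheduler code: it tracks top-level queues only, the `preempt`, `schedule` and `simulate` helpers are hypothetical, and the `DEMAND` figures (queue-1 wanting 30G in total, queue-3 able to absorb anything that is free) are assumed for illustration. The fair shares, the 20G limit and the starting usage come from the steps in the report.

```python
# Toy model of the reported scenario; all sizes are in GB.
FAIR_SHARE = {"queue-1": 20, "queue-2": 40, "queue-3": 40}
MAX_LIMIT = {"queue-1": 20}  # only queue-1 has a max resource limit
# Assumed demand: queue-1 = 20 (queue-1-1) + 10 (queue-1-2); queue-3 takes all.
DEMAND = {"queue-1": 30, "queue-2": 30, "queue-3": 100}


def preempt(usage, amount):
    """Take up to `amount` from the queue that is most over its fair share."""
    victim = max(usage, key=lambda q: usage[q] - FAIR_SHARE[q])
    taken = min(amount, max(usage[victim] - FAIR_SHARE[victim], 0))
    usage[victim] -= taken
    return victim, taken


def schedule(usage, free):
    """Give `free` GB to queues with unmet demand, respecting max limits."""
    # Simplification: visit queues in order of usage relative to fair share.
    for q in sorted(usage, key=lambda q: usage[q] / FAIR_SHARE[q]):
        cap = min(DEMAND[q], MAX_LIMIT.get(q, float("inf")))
        grant = min(free, max(cap - usage[q], 0))
        usage[q] += grant
        free -= grant
    return free


def simulate(rounds=3):
    usage = {"queue-1": 20, "queue-2": 30, "queue-3": 50}
    history = []
    for _ in range(rounds):
        victim, taken = preempt(usage, amount=10)  # queue-1-2 asks for 10G
        schedule(usage, taken)  # queue-1 is capped, so it cannot take it
        history.append((victim, taken, dict(usage)))
    return history


if __name__ == "__main__":
    for victim, taken, usage in simulate():
        print(victim, taken, usage)
```

In this model every round preempts 10G from queue-3 and then hands the same 10G back to it, because queue-1 is already at its 20G limit and queue-2 has no unmet demand, so the usage never moves away from (20, 30, 50).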