[jira] [Updated] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues
[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith Sharma K S updated YARN-3849:
------------------------------------
    Fix Version/s: 2.6.4

Backported to 2.6.4. Thanks [~sunilg] for providing the branch-2.6 patch. Ran the test cases with and without the patch to ensure they are healthy.

> Too much of preemption activity causing continuos killing of containers
> across queues
> ------------------------------------------------------------------------
>
>                 Key: YARN-3849
>                 URL: https://issues.apache.org/jira/browse/YARN-3849
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.7.0
>            Reporter: Sunil G
>            Assignee: Sunil G
>            Priority: Critical
>             Fix For: 2.8.0, 2.7.3, 2.6.4
>
>         Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch,
> 0003-YARN-3849.patch, 0004-YARN-3849-branch2-6.patch,
> 0004-YARN-3849-branch2-7.patch, 0004-YARN-3849.patch
>
>
> Two queues are used. Each queue is given a capacity of 0.5. The Dominant
> Resource policy is used.
> 1. An app is submitted in QueueA and consumes the full cluster capacity.
> 2. After an app is submitted in QueueB, there is some demand, which invokes
> preemption in QueueA.
> 3. Instead of killing only the excess over the 0.5 guaranteed capacity, we
> observed that all containers other than the AM are killed in QueueA.
> 4. Now the app in QueueB tries to take over the cluster with the current free
> space. But there is updated demand from the app in QueueA, which lost
> its containers earlier, and preemption now kicks in for QueueB.
> The scenario in steps 3 and 4 repeats in a loop, so none of the
> apps complete.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
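The expectation behind step 3 of the description can be sketched numerically. This is a minimal Python illustration, not the CapacityScheduler preemption code: the helper names, the (memory, vcores) tuples, the cluster size, and QueueB's pending demand are all assumptions chosen for the example. It only shows the bound the reporter expected, namely that QueueA should lose at most its usage above the 0.5 guarantee.

```python
def excess_over_guarantee(used, guaranteed):
    """Per-resource usage above the guarantee, floored at zero."""
    return tuple(max(u - g, 0) for u, g in zip(used, guaranteed))


def ideal_preemption(used, guaranteed, other_pending):
    """Take no more than the excess and no more than the other queue asks for."""
    excess = excess_over_guarantee(used, guaranteed)
    return tuple(min(e, p) for e, p in zip(excess, other_pending))


# Hypothetical cluster of 100 memory units and 200 vcores, split 0.5 / 0.5.
queue_a_guaranteed = (50, 100)
queue_a_used = (100, 200)   # step 1: QueueA fills the whole cluster
queue_b_pending = (30, 60)  # step 2: assumed demand from QueueB

to_preempt = ideal_preemption(queue_a_used, queue_a_guaranteed, queue_b_pending)
queue_a_after = tuple(u - p for u, p in zip(queue_a_used, to_preempt))

print("preempt from QueueA:", to_preempt)
print("QueueA usage afterwards:", queue_a_after)
```

Under these assumed numbers QueueA stays above its guarantee after preemption, so it has no reason to trigger preemption against QueueB in turn, which is what would break the loop described in steps 3 and 4.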
[jira] [Updated] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues
[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3849:
--------------------------
    Attachment:     (was: 0004-YARN-3849-branch2-7.patch)

>             Fix For: 2.8.0, 2.7.3
>
>         Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch,
> 0003-YARN-3849.patch, 0004-YARN-3849-branch2-6.patch,
> 0004-YARN-3849-branch2-7.patch, 0004-YARN-3849.patch
[jira] [Updated] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues
[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3849:
--------------------------
    Attachment: 0004-YARN-3849-branch2-6.patch

>             Fix For: 2.8.0, 2.7.3
>
>         Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch,
> 0003-YARN-3849.patch, 0004-YARN-3849-branch2-6.patch,
> 0004-YARN-3849-branch2-7.patch, 0004-YARN-3849.patch
[jira] [Updated] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues
[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3849:
--------------------------
    Attachment: 0004-YARN-3849-branch2-7.patch

Attaching the branch2.6 patch. All test cases pass locally.
[~djp]/[~rohithsharma], could you please take a look?

>             Fix For: 2.8.0, 2.7.3
>
>         Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch,
> 0003-YARN-3849.patch, 0004-YARN-3849-branch2-7.patch,
> 0004-YARN-3849-branch2-7.patch, 0004-YARN-3849.patch
[jira] [Updated] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues
[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated YARN-3849:
-----------------------------
    Target Version/s: 2.6.4

>             Fix For: 2.8.0, 2.7.3
>
>         Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch,
> 0003-YARN-3849.patch, 0004-YARN-3849-branch2-7.patch, 0004-YARN-3849.patch
[jira] [Updated] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues
[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-3849:
-----------------------------
    Fix Version/s: 2.7.3

>             Fix For: 2.8.0, 2.7.3
>
>         Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch,
> 0003-YARN-3849.patch, 0004-YARN-3849-branch2-7.patch, 0004-YARN-3849.patch
[jira] [Updated] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues
[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-3849:
-----------------------------
    Target Version/s:   (was: 2.7.3)

>             Fix For: 2.8.0, 2.7.3
>
>         Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch,
> 0003-YARN-3849.patch, 0004-YARN-3849-branch2-7.patch, 0004-YARN-3849.patch
[jira] [Updated] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues
[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3849:
--------------------------
    Attachment: 0004-YARN-3849-branch2-7.patch

Attaching a branch2.7 patch. The test cases pass locally.

>             Fix For: 2.8.0
>
>         Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch,
> 0003-YARN-3849.patch, 0004-YARN-3849-branch2-7.patch, 0004-YARN-3849.patch
[jira] [Updated] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues
[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-3849:
-----------------------------
    Target Version/s: 2.7.3

Discussed with [~vinodkv]: since 2.7.2 is almost done, setting the target version of this ticket to 2.7.3.

>             Fix For: 2.8.0
>
>         Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch,
> 0003-YARN-3849.patch, 0004-YARN-3849.patch
[jira] [Updated] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues
[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-3849:
-----------------------------
    Labels:   (was: 2.7.2-candidate)

>             Fix For: 2.8.0
>
>         Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch,
> 0003-YARN-3849.patch, 0004-YARN-3849.patch
[jira] [Updated] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues
[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-3849:
-----------------------------
    Labels: 2.7.2-candidate  (was: )

>              Labels: 2.7.2-candidate
>             Fix For: 2.8.0
>
>         Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch,
> 0003-YARN-3849.patch, 0004-YARN-3849.patch
[jira] [Updated] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues
[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3849:
--------------------------
    Attachment: 0004-YARN-3849.patch

Kicking Jenkins again.

>         Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch,
> 0003-YARN-3849.patch, 0004-YARN-3849.patch
[jira] [Updated] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues
[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3849:
--------------------------
    Attachment:     (was: 0004-YARN-3849.patch)

>         Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch,
> 0003-YARN-3849.patch
[jira] [Updated] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues
[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3849:
--------------------------
    Attachment: 0004-YARN-3849.patch

Yes, [~leftnoteasy], you are correct; thanks for pointing that out. I have updated the patch. :)

>         Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch,
> 0003-YARN-3849.patch, 0004-YARN-3849.patch
[jira] [Updated] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues
[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3849:
--------------------------
    Attachment: 0003-YARN-3849.patch

Thank you [~leftnoteasy] for the comments. Uploading a patch that addresses the issues.

Regarding one comment,
bq. testPreemptionWithVCoreResource seems not correct, root.used != A.used + b.used
{noformat}
"root(=[100:200 100:200 100:200 100:200],x=[100:200 100:200 100:200 100:200]);"
"-a(=[50:100 100:200 20:40 50:100],x=[50:100 100:200 80:160 50:100]);" + // a
"-b(=[50:100 100:200 80:160 50:100],x=[50:100 100:200 20:40 50:100])";
{noformat}
With this configuration, root.used = a.used + b.used. Please help to check.

>         Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch,
> 0003-YARN-3849.patch
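The invariant discussed in this comment can be checked mechanically. The sketch below is Python rather than the Java test framework, and it rests on two labeled assumptions: that each bracketed group lists four memory:vcores pairs, and that the third pair is the "used" value (inferred from the comment, since it is the only entry that differs between a and b). The `parse_queue` helper is hypothetical and is not part of TestProportionalCapacityPreemptionPolicy.

```python
import re

USED = 2  # assumption: the third memory:vcores pair in each group is "used"


def parse_queue(line):
    """Return (queue_name, {partition: [(memory, vcores), ...]}) for one line."""
    name, body = re.match(r"-*([\w.]+)\((.*)\);?$", line.strip()).groups()
    partitions = {}
    for label, values in re.findall(r"(\w*)=\[([^\]]*)\]", body):
        partitions[label] = [
            tuple(int(n) for n in pair.split(":")) for pair in values.split()
        ]
    return name, partitions


# The three configuration strings quoted in the comment above.
config = [
    "root(=[100:200 100:200 100:200 100:200],x=[100:200 100:200 100:200 100:200]);",
    "-a(=[50:100 100:200 20:40 50:100],x=[50:100 100:200 80:160 50:100]);",
    "-b(=[50:100 100:200 80:160 50:100],x=[50:100 100:200 20:40 50:100])",
]
queues = dict(parse_queue(line) for line in config)


def used(queue, partition):
    """Used (memory, vcores) of a queue in the given partition ('' is the default)."""
    return queues[queue][partition][USED]


# Sum the children's usage per partition and compare with the root.
child_sums = {
    p: tuple(x + y for x, y in zip(used("a", p), used("b", p)))
    for p in ("", "x")
}
for p, total in child_sums.items():
    print(repr(p), "a.used + b.used =", total, "matches root:", total == used("root", p))
```

For both the default partition and partition x, the children's usage adds up to the root's, in memory and in vcores, which is the property the reviewer asked about.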
[jira] [Updated] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues
[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3849:
--------------------------
    Attachment: 0002-YARN-3849.patch

Thank you [~leftnoteasy] for the comments. I have uploaded a patch addressing them. Kindly check.

>         Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch
[jira] [Updated] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues
[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3849:
--------------------------
    Attachment: 0001-YARN-3849.patch

Hi [~leftnoteasy], [~rohithsharma],
Uploading an initial patch. I have changed the TestProportionalCapacityPreemptionPolicy test framework to accommodate vcores along with memory, and also corrected a few test cases. Kindly share your opinion.

>         Attachments: 0001-YARN-3849.patch
[jira] [Updated] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues
[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-3849:
-----------------------------------
    Component/s:     (was: resourcemanager)
                 capacityscheduler