Bibin A Chundatt created YARN-4140:
--------------------------------------

Summary: RM container allocation delayed in case of app submitted to Nodelabel partition
Key: YARN-4140
URL: https://issues.apache.org/jira/browse/YARN-4140
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt

While running an application on a Nodelabel partition, I found that application execution is delayed by 5-10 minutes for 500 containers. The cluster has 3 machines in total; 2 of them are in the same partition, and the app was submitted to that partition.

After enabling debug logging, I was able to find the following:
# The container ask from the AM is for OFF_SWITCH.
# The RM is allocating all containers as NODE_LOCAL, as shown in the logs below.
# With about 500 containers, it took about 6 minutes to allocate maps after the AM allocation.
# Tested with about 1K maps using the PI job: it took 17 minutes to allocate the next container after the AM allocation.

Once the 500 container allocations on NODE_LOCAL are done, the next container allocation is done on OFF_SWITCH.

{code}
2015-09-09 15:21:58,954 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: showRequests: application=application_1441791998224_0001 request={Priority: 20, Capability: <memory:512, vCores:1>, # Containers: 500, Location: /default-rack, Relax Locality: true, Node Label Expression: }
2015-09-09 15:21:58,954 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: showRequests: application=application_1441791998224_0001 request={Priority: 20, Capability: <memory:512, vCores:1>, # Containers: 500, Location: *, Relax Locality: true, Node Label Expression: 3}
2015-09-09 15:21:58,954 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: showRequests: application=application_1441791998224_0001 request={Priority: 20, Capability: <memory:512, vCores:1>, # Containers: 500, Location: host-10-19-92-143, Relax Locality: true, Node Label Expression: }
2015-09-09 15:21:58,954 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: showRequests: application=application_1441791998224_0001 request={Priority: 20, Capability: <memory:512, vCores:1>, # Containers: 500, Location: host-10-19-92-117, Relax Locality: true, Node Label Expression: }
2015-09-09 15:21:58,954 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
{code}

{code}
2015-09-09 14:35:45,467 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
2015-09-09 14:35:45,831 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
2015-09-09 14:35:46,469 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
2015-09-09 14:35:46,832 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
{code}

{code}
dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1> cat hadoop-dsperf-resourcemanager-host-127.log | grep "NODE_LOCAL" | grep "root.b.b1" | wc -l
500
{code}

(Consumes about 6 minutes)
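The grep count above gives the number of NODE_LOCAL lines but not how long they span. As a rough sketch (not part of the original report), the same filter can be done in Python while also measuring the time between the first and last matching line. The function name `summarize` and its parameters are hypothetical; the sample lines reuse the four ParentQueue entries quoted above, and the timestamp format is assumed to match them.

```python
import re
from datetime import datetime

# Matches the timestamp at the start of a log line, e.g. "2015-09-09 14:35:45,467".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")


def summarize(lines, queue="root.b.b1", locality="NODE_LOCAL"):
    """Return (count, elapsed_seconds) for timestamped lines mentioning queue and locality."""
    stamps = []
    for line in lines:
        # Same filter as: grep "NODE_LOCAL" | grep "root.b.b1"
        if queue not in line or locality not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:  # lines without a leading timestamp are skipped
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f"))
    if not stamps:
        return 0, 0.0
    return len(stamps), (max(stamps) - min(stamps)).total_seconds()


# The message body shared by the four ParentQueue lines quoted in this report.
SUFFIX = (
    " DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: "
    "Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, "
    "usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, "
    "numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL"
)
SAMPLE = [
    stamp + SUFFIX
    for stamp in (
        "2015-09-09 14:35:45,467",
        "2015-09-09 14:35:45,831",
        "2015-09-09 14:35:46,469",
        "2015-09-09 14:35:46,832",
    )
]

if __name__ == "__main__":
    count, elapsed = summarize(SAMPLE)
    print(f"{count} matching lines spanning {elapsed:.3f} s")
```

To run it against the full RM log, pass an open file handle for the log in place of `SAMPLE`; the file is read line by line, so it does not need to fit in memory.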