[ https://issues.apache.org/jira/browse/YARN-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271480#comment-15271480 ]

Jason Lowe commented on YARN-5039:
----------------------------------

Apps are pending until they are activated.  Apps can remain pending despite 
available cluster resources, depending upon how the queues and user limits 
are configured.  Given there's only one app running and one app pending, it's 
acting like the queue is only allowing one active app at a time.  If you go to 
the RM scheduler page and expand the details for the queue (click the triangle 
to the left of the queue bar), it will show pertinent details like Max 
Application Master Resources, Used Application Master Resources, Max 
Application Master Resources Per User, Num Schedulable Applications, Num 
Non-Schedulable Applications, etc.  If it shows a non-zero number of 
non-schedulable applications, that's likely because activating the app would 
exceed the maximum application master resource limit for the user or queue.  
When that occurs, the RM log will contain lines like "not starting application 
as amIfStarted exceeds amLimit".
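
For context, that AM limit is controlled by the maximum-am-resource-percent 
setting in capacity-scheduler.xml (default 0.1, i.e. 10% of the queue's 
resources may go to application masters).  A sketch of what raising it could 
look like; the 0.5 value is purely illustrative, not a recommendation:
{code}
<!-- capacity-scheduler.xml: fraction of queue resources that may be used
     to run application masters; applies cluster-wide by default. -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
</property>

<!-- The limit can also be overridden for a specific queue: -->
<property>
  <name>yarn.scheduler.capacity.root.default.maximum-am-resource-percent</name>
  <value>0.5</value>
</property>
{code}
As a rough check against the numbers in the log below: with the default 0.1 
and cluster=<memory:2658304>, the default queue (capacity 1.0) would cap 
aggregate AM memory around 265,830 MB.  Comparing Used Application Master 
Resources against the Max value on the scheduler page will show whether this 
limit is the blocker.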

Either way, if the spread of containers across those nodes is not expected, it 
would be good to get some debug logs for the LeafQueue; one way to enable them 
is sketched below.
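
A minimal sketch, assuming the stock log4j.properties setup shipped with 
Hadoop 2.7:
{code}
# log4j.properties on the ResourceManager host: enable DEBUG for the
# CapacityScheduler leaf-queue logic only, leaving everything else at INFO.
log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue=DEBUG
{code}
That requires an RM restart; the daemonlog command can flip the level at 
runtime instead, e.g. "yarn daemonlog -setlevel <rm-host>:8088 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
DEBUG" (8088 assumed here to be the RM web UI port).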

> Applications ACCEPTED but not starting
> --------------------------------------
>
>                 Key: YARN-5039
>                 URL: https://issues.apache.org/jira/browse/YARN-5039
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.2
>            Reporter: Miles Crawford
>         Attachments: Screen Shot 2016-05-04 at 1.57.19 PM.png, 
> yarn-yarn-resourcemanager-ip-10-12-47-144.log.gz
>
>
> Often when we submit applications to an incompletely utilized cluster, they 
> sit, unable to start for no apparent reason.
> There are multiple nodes in the cluster with available resources, but the 
> resourcemanager logs show that scheduling is being skipped. The scheduling is 
> skipped because the application itself has reserved the node? I'm not sure 
> how to interpret this log output:
> {code}
> 2016-05-04 20:19:21,315 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Trying to fulfill reservation for 
> application application_1462291866507_0025 on node: 
> ip-10-12-43-54.us-west-2.compute.internal:8041
> 2016-05-04 20:19:21,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
> (ResourceManager Event Processor): Reserved container  
> application=application_1462291866507_0025 resource=<memory:50688, vCores:1> 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=<memory:1894464, vCores:33>, usedCapacity=0.7126589, 
> absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 
> usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used=<memory:1894464, 
> vCores:33> cluster=<memory:2658304, vCores:704>
> 2016-05-04 20:19:21,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Skipping scheduling since node 
> ip-10-12-43-54.us-west-2.compute.internal:8041 is reserved by application 
> appattempt_1462291866507_0025_000001
> 2016-05-04 20:19:22,232 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Trying to fulfill reservation for 
> application application_1462291866507_0025 on node: 
> ip-10-12-43-53.us-west-2.compute.internal:8041
> 2016-05-04 20:19:22,232 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
> (ResourceManager Event Processor): Reserved container  
> application=application_1462291866507_0025 resource=<memory:50688, vCores:1> 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=<memory:1894464, vCores:33>, usedCapacity=0.7126589, 
> absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 
> usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used=<memory:1894464, 
> vCores:33> cluster=<memory:2658304, vCores:704>
> 2016-05-04 20:19:22,232 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Skipping scheduling since node 
> ip-10-12-43-53.us-west-2.compute.internal:8041 is reserved by application 
> appattempt_1462291866507_0025_000001
> 2016-05-04 20:19:22,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Trying to fulfill reservation for 
> application application_1462291866507_0025 on node: 
> ip-10-12-43-54.us-west-2.compute.internal:8041
> 2016-05-04 20:19:22,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
> (ResourceManager Event Processor): Reserved container  
> application=application_1462291866507_0025 resource=<memory:50688, vCores:1> 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=<memory:1894464, vCores:33>, usedCapacity=0.7126589, 
> absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 
> usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used=<memory:1894464, 
> vCores:33> cluster=<memory:2658304, vCores:704>
> 2016-05-04 20:19:22,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Skipping scheduling since node 
> ip-10-12-43-54.us-west-2.compute.internal:8041 is reserved by application 
> appattempt_1462291866507_0025_000001
> {code}


