[ https://issues.apache.org/jira/browse/YARN-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
jufeng li updated YARN-10440:
-----------------------------
    Description:
RM hangs, and I cannot submit any new jobs, but the RM and NM processes are running normally. I can open xxxxx:8088/cluster/apps/RUNNING but cannot open xxxxx:8088/cluster/scheduler. Apps that were already submitted never finish, and new apps cannot be submitted. Everything hangs except the RM and NM server processes themselves. How can I fix this? Please help!

Here is the log:
{code:java}
ttempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
{code}

  was:
RM hangs, and I cannot submit any new jobs, but the RM and NM processes are running normally. I can open xxxxx:8088/cluster/apps/RUNNING but cannot open xxxxx:8088/cluster/scheduler. Apps that were already submitted never finish, and new apps cannot be submitted. Everything hangs except the RM and NM server processes themselves. How can I fix this? Please help!

Here is the log:
{code:java}
// code placeholder
ttempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
{code}

> resource manager hangs,and i cannot submit any new jobs,but rm and nm processes are normal
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-10440
>                 URL: https://issues.apache.org/jira/browse/YARN-10440
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.1.1
>            Reporter: jufeng li
>            Priority: Blocker
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org