[ 
https://issues.apache.org/jira/browse/YARN-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jufeng li updated YARN-10440:
-----------------------------
    Description: 
The RM hangs and I cannot submit any new jobs, but the RM and NM processes are running normally. I 
can open xxxxx:8088/cluster/apps/RUNNING but not 
xxxxx:8088/cluster/scheduler. Apps that were already submitted never finish, and new 
apps cannot be submitted: everything hangs, although the RM and NM servers are still up. How can I 
fix this? Please help!

 

Here is the log:
{code:java}
ttempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang 
clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL 
requestedPartition=
2020-09-17 00:22:25,679 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,679 INFO  allocator.AbstractContainerAllocator 
(AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - 
assignedContainer application attempt=appattempt_1600074574138_66297_000001 
container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> 
type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,679 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,679 INFO  allocator.AbstractContainerAllocator 
(AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - 
assignedContainer application attempt=appattempt_1600074574138_66297_000001 
container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> 
type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,679 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,679 INFO  allocator.AbstractContainerAllocator 
(AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - 
assignedContainer application attempt=appattempt_1600074574138_66297_000001 
container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> 
type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,679 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,679 INFO  allocator.AbstractContainerAllocator 
(AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - 
assignedContainer application attempt=appattempt_1600074574138_66297_000001 
container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> 
type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,679 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,679 INFO  allocator.AbstractContainerAllocator 
(AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - 
assignedContainer application attempt=appattempt_1600074574138_66297_000001 
container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> 
type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO  allocator.AbstractContainerAllocator 
(AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - 
assignedContainer application attempt=appattempt_1600074574138_66297_000001 
container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> 
type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO  allocator.AbstractContainerAllocator 
(AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - 
assignedContainer application attempt=appattempt_1600074574138_66297_000001 
container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> 
type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO  allocator.AbstractContainerAllocator 
(AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - 
assignedContainer application attempt=appattempt_1600074574138_66297_000001 
container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> 
type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO  allocator.AbstractContainerAllocator 
(AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - 
assignedContainer application attempt=appattempt_1600074574138_66297_000001 
container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> 
type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO  allocator.AbstractContainerAllocator 
(AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - 
assignedContainer application attempt=appattempt_1600074574138_66297_000001 
container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> 
type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO  allocator.AbstractContainerAllocator 
(AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - 
assignedContainer application attempt=appattempt_1600074574138_66297_000001 
container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> 
type=NODE_LOCAL requestedPartition=
{code}
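The excerpt shows the same pair of records repeating several times per millisecond: CapacityScheduler.tryCommit rejects an allocation proposal, then AbstractContainerAllocator reports a container=null assignment for the same attempt (appattempt_1600074574138_66297_000001). To see how widespread that loop is across the full RM log, a minimal Python sketch like the one below can count rejected proposals per second and null-container assignments per application attempt. It assumes the raw RM log has one record per line (this e-mail wraps each record across several lines) and that the message texts match the excerpt; the function name `summarize` is hypothetical and not part of Hadoop. It only measures the symptom and does not diagnose or fix the hang.

```python
import re
import sys
from collections import Counter

# Timestamp prefix of the RM log records above, e.g. "2020-09-17 00:22:25,679 ".
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} ")
# Application attempt id as printed in "assignedContainer application attempt=...".
ATTEMPT = re.compile(r"attempt=(appattempt_\d+_\d+_\d+)")


def summarize(lines):
    """Count rejected proposals per second and container=null assignments per attempt.

    Hypothetical helper; assumes one log record per line.
    """
    rejected_per_second = Counter()
    null_assignments = Counter()
    for line in lines:
        ts = TS.match(line)
        if ts and "Failed to accept allocation proposal" in line:
            rejected_per_second[ts.group(1)] += 1
        elif "assignedContainer" in line and "container=null" in line:
            m = ATTEMPT.search(line)
            if m:
                null_assignments[m.group(1)] += 1
    return rejected_per_second, null_assignments


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_rm_log.py <path-to-resourcemanager.log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        rejected, nulls = summarize(f)
    for second, n in sorted(rejected.items()):
        print(f"{second}  rejected proposals: {n}")
    for attempt, n in nulls.most_common(10):
        print(f"{attempt}  container=null assignments: {n}")
```

A single attempt dominating the container=null counts, with thousands of rejections per second, would be consistent with the scheduler spinning on one application rather than making progress.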

  was: identical to the description above, except that the {code:java} block began with an extra placeholder line, //代码占位符 ("code placeholder").


> Resource manager hangs, and I cannot submit any new jobs, but RM and NM 
> processes are normal
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-10440
>                 URL: https://issues.apache.org/jira/browse/YARN-10440
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.1.1
>            Reporter: jufeng li
>            Priority: Blocker



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
