[
https://issues.apache.org/jira/browse/YARN-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18029764#comment-18029764
]
lvyankui commented on YARN-10440:
---------------------------------
[~Jufeng] Did you solve this problem? I am hitting it too.
> Resource manager hangs and I cannot submit any new jobs, but RM and NM
> processes are normal
> ------------------------------------------------------------------------------------------
>
> Key: YARN-10440
> URL: https://issues.apache.org/jira/browse/YARN-10440
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.1.1
> Reporter: jufeng li
> Priority: Blocker
> Attachments: RM_normal_state.stack, RM_unnormal_state.stack
>
>
> RM hangs, and I cannot submit any new jobs, but the RM and NM processes are
> running normally. I can open xxxxx:8088/cluster/apps/RUNNING but cannot open
> xxxxx:8088/cluster/scheduler. Applications that were already submitted never
> finish, and new applications cannot be submitted. Everything hangs except
> the RM and NM server processes themselves. How can I fix this? Please help!
>
> Here is the log:
> {code:java}
> ttempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang
> clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL
> requestedPartition=
> 2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler
> (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation
> proposal
> 2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) -
> assignedContainer application attempt=appattempt_1600074574138_66297_000001
> container=null queue=tianqiwang clusterResource=<memory:10240000,
> vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler
> (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation
> proposal
> 2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) -
> assignedContainer application attempt=appattempt_1600074574138_66297_000001
> container=null queue=tianqiwang clusterResource=<memory:10240000,
> vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler
> (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation
> proposal
> 2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) -
> assignedContainer application attempt=appattempt_1600074574138_66297_000001
> container=null queue=tianqiwang clusterResource=<memory:10240000,
> vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler
> (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation
> proposal
> 2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) -
> assignedContainer application attempt=appattempt_1600074574138_66297_000001
> container=null queue=tianqiwang clusterResource=<memory:10240000,
> vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler
> (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation
> proposal
> 2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) -
> assignedContainer application attempt=appattempt_1600074574138_66297_000001
> container=null queue=tianqiwang clusterResource=<memory:10240000,
> vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler
> (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation
> proposal
> 2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) -
> assignedContainer application attempt=appattempt_1600074574138_66297_000001
> container=null queue=tianqiwang clusterResource=<memory:10240000,
> vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler
> (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation
> proposal
> 2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) -
> assignedContainer application attempt=appattempt_1600074574138_66297_000001
> container=null queue=tianqiwang clusterResource=<memory:10240000,
> vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler
> (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation
> proposal
> 2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) -
> assignedContainer application attempt=appattempt_1600074574138_66297_000001
> container=null queue=tianqiwang clusterResource=<memory:10240000,
> vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler
> (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation
> proposal
> 2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) -
> assignedContainer application attempt=appattempt_1600074574138_66297_000001
> container=null queue=tianqiwang clusterResource=<memory:10240000,
> vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler
> (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation
> proposal
> 2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) -
> assignedContainer application attempt=appattempt_1600074574138_66297_000001
> container=null queue=tianqiwang clusterResource=<memory:10240000,
> vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler
> (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation
> proposal
> 2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) -
> assignedContainer application attempt=appattempt_1600074574138_66297_000001
> container=null queue=tianqiwang clusterResource=<memory:10240000,
> vCores:4800> type=NODE_LOCAL requestedPartition=
> {code}
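
For anyone triaging the same symptom, one quick check is to count how often the scheduler rejects a proposal and which application attempt keeps getting a null container. The following is a minimal sketch, not part of the original report. It assumes the RM log has one entry per physical line (the excerpt above is wrapped by the mail client), and the names `summarize_rejections`, `ATTEMPT_RE` and `REJECT_MSG` are hypothetical helpers made up for illustration.

```python
import re
import sys
from collections import Counter

# Application attempt ID in the form seen in the log excerpt,
# e.g. appattempt_1600074574138_66297_000001.
ATTEMPT_RE = re.compile(r"appattempt_\d+_\d+_\d+")

# Message text copied from the CapacityScheduler entries in the excerpt.
REJECT_MSG = "Failed to accept allocation proposal"


def summarize_rejections(lines):
    """Return (rejected_count, Counter of attempts logged with container=null).

    Assumption: each log entry sits on a single line, unlike the wrapped
    excerpt in the mail.
    """
    rejected = 0
    null_container_attempts = Counter()
    for line in lines:
        if REJECT_MSG in line:
            rejected += 1
        elif "assignedContainer" in line and "container=null" in line:
            match = ATTEMPT_RE.search(line)
            if match:
                null_container_attempts[match.group(0)] += 1
    return rejected, null_container_attempts


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log path is supplied by the caller; nothing is hard-coded here.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        rejected, attempts = summarize_rejections(log_file)
    print(f"rejected proposals: {rejected}")
    for attempt, count in attempts.most_common(5):
        print(f"{attempt}: {count} null-container assignments")
```

Run it with the path to the RM log as the only argument. A single attempt ID dominating the output, together with a large rejection count, is consistent with the loop shown in the log above, though it does not by itself identify the root cause.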