[
https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602522#comment-16602522
]
niu commented on YARN-8513:
---------------------------
Debug dump:
{code:java}
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing
shopee-test-cluster04:45454 of type STATUS_UPDATE
2018-09-03 11:44:11,175 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher:
Dispatching the event
org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType:
NODE_UPDATE
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
nodeUpdate: shopee-test-cluster04:45454 cluster capacity: <memory:1351680,
vCores:240>
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
Node being looked for scheduling shopee-test-cluster04:45454 availableResource:
<memory:82944, vCores:77>
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Trying to schedule on node: shopee-test-cluster04, available: <memory:82944,
vCores:77>
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
Trying to assign containers to child-queue of root
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue:
Check assign to queue: root nodePartition: , usedResources: <memory:1095680,
vCores:8>, clusterResources: <memory:1351680, vCores:240>, currentUsedCapacity:
0.81060606, max-capacity: 1.0
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
printChildQueues - queue: root child-queues: root.dwusedCapacity=(1.1842697),
label=(*)root.devusedCapacity=(0.016571993), label=(*)
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
Trying to assign to queue: root.dev stats: dev: capacity=0.32,
absoluteCapacity=0.32, usedResources=<memory:7168, vCores:1>,
usedCapacity=0.016571993, absoluteUsedCapacity=0.0053030304, numApps=1,
numContainers=1
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
assignContainers: partition= #applications=1
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue:
Check assign to queue: dev nodePartition: , usedResources: <memory:7168,
vCores:1>, clusterResources: <memory:1351680, vCores:240>, currentUsedCapacity:
0.0053030304, max-capacity: 0.6
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager:
userLimit is fetched. userLimit=<memory:432640, vCores:77>,
userSpecificUserLimit=<memory:432640, vCores:77>,
schedulingMode=RESPECT_PARTITION_EXCLUSIVITY, partition=
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Headroom calculation for user work: userLimit=<memory:432640, vCores:77>
queueMaxAvailRes=<memory:811008, vCores:144> consumed=<memory:7168, vCores:1>
partition=
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
pre-assignContainers for application application_1535930391687_0019
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager:
userLimit is fetched. userLimit=<memory:432640, vCores:77>,
userSpecificUserLimit=<memory:432640, vCores:77>,
schedulingMode=RESPECT_PARTITION_EXCLUSIVITY, partition=
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
showRequests: application=application_1535930391687_0019
headRoom=<memory:425472, vCores:76> currentConsumption=7168
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.LocalitySchedulingPlacementSet:
Request={AllocationRequestId: 0, Priority: 1, Capability:
<memory:360448, vCores:2>, # Containers: 3, Location: *, Relax Locality: true,
Execution Type Request: null, Node Label Expression: }
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator:
assignContainers: node=shopee-test-cluster04
application=application_1535930391687_0019 priority=1
pendingAsk=<per-allocation-resource=<memory:360448, vCores:2>,repeat=3>
type=OFF_SWITCH
2018-09-03 11:44:11,175 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
Reserved container application=application_1535930391687_0019
resource=<memory:360448, vCores:2>
queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@65ed660
cluster=<memory:1351680, vCores:240>
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
post-assignContainers for application application_1535930391687_0019
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager:
userLimit is fetched. userLimit=<memory:432640, vCores:77>,
userSpecificUserLimit=<memory:432640, vCores:77>,
schedulingMode=RESPECT_PARTITION_EXCLUSIVITY, partition=
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
showRequests: application=application_1535930391687_0019
headRoom=<memory:425472, vCores:76> currentConsumption=7168
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.LocalitySchedulingPlacementSet:
Request={AllocationRequestId: 0, Priority: 1, Capability:
<memory:360448, vCores:2>, # Containers: 3, Location: *, Relax Locality: true,
Execution Type Request: null, Node Label Expression: }
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
Assigned to queue: root.dev stats: dev: capacity=0.32, absoluteCapacity=0.32,
usedResources=<memory:7168, vCores:1>, usedCapacity=0.016571993,
absoluteUsedCapacity=0.0053030304, numApps=1, numContainers=1 -->
<memory:360448, vCores:2>, OFF_SWITCH
2018-09-03 11:44:11,175 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
assignedContainer queue=root usedCapacity=0.81060606
absoluteUsedCapacity=0.81060606 used=<memory:1095680, vCores:8>
cluster=<memory:1351680, vCores:240>
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
ParentQ=root assignedSoFarInThisIteration=<memory:360448, vCores:2>
usedCapacity=0.81060606 absoluteUsedCapacity=0.81060606
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Try to commit allocation proposal=New
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.ResourceCommitRequest:
RESERVED=[(Application=appattempt_1535930391687_0019_000001;
Node=shopee-test-cluster04:45454; Resource=<memory:360448, vCores:2>)]
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager:
userLimit is fetched. userLimit=<memory:432640, vCores:77>,
userSpecificUserLimit=<memory:432640, vCores:77>,
schedulingMode=RESPECT_PARTITION_EXCLUSIVITY, partition=
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Headroom calculation for user work: userLimit=<memory:432640, vCores:77>
queueMaxAvailRes=<memory:811008, vCores:144> consumed=<memory:7168, vCores:1>
partition=
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue:
Used resource=<memory:1095680, vCores:8> exceeded maxResourceLimit of the
queue =<memory:1351680, vCores:240>
2018-09-03 11:44:11,175 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Failed to accept allocation proposal
2018-09-03 11:44:11,175 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Assigned maximum number of off-switch containers: 1, assignments so far:
resource:<memory:360448, vCores:2>; type:OFF_SWITCH; excessReservation:null;
applicationid:null; skipped:NONE; fulfilled reservation:false;
allocations(count/resource):0/<memory:0, vCores:0>;
reservations(count/resource):1/<memory:360448, vCores:2>
2018-09-03 11:44:11,287 DEBUG org.apache.hadoop.ipc.Server: got #68890
2018-09-03 11:44:11,287 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler
30 on 8031: Call#68890 Retry#0
org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from
10.65.205.151:60900 for RpcKind RPC_PROTOCOL_BUFFER
2018-09-03 11:44:11,287 DEBUG org.apache.hadoop.security.UserGroupInformation:
PrivilegedAction as:work (auth:SIMPLE)
from:org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.ipc.Server: Served:
nodeHeartbeat, queueTime= 1 procesingTime= 0
2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher:
Dispatching the event
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeStatusEvent.EventType:
STATUS_UPDATE
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing
shopee-test-cluster03:45454 of type STATUS_UPDATE
2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler
30 on 8031: responding to Call#68890 Retry#0
org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from
10.65.205.151:60900
2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher:
Dispatching the event
org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType:
NODE_UPDATE
2018-09-03 11:44:11,288 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler
30 on 8031: responding to Call#68890 Retry#0
org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from
10.65.205.151:60900 Wrote 42 bytes.
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
nodeUpdate: shopee-test-cluster03:45454 cluster capacity: <memory:1351680,
vCores:240>
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
Node being looked for scheduling shopee-test-cluster03:45454 availableResource:
<memory:90112, vCores:78>
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Trying to schedule on node: shopee-test-cluster03, available: <memory:90112,
vCores:78>
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
Trying to assign containers to child-queue of root
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue:
Check assign to queue: root nodePartition: , usedResources: <memory:1095680,
vCores:8>, clusterResources: <memory:1351680, vCores:240>, currentUsedCapacity:
0.81060606, max-capacity: 1.0
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
printChildQueues - queue: root child-queues: root.dwusedCapacity=(1.1842697),
label=(*)root.devusedCapacity=(0.016571993), label=(*)
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
Trying to assign to queue: root.dev stats: dev: capacity=0.32,
absoluteCapacity=0.32, usedResources=<memory:7168, vCores:1>,
usedCapacity=0.016571993, absoluteUsedCapacity=0.0053030304, numApps=1,
numContainers=1
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
assignContainers: partition= #applications=1
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue:
Check assign to queue: dev nodePartition: , usedResources: <memory:7168,
vCores:1>, clusterResources: <memory:1351680, vCores:240>, currentUsedCapacity:
0.0053030304, max-capacity: 0.6
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager:
userLimit is fetched. userLimit=<memory:432640, vCores:77>,
userSpecificUserLimit=<memory:432640, vCores:77>,
schedulingMode=RESPECT_PARTITION_EXCLUSIVITY, partition=
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Headroom calculation for user work: userLimit=<memory:432640, vCores:77>
queueMaxAvailRes=<memory:811008, vCores:144> consumed=<memory:7168, vCores:1>
partition=
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
pre-assignContainers for application application_1535930391687_0019
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager:
userLimit is fetched. userLimit=<memory:432640, vCores:77>,
userSpecificUserLimit=<memory:432640, vCores:77>,
schedulingMode=RESPECT_PARTITION_EXCLUSIVITY, partition=
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
showRequests: application=application_1535930391687_0019
headRoom=<memory:425472, vCores:76> currentConsumption=7168
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.LocalitySchedulingPlacementSet:
Request={AllocationRequestId: 0, Priority: 1, Capability:
<memory:360448, vCores:2>, # Containers: 3, Location: *, Relax Locality: true,
Execution Type Request: null, Node Label Expression: }
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator:
assignContainers: node=shopee-test-cluster03
application=application_1535930391687_0019 priority=1
pendingAsk=<per-allocation-resource=<memory:360448, vCores:2>,repeat=3>
type=OFF_SWITCH
2018-09-03 11:44:11,288 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
Reserved container application=application_1535930391687_0019
resource=<memory:360448, vCores:2>
queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@65ed660
cluster=<memory:1351680, vCores:240>
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
post-assignContainers for application application_1535930391687_0019
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager:
userLimit is fetched. userLimit=<memory:432640, vCores:77>,
userSpecificUserLimit=<memory:432640, vCores:77>,
schedulingMode=RESPECT_PARTITION_EXCLUSIVITY, partition=
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
showRequests: application=application_1535930391687_0019
headRoom=<memory:425472, vCores:76> currentConsumption=7168
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.LocalitySchedulingPlacementSet:
Request={AllocationRequestId: 0, Priority: 1, Capability:
<memory:360448, vCores:2>, # Containers: 3, Location: *, Relax Locality: true,
Execution Type Request: null, Node Label Expression: }
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
Assigned to queue: root.dev stats: dev: capacity=0.32, absoluteCapacity=0.32,
usedResources=<memory:7168, vCores:1>, usedCapacity=0.016571993,
absoluteUsedCapacity=0.0053030304, numApps=1, numContainers=1 -->
<memory:360448, vCores:2>, OFF_SWITCH
2018-09-03 11:44:11,288 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
assignedContainer queue=root usedCapacity=0.81060606
absoluteUsedCapacity=0.81060606 used=<memory:1095680, vCores:8>
cluster=<memory:1351680, vCores:240>
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
ParentQ=root assignedSoFarInThisIteration=<memory:360448, vCores:2>
usedCapacity=0.81060606 absoluteUsedCapacity=0.81060606
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Try to commit allocation proposal=New
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.ResourceCommitRequest:
RESERVED=[(Application=appattempt_1535930391687_0019_000001;
Node=shopee-test-cluster03:45454; Resource=<memory:360448, vCores:2>)]
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager:
userLimit is fetched. userLimit=<memory:432640, vCores:77>,
userSpecificUserLimit=<memory:432640, vCores:77>,
schedulingMode=RESPECT_PARTITION_EXCLUSIVITY, partition=
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Headroom calculation for user work: userLimit=<memory:432640, vCores:77>
queueMaxAvailRes=<memory:811008, vCores:144> consumed=<memory:7168, vCores:1>
partition=
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue:
Used resource=<memory:1095680, vCores:8> exceeded maxResourceLimit of the
queue =<memory:1351680, vCores:240>
2018-09-03 11:44:11,288 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Failed to accept allocation proposal
2018-09-03 11:44:11,288 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Assigned maximum number of off-switch containers: 1, assignments so far:
resource:<memory:360448, vCores:2>; type:OFF_SWITCH; excessReservation:null;
applicationid:null; skipped:NONE; fulfilled reservation:false;
allocations(count/resource):0/<memory:0, vCores:0>;
reservations(count/resource):1/<memory:360448, vCores:2>
2018-09-03 11:44:11,700 DEBUG org.apache.hadoop.ipc.Server: got #440
2018-09-03 11:44:11,700 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler
2 on 8032: Call#440 Retry#0
org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport
from 10.65.205.148:48970 for RpcKind RPC_PROTOCOL_BUFFER
2018-09-03 11:44:11,700 DEBUG org.apache.hadoop.security.UserGroupInformation:
PrivilegedAction as:work (auth:SIMPLE)
from:org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
2018-09-03 11:44:11,700 DEBUG
org.apache.hadoop.yarn.server.security.ApplicationACLsManager: Verifying
access-type VIEW_APP for work (auth:SIMPLE) on application
application_1535930391687_0019 owned by work
2018-09-03 11:44:11,701 DEBUG org.apache.hadoop.ipc.Server: Served:
getApplicationReport, queueTime= 0 procesingTime= 1
2018-09-03 11:44:11,701 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler
2 on 8032: responding to Call#440 Retry#0
org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport
from 10.65.205.148:48970
2018-09-03 11:44:11,701 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler
2 on 8032: responding to Call#440 Retry#0
org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport
from 10.65.205.148:48970 Wrote 358 bytes.
2018-09-03 11:44:11,989 DEBUG org.apache.hadoop.ipc.Server: got #3118
2018-09-03 11:44:11,990 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler
27 on 8032: Call#3118 Retry#0
org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport
from 10.65.205.148:48370 for RpcKind RPC_PROTOCOL_BUFFER
2018-09-03 11:44:11,990 DEBUG org.apache.hadoop.security.UserGroupInformation:
PrivilegedAction as:work (auth:SIMPLE)
from:org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
2018-09-03 11:44:11,990 DEBUG
org.apache.hadoop.yarn.server.security.ApplicationACLsManager: Verifying
access-type VIEW_APP for work (auth:SIMPLE) on application
application_1535930391687_0012 owned by work
2018-09-03 11:44:11,990 DEBUG org.apache.hadoop.ipc.Server: Served:
getApplicationReport, queueTime= 1 procesingTime= 0
2018-09-03 11:44:11,990 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler
27 on 8032: responding to Call#3118 Retry#0
org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport
from 10.65.205.148:48370
2018-09-03 11:44:11,990 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler
27 on 8032: responding to Call#3118 Retry#0
org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport
from 10.65.205.148:48370 Wrote 361 bytes.
2018-09-03 11:44:12,005 DEBUG org.apache.hadoop.ipc.Server: got #502725
2018-09-03 11:44:12,005 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler
31 on 8031: Call#502725 Retry#0
org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from
10.65.205.150:38836 for RpcKind RPC_PROTOCOL_BUFFER
2018-09-03 11:44:12,005 DEBUG org.apache.hadoop.security.UserGroupInformation:
PrivilegedAction as:work (auth:SIMPLE)
from:org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
2018-09-03 11:44:12,006 DEBUG org.apache.hadoop.ipc.Server: Served:
nodeHeartbeat, queueTime= 1 procesingTime= 0
2018-09-03 11:44:12,006 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler
31 on 8031: responding to Call#502725 Retry#0
org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from
10.65.205.150:38836
2018-09-03 11:44:12,006 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher:
Dispatching the event
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeStatusEvent.EventType:
STATUS_UPDATE
2018-09-03 11:44:12,006 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler
31 on 8031: responding to Call#502725 Retry#0
org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from
10.65.205.150:38836 Wrote 42 bytes.
2018-09-03 11:44:12,006 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing
shopee-test-cluster02:45454 of type STATUS_UPDATE
2018-09-03 11:44:12,006 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher:
Dispatching the event
org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType:
NODE_UPDATE
2018-09-03 11:44:12,006 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
nodeUpdate: shopee-test-cluster02:45454 cluster capacity: <memory:1351680,
vCores:240>
2018-09-03 11:44:12,006 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
Node being looked for scheduling shopee-test-cluster02:45454 availableResource:
<memory:82944, vCores:77>
2018-09-03 11:44:12,006 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Trying to schedule on node: shopee-test-cluster02, available: <memory:82944,
vCores:77>{code}
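One possible reading of the dump, offered as an assumption rather than a confirmed diagnosis: the ask of <memory:360448, vCores:2> does not fit in the memory available on the node, so the allocator reserves a container instead of allocating one. The commit step then rejects the reservation, which would be consistent with root's used memory plus the ask exceeding the cluster total. The "exceeded maxResourceLimit" message only makes sense if the pending ask is counted, since the logged used value alone is below the limit. The same proposal is then produced again on the next node heartbeat (cluster04 at 11:44:11,175, cluster03 at 11:44:11,288). The sketch below only redoes that arithmetic with the values logged above; `parse_resource` and `fits` are hypothetical helpers, not Hadoop APIs.

```python
import re

# Matches the resource notation used in the log lines above.
RESOURCE = re.compile(r"<memory:(\d+), vCores:(\d+)>")


def parse_resource(text):
    """Return (memory, vcores) from a '<memory:N, vCores:M>' string."""
    match = RESOURCE.search(text)
    if match is None:
        raise ValueError(f"no resource found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def fits(used, ask, limit):
    """True if used + ask stays within limit in every dimension.

    This is an assumed model of the check, not the scheduler's actual code.
    """
    return all(u + a <= l for u, a, l in zip(used, ask, limit))


# Values copied from the DEBUG lines in the dump.
root_used = parse_resource("<memory:1095680, vCores:8>")
root_limit = parse_resource("<memory:1351680, vCores:240>")
ask = parse_resource("<memory:360448, vCores:2>")
node_avail = parse_resource("<memory:82944, vCores:77>")

fits_on_node = fits((0, 0), ask, node_avail)     # ask vs. free space on the node
fits_in_root = fits(root_used, ask, root_limit)  # used + ask vs. root's limit
memory_after = root_used[0] + ask[0]

print("fits on node:", fits_on_node)
print("fits in root:", fits_in_root)
print("root memory if accepted:", memory_after, "limit:", root_limit[0])
```

Under this model neither check passes: the node has 82944 memory free against an ask of 360448, and accepting the reservation would put root at 1456128 against a limit of 1351680.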
> CapacityScheduler infinite loop when queue is near fully utilized
> -----------------------------------------------------------------
>
> Key: YARN-8513
> URL: https://issues.apache.org/jira/browse/YARN-8513
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler, yarn
> Affects Versions: 3.1.0, 2.9.1
> Environment: Ubuntu 14.04.5 and 16.04.4
> YARN is configured with one label and 5 queues.
> Reporter: Chen Yufei
> Priority: Major
> Attachments: jstack-1.log, jstack-2.log, jstack-3.log, jstack-4.log,
> jstack-5.log, top-during-lock.log, top-when-normal.log, yarn3-jstack1.log,
> yarn3-jstack2.log, yarn3-jstack3.log, yarn3-jstack4.log, yarn3-jstack5.log,
> yarn3-resourcemanager.log, yarn3-top
>
>
> ResourceManager sometimes stops responding to any request when a queue is
> nearly fully utilized. Sending SIGTERM does not stop the RM; only SIGKILL
> does. After the RM restarts, it recovers running jobs and starts accepting
> new ones.
>
> CapacityScheduler appears to be stuck in an infinite loop, printing the
> following log messages (more than 25,000 lines per second):
>
> {{2018-07-10 17:16:29,227 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
> assignedContainer queue=root usedCapacity=0.99816763
> absoluteUsedCapacity=0.99816763 used=<memory:16170624, vCores:1577>
> cluster=<memory:29441544, vCores:5792>}}
> {{2018-07-10 17:16:29,227 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Failed to accept allocation proposal}}
> {{2018-07-10 17:16:29,227 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
> assignedContainer application attempt=appattempt_1530619767030_1652_000001
> container=null
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@14420943
> clusterResource=<memory:29441544, vCores:5792> type=NODE_LOCAL
> requestedPartition=}}
>
> I have encountered this problem several times since upgrading to YARN 2.9.1,
> while the same configuration works fine under version 2.7.3.
>
> YARN-4477 is an infinite-loop bug in FairScheduler; I am not sure whether
> this is a similar problem.
>
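To check the "more than 25,000 lines in a second" figure against an RM log, one rough approach is to count how often a given message appears per timestamp second. This is a minimal sketch that assumes an unwrapped log with one event per line and the timestamp in the first 19 characters; `lines_per_second` is a hypothetical helper, and the sample lines are the messages quoted in the issue, joined onto single lines.

```python
from collections import Counter


def lines_per_second(lines, needle):
    """Count lines containing `needle`, grouped by 'YYYY-MM-DD HH:MM:SS'."""
    counts = Counter()
    for line in lines:
        if needle in line:
            # The first 19 characters are the timestamp without milliseconds.
            counts[line[:19]] += 1
    return counts


REJECTED = (
    "2018-07-10 17:16:29,227 INFO "
    "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity."
    "CapacityScheduler: Failed to accept allocation proposal"
)
ASSIGNED = (
    "2018-07-10 17:16:29,227 INFO "
    "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity."
    "ParentQueue: assignedContainer queue=root usedCapacity=0.99816763"
)

# Illustrative input only: three rejected proposals and one other message.
sample = [ASSIGNED, REJECTED, REJECTED, REJECTED]

counts = lines_per_second(sample, "Failed to accept allocation proposal")
for second, count in sorted(counts.items()):
    print(second, count)
```

On a real log, pass an open file object instead of `sample`; a single second with thousands of rejected proposals would point at the loop described above.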