[
https://issues.apache.org/jira/browse/YARN-5918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15685102#comment-15685102
]
Arun Suresh commented on YARN-5918:
-----------------------------------
Thanks for raising this [~bibinchundatt] and for chiming in [~varun_saxena].
bq. If we fix the code as above, we will return fewer nodes for scheduling
opportunistic containers than the
yarn.opportunistic-container-allocation.nodes-used configuration even though
enough nodes are available. But this should be updated the very next second (as
per the default config), which may be fine.
As you pointed out, this is actually fine.
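For context, a minimal sketch of the kind of null-guard being discussed, assuming a helper in {{OpportunisticContainerAllocatorAMService}} that maps node IDs from the sorted list back to {{RMNode}}s (the names are taken from the stack trace below; the actual patch may differ):
{code:java}
// Sketch only: skip node IDs whose RMNode has already been removed from
// the cluster (e.g. a lost NM) instead of dereferencing a null RMNode.
private List<RemoteNode> convertToRemoteNodes(List<NodeId> nodeIds) {
  List<RemoteNode> remoteNodes = new ArrayList<>();
  for (NodeId nodeId : nodeIds) {
    RMNode rmNode = rmContext.getRMNodes().get(nodeId);
    if (rmNode != null) {
      remoteNodes.add(convertToRemoteNode(rmNode));
    }
    // A null rmNode here is exactly the window where the NM was lost but
    // the sorted node list has not been recomputed yet.
  }
  return remoteNodes;
}
{code}
This is also why the returned list can briefly be shorter than {{yarn.opportunistic-container-allocation.nodes-used}} until the sorted list is refreshed.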
bq. Although we remove a node from the cluster nodes when it is lost, we do not
remove it from the sorted nodes, because doing so would require iterating over
the list. Can we keep a set instead?
We had initially thought of using a SortedSet, but insertions and deletions
were somewhat expensive, and a LinkedList cheaply satisfied our use case.
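To illustrate the trade-off, a rough sketch (not the actual {{NodeQueueLoadMonitor}} code) of why a LinkedList is enough here: removing a lost node is a single linear pass, which is cheap at typical cluster sizes, while a SortedSet would pay ordering costs on every queue-length update.
{code:java}
import java.util.LinkedList;
import java.util.List;

// Hypothetical stand-in for the monitor's per-node entry.
class ClusterNode {
  final String nodeId;
  int queueLength;
  ClusterNode(String nodeId, int queueLength) {
    this.nodeId = nodeId;
    this.queueLength = queueLength;
  }
}

public class SortedNodeListSketch {
  private final List<ClusterNode> sortedNodes = new LinkedList<>();

  // Called when an NM is lost: drop it from the sorted snapshot as well,
  // so a stale NodeId is never handed back to the AM service.
  public void removeNode(String nodeId) {
    sortedNodes.removeIf(n -> n.nodeId.equals(nodeId));
  }
}
{code}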
Can you maybe add a test to {{TestNodeQueueLoadMonitor}} for this?
+1 pending.
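Something along these lines would cover it; this is only a sketch, and the helper names ({{createRMNode}}, the explicit sorted-list refresh) are assumptions about the existing test utilities rather than the real API:
{code:java}
// Sketch of a lost-NM test for TestNodeQueueLoadMonitor; the helper and
// method names below are placeholders and may not match the actual class.
@Test
public void testLostNodeRemovedFromSortedList() throws Exception {
  NodeQueueLoadMonitor monitor =
      new NodeQueueLoadMonitor(NodeQueueLoadMonitor.LoadComparator.QUEUE_LENGTH);
  RMNode node = createRMNode("h1", 1234, 2, 2);  // hypothetical helper
  monitor.addNode(null, node);
  monitor.removeNode(node);                      // simulate the NM being lost
  // (the monitor's periodic sorted-list recomputation would run here)
  // After the NM is lost, the sorted snapshot should not hand its id back.
  Assert.assertFalse(
      monitor.selectLeastLoadedNodes(10).contains(node.getNodeID()));
}
{code}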
> Opportunistic scheduling allocate request failure when NM lost
> --------------------------------------------------------------
>
> Key: YARN-5918
> URL: https://issues.apache.org/jira/browse/YARN-5918
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Attachments: YARN-5918.0001.patch
>
>
> Allocate request fails during opportunistic container allocation when a
> NodeManager is lost
> {noformat}
> 2016-11-20 10:38:49,011 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root
> OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS
> APPID=application_1479637990302_0002
> CONTAINERID=container_e12_1479637990302_0002_01_000006
> RESOURCE=<memory:1024, vCores:1>
> 2016-11-20 10:38:49,011 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Removed node docker2:38297 clusterResource: <memory:4096, vCores:8>
> 2016-11-20 10:38:49,434 WARN org.apache.hadoop.ipc.Server: IPC Server handler
> 7 on 8030, call Call#35 Retry#0
> org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.allocate from
> 172.17.0.2:51584
> java.lang.NullPointerException
> at
> org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.convertToRemoteNode(OpportunisticContainerAllocatorAMService.java:420)
> at
> org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.convertToRemoteNodes(OpportunisticContainerAllocatorAMService.java:412)
> at
> org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.getLeastLoadedNodes(OpportunisticContainerAllocatorAMService.java:402)
> at
> org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.allocate(OpportunisticContainerAllocatorAMService.java:236)
> at
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> at
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:467)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:990)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:846)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:789)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1857)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2539)
> 2016-11-20 10:38:50,824 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_e12_1479637990302_0002_01_000002 Container Transitioned from
> RUNNING to COMPLETED
> {noformat}