[
https://issues.apache.org/jira/browse/YARN-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15593292#comment-15593292
]
Wangda Tan commented on YARN-5716:
----------------------------------
Thanks [~jianhe] for the thorough review!
bq. how about checking whether "getNode(node.getNodeID())" equals null? I feel
that's easier to reason about for a removed node
It is also possible that the reference to the node has changed, for example when
the node's resource was updated. In that case we need to skip such a node for
safety.
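To make that concrete, here is a minimal sketch (hypothetical code, not the actual patch) of why a reference comparison catches more than a null check:
{code}
// Hypothetical sketch: a reference comparison catches both a removed
// node (current == null) and a node whose object was replaced, e.g.
// after a resource update, while a null check only catches the former.
FiCaSchedulerNode current = scheduler.getNode(node.getNodeID());
if (current != node) {
  return; // node removed or replaced in between; skip it for safety
}
{code}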
bq. This if condition can be merged into previous "if (reservedContainer !=
null) {" condition, as they are the same.
No, we cannot merge them: inside the previous {{reservedContainer != null}}
block we may reserve a new container, so the earlier check is no longer valid
by the time we reach the second one.
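A simplified sketch of the flow (helper names hypothetical) shows why the two checks must stay separate:
{code}
// Sketch: the first block can itself create a new reservation, so the
// second condition must re-read node state and cannot be merged.
RMContainer reserved = node.getReservedContainer();
if (reserved != null) {
  // May allocate from the reservation, or reserve a NEW container.
  allocateFromReservedContainer(node, reserved);
}
if (node.getReservedContainer() != null) {
  // Still (or newly) reserved after the block above; stop here.
  return;
}
{code}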
bq. Looks like one behavior change is that previously on node heartbeat, we
always satisfied the reservedContainer first; now in async scheduling, it's not
the case anymore?
It is still the same: if you look at {{allocateContainerOnSingleNode}}, we try
to satisfy the reserved container first.
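Roughly like this (a hypothetical outline, not the exact patch code; the helper names are illustrative):
{code}
// Sketch of the ordering in allocateContainerOnSingleNode: the reserved
// container is tried first, matching the old heartbeat-driven behavior.
private void allocateContainerOnSingleNode(FiCaSchedulerNode node) {
  RMContainer reserved = node.getReservedContainer();
  if (reserved != null) {
    allocateFromReservedContainer(node, reserved); // hypothetical helper
    return;
  }
  allocateOrReserveNewContainers(node); // hypothetical helper
}
{code}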
bq. PlacementSet, placement is an abstract name, how about NodeSet to be more
concrete?
I would prefer to keep "PlacementSet": since it is about "placement", we could
add more information to it later, for example racks.
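For illustration, a hypothetical sketch of the abstraction (not the committed interface):
{code}
// Sketch: "PlacementSet" is broader than "NodeSet" because it can carry
// placement information beyond plain nodes, e.g. racks in the future.
interface PlacementSet<N extends SchedulerNode> {
  Map<NodeId, N> getAllNodes(); // candidate nodes for placement
  String getPartition();        // node partition (label) of the set
  // Future: rack-level info could be added here, e.g. getRacks()
}
{code}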
bq. PlacementSetUtils.getSingleNode -> hasSingleNode
But the method needs to return the node itself when it is a
single-node placement set, so I think the current name fits better.
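Something like this hypothetical sketch:
{code}
// Sketch: the method returns the node itself (or null), not a boolean,
// so "getSingleNode" describes it better than "hasSingleNode".
public static <N extends SchedulerNode> N getSingleNode(PlacementSet<N> ps) {
  Map<NodeId, N> nodes = ps.getAllNodes();
  if (nodes != null && nodes.size() == 1) {
    return nodes.values().iterator().next();
  }
  return null; // not a single-node placement set
}
{code}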
bq. nodePartition parameter is not needed, it can be inferred from 'node'
parameter
The original purpose of adding the partition is that the node's partition could
be updated between the time a proposal is made and the time it is applied; the
parameter will be used to decide whether to reject the proposal when the node's
partition has changed. I have a separate "TODO" in FiCaSchedulerApp:
{code}
// TODO, make sure all node labels are not changed
{code}
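The check the TODO refers to would look roughly like this (a hypothetical sketch; {{ContainerProposal}} is an illustrative name, not a class from the patch):
{code}
// Sketch: reject a proposal at apply time if the node's partition
// changed between proposal creation and application.
boolean acceptProposal(ContainerProposal proposal, SchedulerNode node) {
  if (!proposal.getNodePartition().equals(node.getPartition())) {
    return false; // partition changed in between; reject for safety
  }
  return true;
}
{code}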
bq. In LeafQueue#updateCurrentResourceLimits, multiple threads will update
cachedResourceLimitsForHeadroom without synchronization
This is intentional: we want {{cachedResourceLimitsForHeadroom}} to be
up-to-date. One thread may briefly write inconsistent data, but it will soon be
corrected by other threads. Since the value is only used to give headroom hints
to applications, that is fine. The {{ResourceLimits}} field is also volatile,
so the unsynchronized update itself is safe.
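In other words (an illustrative sketch, not the actual class):
{code}
// Sketch: last-writer-wins update of a volatile reference. A briefly
// stale value is tolerated because later updates overwrite it, and it
// only feeds headroom hints back to applications.
private volatile ResourceLimits cachedResourceLimitsForHeadroom;

void updateCurrentResourceLimits(ResourceLimits limits) {
  // No lock needed: volatile guarantees visibility of the latest ref.
  this.cachedResourceLimitsForHeadroom = limits;
}
{code}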
bq. SchedulerApplicationAttempt#incNumAllocatedContainers, all the locality
statistics functionality is removed?
Oh, I missed that; will add it back in the next iteration.
Addressed all the other comments and uploaded patch ver.5.
> Add global scheduler interface definition and update CapacityScheduler to use
> it.
> ---------------------------------------------------------------------------------
>
> Key: YARN-5716
> URL: https://issues.apache.org/jira/browse/YARN-5716
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: YARN-5716.001.patch, YARN-5716.002.patch,
> YARN-5716.003.patch, YARN-5716.004.patch, YARN-5716.005.patch
>
>
> Target of this JIRA:
> - Definition of interfaces / objects which will be used by global scheduling,
> this will be shared by different schedulers.
> - Modify CapacityScheduler to use it.