[
https://issues.apache.org/jira/browse/YARN-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15590262#comment-15590262
]
Jian He commented on YARN-5716:
-------------------------------
- remove this?
{code}
/*
FIXME
application
.getCSLeafQueue()
.getOrderingPolicy()
.containerAllocated(application,
application.getRMContainer(updatedContainer.getId()));
*/
{code}
- When stop, if the thread is interrupted, the else condition will make the
thread continue to run. the join in stop will always timeout.
{code}
if (!runSchedules.get() && !Thread.currentThread().isInterrupted()) {
Thread.sleep(100);
} else{
// Don't run schedule if we have some pending backlogs already
if (cs.getAsyncSchedulingPendingBacklogs() > 100) {
Thread.sleep(1);
} else{
schedule(cs);
}
}
} catch (InterruptedException ie) {
// Do nothing
}
{code}
- the withNodeHeartbeat flag is not needed, because it's already checked in the
caller
{code}
private boolean canAllocateMore(CSAssignment assignment,
boolean withNodeHeartbeat) {
{code}
- how about check whether "getNode(node.getNodeID())" equals to null, I feel
that's easier to reason for a removed node
{code}
// Backward compatible way to make sure previous behavior which allocation
// driven by node heartbeat works.
if (getNode(node.getNodeID()) != node)
{code}
- This if condition can be merged into previous "if (reservedContainer != null)
{" condition, as they are the same.
{code}
// Do not schedule if there are any reservations to fulfill on the node
if (node.getReservedContainer() != null) {
if (LOG.isDebugEnabled()) {
LOG.debug("Skipping scheduling since node " + node.getNodeID()
+ " is reserved by application " + node.getReservedContainer()
.getContainerId().getApplicationAttemptId());
}
return null;
}
{code}
- canSchedule flag is not needed, the previuous if condition can just return
direclty
{code}
if (!canSchedule) {
if (LOG.isDebugEnabled()) {
LOG.debug("This node or this node partition doesn't have available or"
+ "killable resource");
}
return null;
}
{code}
- the canSchedule flag here is also not needed
similarly,
{code}
boolean canSchedule = false;
// When this time look at multiple nodes, try schedule if the
// partition has any available resource
if (root.getQueueCapacities().getUsedCapacity(ps.getPartition()) < 1.0f
|| preemptionManager.getKillableResource(
CapacitySchedulerConfiguration.ROOT, ps.getPartition()) != Resources
.none()) {
canSchedule = true;
}
if (!canSchedule) {
if (LOG.isDebugEnabled()) {
LOG.debug("This node or this node partition doesn't have available or"
+ "killable resource");
}
return null;
}
{code}
- Looks like one behavior change is that previously on node heartbeat, we
always satisfy reservedContainer first, now in async scheduling, it's not the
case any more ?
- this if check is not needed, beacuse it's already checked in the caller
{code}
private CSAssignment allocateContainersOnMultiNodes(
PlacementSet<FiCaSchedulerNode> ps) {
if (rmContext.isWorkPreservingRecoveryEnabled() && !rmContext
.isSchedulerReadyForAllocatingContainers()) {
return null;
}
{code}
- PlacementSet, placement is an abstract name, how about NodeSet to be more
concret?
-PlacementSetUtils.getSingleNode -> hasSingleNode
- ContainerAllocationContext -> ContainerAllocationProposal
- nodePartition parameter is not needed, it can be inferred from 'node'
parameter
{code}
public SchedulerContainer(A app, N node, RMContainer rmContainer,
String nodePartition, boolean allocated) {
{code}
- remove this method, a new method with the same name just makes it more
confusing. And the other getSchedulerContainer method with false condition is
already being used in some places.
{code}
private SchedulerContainer<FiCaSchedulerApp, FiCaSchedulerNode>
getSchedulerContainer(
RMContainer rmContainer) {
return getSchedulerContainer(rmContainer, false);
}
{code}
- These two if conditidions code are duplicated, can be moved out side
{code}
if (updateUnconfirmedAllocatedResource) {
app.decUnconfirmedRes(request.getTotalAllocatedResource());
}
LOG.info("Allocation proposal accepted");
} else{
if (updateUnconfirmedAllocatedResource) {
app.decUnconfirmedRes(request.getTotalAllocatedResource());
}
{code}
- remove this
{{// If we do assign, remove the queue and re-insert in-order to re-sort}}\
- In LeafQueue#updateCurrentResourceLimits, multiple threads will update
cachedResourceLimitsForHeadroom without synchronization
- Fix indetation
{code}
if (null != application) {
ActivitiesLogger.APP.startAppAllocationRecording(activitiesManager,
node.getNodeID(), SystemClock.getInstance().getTime(), application);
CSAssignment assignment = application.assignContainers(clusterResource,
ps, currentResourceLimits, schedulingMode, reservedContainer);
return assignment;
}
{code}
- LeafQueeu#handleReservedContainer -> allocateReservedContainer
- SchedulerApplicationAttempt#incNumAllocatedContainers, all the locality
statistics functionality are removed ?
> Add global scheduler interface definition and update CapacityScheduler to use
> it.
> ---------------------------------------------------------------------------------
>
> Key: YARN-5716
> URL: https://issues.apache.org/jira/browse/YARN-5716
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: YARN-5716.001.patch, YARN-5716.002.patch,
> YARN-5716.003.patch, YARN-5716.004.patch
>
>
> Target of this JIRA:
> - Definition of interfaces / objects which will be used by global scheduling,
> this will be shared by different schedulers.
> - Modify CapacityScheduler to use it.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]