[
https://issues.apache.org/jira/browse/YARN-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516505#comment-16516505
]
Miklos Szegedi commented on YARN-6794:
--------------------------------------
Thank you for the patch [~haibochen]. I have the following comments:
{code:java}
283 Resources.subtractFrom(allocatedResourceGuaranteed, resource);{code}
I think this should refer to allocatedResourceOpportunistic
{code:java}
526 Resource resource = rmContainer.getContainer().getResource();
527 if (Resources.fitsIn(resource, node.getUnallocatedResource())) {
528 node.promoteOpportunisticContainer(rmContainer);{code}
This is outside the critical section. The unallocated space may change after
the check but before the promotion.
{code:java}
529 attemptOpportunisticResourceUsage.decUsed(resource);
530 attemptResourceUsage.incUsed(resource);
531 getQueue().incUsedGuaranteedResource(resource);{code}
SchedulerApplicationAttempt.getResourceUsageReport may return a transient 0
since the write lock is not held at this point.
{code:java}
1391 public Resource getGuaranteedResourceUsage() {{code}
It might be safer to do a clone here and return a read only copy.
{code:java}
162 @Override
163 public void promoteOpportunisticContainer(RMContainer rmContainer) {
164 super.promoteOpportunisticContainer(rmContainer);
165 priorityContainers.remove(rmContainer);
166 }{code}
This might cause a double promotion. The function is not synchronized, so we do
the promotion but priorityContainers can be sampled again with the same
container listed before this finishes.
{code:java}
321 public List<RMContainer> getPriorityContainers() {{code}
It is probably a good idea to have this synchronized as well. The list can
change while you grab the snapshot.
{code:java}
1124 //boolean validReservation =
attemptToAssignReservedResources(node);{code}
?
{code:java}
1165 if (!assigned) {
1166 // break out of the loop because assigned being false
1167 // indicates there is no more resources are available.
1168 break;
1169 }{code}
Is this really true? I could imagine that we try but cannot assign a reserved
resource but we do not even try the opportunistic queue in that case.
> Fair Scheduler to explicitly promote OPPORTUNISITIC containers locally at the
> node where they're running
> --------------------------------------------------------------------------------------------------------
>
> Key: YARN-6794
> URL: https://issues.apache.org/jira/browse/YARN-6794
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager, scheduler
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Priority: Major
> Attachments: YARN-6794-YARN-1011.00.patch,
> YARN-6794-YARN-1011.prelim.patch
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]