[
https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267779#comment-15267779
]
Konstantinos Karanasos commented on YARN-2888:
----------------------------------------------
Thanks for the patch, [~asuresh]. Please find some comments below.
Once these are fixed, I will take another look in case I missed something in
this first pass.
In {{YarnConfiguration}}:
* I would use "max-queue-length" everywhere, rather than "queue-limit". It is
more informative, and we might eventually have "max-queue-wait-time", so it
will be easier to differentiate.
* MEAN_SIGMA -> MEAN_STDEV
* As above, DIST_SCHEDULING_QUEUE_LIMIT_MIN ->
DIST_SCHEDULING_MIN_QUEUE_LENGTH. Similar for MAX.
* Do we want to make the min and max queue lengths specific to distributed
scheduling? Maybe we can keep them general, in which case we could rename the
parameters to something like nm-queuing.max-queue-length.
* Rename ContainerQueuingLimit* to NMQueuingLimit*?
In {{Context.java}}:
* Remove line break from import.
* Why do we need to change the return type of getContainerManager() to
ContainerManager (instead of ContainerManagementProtocol)? Same goes for the
{{NodeManager}}.
In {{NodeStatusUpdaterImpl}}:
* There seem to be formatting-only changes in the copyright header, which are
not needed.
* Remove line breaks from imports.
* There is reformatting in various places in code that you are not touching in
this patch. We might want to revert those changes, because they make it hard
to follow the actual changes in the patch.
* Line 863, QueuingLimits -> QueuingLimit, queueing -> queuing.
In {{ContainerManager}}:
* updateQueuingLimits -> updateQueuingLimit
In {{QueuingContainerManagerImpl}}:
* Maybe move the setMaxQueueLength(-1) and the setMaxWaitTime(-1) inside the
newInstance() call?
* I don't think updateQueuingLimits needs to be synchronized.
* In updateQueuingLimits, we probably want to update the queue wait time too.
Would it be better to set the queuingLimit directly instead of setting each
parameter?
* Line 499, is maxQueueLength ever -1? If dist scheduling is not enabled, we do
not update the limits. Also, I think we should call
pruneOpportunisticContainers() only if queue length is greater than 0.
* Maybe pruneOpportunisticContainerQueue() ->
pruneQueuedOpportunisticContainers() or shedQueuedOpportunisticContainers()?
* In pruneOpportunisticContainerQueue(), let's use more descriptive variable
names than counter and iterator.
* In pruneOpportunisticContainerQueue(), let's use the same logic/code as in
stopContainerInternal().
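To make the pruning suggestions above concrete, here is a minimal sketch of what I have in mind. It is not the patch's code: the class name OpportunisticQueuePruner, the generic element type, and the choice to shed from the tail of the queue are all assumptions for illustration. A negative maxQueueLength is treated as "no limit set", and the actual stopping of each shed container (ideally reusing the stopContainerInternal() logic) is left to the caller.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of the patch: illustrates shedding queued
// opportunistic containers down to a maximum queue length.
final class OpportunisticQueuePruner {

  private OpportunisticQueuePruner() {
  }

  /**
   * Removes containers from the tail of the queue until at most
   * maxQueueLength remain. Returns the shed containers so the caller can
   * stop them (e.g. with the same logic as stopContainerInternal()).
   * Assumption: a negative limit means no limit has been received yet.
   */
  static <T> List<T> shedQueuedOpportunisticContainers(
      Deque<T> queuedOpportunisticContainers, int maxQueueLength) {
    List<T> shedContainers = new ArrayList<>();
    if (maxQueueLength < 0 || queuedOpportunisticContainers.isEmpty()) {
      return shedContainers; // nothing to prune
    }
    while (queuedOpportunisticContainers.size() > maxQueueLength) {
      // Assumption: the most recently queued containers are shed first.
      shedContainers.add(queuedOpportunisticContainers.removeLast());
    }
    return shedContainers;
  }
}
```

With a queue of five containers and a maxQueueLength of 3, this would shed the last two queued containers and leave the first three in place.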
In {{DistributedSchedulingService}}:
* Remove line break from import.
In {{QueueLimitCalculator}}:
* Remove line breaks from imports.
* I think we can get rid of the median_sigma. Having mean_sigma should be
sufficient. Moreover, standard deviation should not depend on whether we are
using mean or median (but this will not be a problem if we remove the median).
* The calculation of the mean and stdev should be done over all nodes and not
just the top k.
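As a rough illustration of the last two points, the sketch below computes the mean and the (population) standard deviation over the queue lengths of all nodes and derives a limit from them. The class name QueueStats, the method names, and the formula limit = mean + sigma * stdev clamped to [minQueueLength, maxQueueLength] are my assumptions about the intended calculation, not what the patch currently does.

```java
import java.util.List;

// Hypothetical helper, not part of the patch: statistics over the queue
// lengths reported by ALL nodes, not just the top k.
final class QueueStats {

  private QueueStats() {
  }

  /** Arithmetic mean of the queue lengths; 0 for an empty list. */
  static double mean(List<Integer> queueLengths) {
    if (queueLengths.isEmpty()) {
      return 0.0;
    }
    long sum = 0;
    for (int length : queueLengths) {
      sum += length;
    }
    return (double) sum / queueLengths.size();
  }

  /** Population standard deviation around the mean; 0 for an empty list. */
  static double stdev(List<Integer> queueLengths) {
    if (queueLengths.isEmpty()) {
      return 0.0;
    }
    double mean = mean(queueLengths);
    double sumOfSquares = 0.0;
    for (int length : queueLengths) {
      double deviation = length - mean;
      sumOfSquares += deviation * deviation;
    }
    return Math.sqrt(sumOfSquares / queueLengths.size());
  }

  /**
   * Assumed formula: mean + sigma * stdev, rounded down and clamped to
   * [minQueueLength, maxQueueLength].
   */
  static int queueLengthLimit(List<Integer> queueLengths, double sigma,
      int minQueueLength, int maxQueueLength) {
    double threshold = mean(queueLengths) + sigma * stdev(queueLengths);
    int limit = (int) Math.floor(threshold);
    return Math.max(minQueueLength, Math.min(maxQueueLength, limit));
  }
}
```

For queue lengths 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the standard deviation is 2, so with sigma = 1 and bounds [1, 10] the resulting limit would be 7.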
> Corrective mechanisms for rebalancing NM container queues
> ---------------------------------------------------------
>
> Key: YARN-2888
> URL: https://issues.apache.org/jira/browse/YARN-2888
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager, resourcemanager
> Reporter: Konstantinos Karanasos
> Assignee: Arun Suresh
> Attachments: YARN-2888-yarn-2877.001.patch,
> YARN-2888-yarn-2877.002.patch, YARN-2888.003.patch, YARN-2888.004.patch
>
>
> Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature of
> the scheduling decisions or due to having a stale image of the system) may
> lead to an imbalance in the waiting times of the NM container queues. This
> can in turn have an impact on job execution times and cluster utilization.
> To this end, we introduce corrective mechanisms that may remove (whenever
> needed) container requests from overloaded queues, adding them to less-loaded
> ones.