zhengchenyu created YARN-5846:
---------------------------------
Summary: Improve the fairscheduler attemptScheduler
Key: YARN-5846
URL: https://issues.apache.org/jira/browse/YARN-5846
Project: Hadoop YARN
Issue Type: Improvement
Components: fairscheduler
Affects Versions: 2.7.1
Environment: CentOS-7.1
Reporter: zhengchenyu
Priority: Minor
Fix For: 2.7.1
When assigning a container, the scheduler must consider two factors:
(1) Sort the queues and applications, and select the proper request.
(2) Then check that this request's host is this node (data locality);
otherwise skip this iteration of the loop.
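
A minimal sketch of the two steps above, assuming strict node locality. The
class SortFirstSketch, its Request type, and the integer priority are
hypothetical stand-ins for illustration, not the real FairScheduler classes:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified model of the current loop (not the real scheduler code).
class SortFirstSketch {
    static final class Request {
        final String app;    // application that issued the request
        final int priority;  // assumption: lower value is served first
        final String host;   // host that holds the request's data

        Request(String app, int priority, String host) {
            this.app = app;
            this.priority = priority;
            this.host = host;
        }
    }

    // Returns the application that gets a container on `node`,
    // or null when no pending request is local to that node.
    static String assign(List<Request> pending, String node) {
        // Step (1): every call re-sorts all pending requests.
        List<Request> sorted = new ArrayList<>(pending);
        sorted.sort(Comparator.comparingInt((Request r) -> r.priority));
        for (Request r : sorted) {
            // Step (2): only a request whose data lives on this node is accepted.
            if (r.host.equals(node)) {
                return r.app;
            }
            // Otherwise skip this request and try the next one in sorted order.
        }
        return null;
    }
}
```

In this model each node heartbeat pays for a full sort and a scan over
requests that may all belong to other hosts.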
This algorithm treats the ordering of queues and applications as the primary
factor. When YARN enforces data locality, for example with
yarn.scheduler.fair.locality.threshold.node=1 and
yarn.scheduler.fair.locality.threshold.rack=1 (or when
yarn.scheduler.fair.locality-delay-rack-ms and
yarn.scheduler.fair.locality-delay-node-ms are very large), and lots of
applications are running, the process of assigning containers becomes very slow.
I think data locality is more important than the order of the queues and
applications.
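
To illustrate why a threshold of 1 is so strict, here is a rough sketch. It
assumes the threshold is read as a fraction of the cluster size that gives
the number of scheduling opportunities an application passes up before
locality is relaxed. The class and method names are hypothetical and this is
not the scheduler's actual code:

```java
// Hypothetical illustration of a locality threshold expressed as a
// fraction of cluster size (an assumption, not the real implementation).
class LocalityThresholdSketch {
    // True once the application has passed up more scheduling
    // opportunities than threshold * clusterNodes.
    static boolean mayRelaxLocality(long missedOpportunities,
                                    double threshold,
                                    int clusterNodes) {
        return missedOpportunities > threshold * clusterNodes;
    }
}
```

Under this reading, with a threshold of 1 on a 500-node cluster, an
application would have to pass up more than 500 opportunities before it
could accept a non-local container.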
I would like a new algorithm like this:
(1) When the ResourceManager accepts a new request, notify the RMNodeImpl
and record the association between the RMNode and the request.
(2) When assigning containers for a node, assign them directly from the
RMNodeImpl's association between RMNode and request.
(3) Then consider the priority of the queue and application: within each
RMNodeImpl object, sort the associated requests.
(4) I also think the sorting in the current algorithm is expensive.
Especially when lots of applications are running, sorting is invoked many
times. So I think we should sort the queues and applications in a daemon
thread, because a small error in the order of the queues is acceptable.
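
Steps (1) to (3) could be sketched roughly as follows. NodeIndexSketch and
its Request type are hypothetical names, the integer priority is an
assumption, and the per-host map stands in for whatever association
RMNodeImpl would actually keep. Step (4), the background sort, is not shown:

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical per-node request index (not the real RMNodeImpl).
class NodeIndexSketch {
    static final class Request {
        final String app;    // application that issued the request
        final int priority;  // assumption: lower value is served first
        final String host;   // host that holds the request's data

        Request(String app, int priority, String host) {
            this.app = app;
            this.priority = priority;
            this.host = host;
        }
    }

    private static final Comparator<Request> BY_PRIORITY =
        Comparator.comparingInt((Request r) -> r.priority);

    // Association between a host name and the requests that want that host.
    private final Map<String, PriorityQueue<Request>> byHost = new HashMap<>();

    // Step (1): record the request under its preferred host when it arrives.
    void onRequest(Request r) {
        byHost.computeIfAbsent(r.host, h -> new PriorityQueue<Request>(BY_PRIORITY))
              .add(r);
    }

    // Steps (2) and (3): on a node heartbeat, take the best request recorded
    // for that node. Returns the application name, or null if none is waiting.
    String assign(String node) {
        PriorityQueue<Request> queue = byHost.get(node);
        if (queue == null) {
            return null;
        }
        Request r = queue.poll();
        return r == null ? null : r.app;
    }
}
```

With such an index a heartbeat only touches requests that are local to the
reporting node, and ordering is maintained on insert rather than by a full
sort on every assignment.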