[
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wangda Tan updated YARN-1197:
-----------------------------
Attachment: yarn-server-resourcemanager.patch.ver.1
yarn-server-nodemanager.patch.ver.1
yarn-server-common.patch.ver.1
yarn-pb-impl.patch.ver.1
yarn-api-protocol.patch.ver.1
tools-project.patch.ver.1
mapreduce-project.patch.ver.1
I have just finished container resource increase support, including the PB/API
changes, capacity scheduler support for increase requests, and NM support for
changing the monitored size of a running container.
*I split it into several patches for easier review:*
* API/pb file changes in hadoop-yarn-api
* PB implementations in hadoop-yarn-common
* yarn-server-common changes
* yarn-server-resourcemanager changes, including capacity scheduler and AMS
master changes, etc.
* yarn-server-nodemanager changes, including ContainerManagerImpl and
ContainersMonitor changes
* other related project changes required by the updated APIs (map-reduce/tools)
The above are preview patches and still very rough. [~bikassaha], [~sandyr],
[~tucu00], [~vinodkv], could you please review them? I'm eager for your ideas!
*Some short notes on the current RM/NM implementation that are not covered in
the design doc:*
*1) Implementation in capacity scheduler for increasing a container size*
It is very similar to allocating a new container. Some details:
* An increase request is valid only when the asked size is larger than the
existing resource and the container state is either RUNNING or ACQUIRED
* The entry point of increase request allocation is still in
CapacityScheduler:nodeUpdate()
* When an increase request cannot be allocated, it is reserved instead. Each
node can hold at most one reservation (either an increase request or a new
container request). I added a new method, isReserved(), to FiCaSchedulerNode so
that the scheduler/queue can tell whether a node is reserved
* The main logic for increase request allocation is also placed in
LeafQueue:assignContainers; increase requests are processed before new
container requests.
* Queue (leaf/parent) capacity and user capacity are also checked before
reserving or allocating an increase request
* Queue (leaf/parent) used resource will also be deducted when an increase
request is reserved
* Users may submit several increase requests for the same container with
different sizes.
** If the asked size equals the previously asked size, the request is ignored
** If the asked size is smaller than or equal to the existing size, this
cancels the pending increase request on this container
** If the asked size differs from the previously asked size and is greater than
the existing size, it replaces the previous ask and cancels the previous
reservation (if one exists).
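To make the request-handling rules above concrete, here is a minimal,
self-contained sketch. It is not code from the attached patches: the class
IncreaseRequestTracker, its enums, and the use of a single integer as the
container size are all hypothetical simplifications, and reservation handling
is omitted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not code from the patches: tracks one pending
// increase ask per container and applies the rules listed above.
class IncreaseRequestTracker {
    // Simplified container states (assumption; the scheduler has more).
    enum State { ACQUIRED, RUNNING, COMPLETED }

    enum Outcome { REJECTED, ACCEPTED, IGNORED, REPLACED, CANCELLED }

    // containerId -> pending asked size (a single number, e.g. memory in MB)
    private final Map<String, Integer> pendingAsks = new HashMap<>();

    Outcome submit(String containerId, State state, int existingSize, int askedSize) {
        // Only RUNNING or ACQUIRED containers may be increased.
        if (state != State.RUNNING && state != State.ACQUIRED) {
            return Outcome.REJECTED;
        }
        Integer previousAsk = pendingAsks.get(containerId);
        // An ask that is not larger than the existing size cancels a pending
        // ask; with nothing pending it is simply not a valid increase.
        if (askedSize <= existingSize) {
            if (previousAsk == null) {
                return Outcome.REJECTED;
            }
            pendingAsks.remove(containerId);
            return Outcome.CANCELLED;
        }
        // Same size as the previous ask: nothing to do.
        if (previousAsk != null && previousAsk == askedSize) {
            return Outcome.IGNORED;
        }
        // New or different ask: store it, replacing any previous one.
        pendingAsks.put(containerId, askedSize);
        return previousAsk == null ? Outcome.ACCEPTED : Outcome.REPLACED;
    }

    boolean hasPendingAsk(String containerId) {
        return pendingAsks.containsKey(containerId);
    }
}
```

In this sketch, a REPLACED or CANCELLED outcome is the point where a real
scheduler would also release any reservation made for the previous ask.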
*2) Implementation in node manager for increasing a container size*
* It performs checks similar to those done when starting a container (token
verification, etc.)
* The increase is only valid when the ContainerState (the internal
ContainerState:
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerState)
is RUNNING, to avoid a race condition.
* ContainersMonitorImpl puts change requests into containersToBeChanged when it
receives a CHANGE_MONITORING_CONTAINER event. They are then processed in
MonitoringThread:run()
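The hand-off between the event handler and the monitoring thread can be
sketched roughly as follows. This is only an illustration under assumptions:
MonitorSketch and all of its method names are hypothetical, the limit is
reduced to a single long, and only the containersToBeChanged name comes from
the notes above. A real monitoring loop would call applyPendingChanges() once
per iteration of run().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not code from the patches: change requests are queued
// by the event handler and applied later by the monitoring loop.
class MonitorSketch {
    // containerId -> limit currently enforced by the monitor
    private final Map<String, Long> monitoredLimits = new HashMap<>();
    // containerId -> new limit waiting to be picked up by the monitor
    private final Map<String, Long> containersToBeChanged = new HashMap<>();

    synchronized void startMonitoring(String containerId, long limit) {
        monitoredLimits.put(containerId, limit);
    }

    // Called when a CHANGE_MONITORING_CONTAINER-style event arrives.
    // Only records the request; the enforced limit is not touched here.
    synchronized void onChangeMonitoringContainer(String containerId, long newLimit) {
        containersToBeChanged.put(containerId, newLimit);
    }

    // Called from the monitoring loop: drains the queue and updates the
    // limits of containers that are still monitored. Returns how many
    // limits were actually changed.
    synchronized int applyPendingChanges() {
        int applied = 0;
        for (Map.Entry<String, Long> change : containersToBeChanged.entrySet()) {
            if (monitoredLimits.containsKey(change.getKey())) {
                monitoredLimits.put(change.getKey(), change.getValue());
                applied++;
            }
        }
        containersToBeChanged.clear();
        return applied;
    }

    // Returns the enforced limit, or -1 if the container is not monitored.
    synchronized long limitOf(String containerId) {
        Long limit = monitoredLimits.get(containerId);
        return limit == null ? -1L : limit;
    }
}
```

Queuing the change instead of applying it directly keeps all updates to the
enforced limits on the monitoring side, which is one way to avoid the event
handler and the monitor racing on the same container.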
> Support changing resources of an allocated container
> ----------------------------------------------------
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
> Issue Type: Task
> Components: api, nodemanager, resourcemanager
> Affects Versions: 2.1.0-beta
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: mapreduce-project.patch.ver.1,
> tools-project.patch.ver.1, yarn-1197-v2.pdf, yarn-1197-v3.pdf,
> yarn-1197-v4.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1,
> yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1,
> yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1
>
>
> Currently, YARN cannot merge several containers on one node into a bigger
> container, which would let us incrementally ask for resources, merge them
> into a bigger one, and launch our processes. The user scenario is described
> in the comments.
--
This message was sent by Atlassian JIRA
(v6.1#6144)