[ https://issues.apache.org/jira/browse/YARN-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Richard Chen updated YARN-2172:
-------------------------------
Attachment: Hadoop Job Suspend Resume Design.docx
Design Document for Hadoop Job Suspend/Resume Implementation
> Suspend/Resume Hadoop Jobs
> --------------------------
>
> Key: YARN-2172
> URL: https://issues.apache.org/jira/browse/YARN-2172
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: resourcemanager, webapp
> Affects Versions: 2.2.0
> Environment: CentOS 6.5, Hadoop 2.2.0
> Reporter: Richard Chen
> Labels: hadoop, jobs, resume, suspend
> Fix For: 2.2.0
>
> Attachments: Hadoop Job Suspend Resume Design.docx
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> In a multi-application cluster environment, jobs running inside Hadoop YARN
> may be of lower priority than jobs running outside Hadoop YARN, such as
> HBase. To give way to other higher-priority jobs inside Hadoop, a user or a
> cluster-level resource scheduling service should be able to suspend and
> resume particular jobs within Hadoop YARN.
> When target jobs inside Hadoop are suspended, task containers that are
> already allocated and running continue to run until they complete or are
> actively preempted by other means, but no new containers are allocated to
> the target jobs. Conversely, when suspended jobs are resumed, they continue
> from their previous progress and are allocated new task containers to
> complete the remaining work.
> My team has completed the implementation, and our tests show that it works
> reliably.
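
The suspend/resume behavior described above can be illustrated with a small
toy model. This is only a sketch under stated assumptions, not the attached
design or any actual ResourceManager code: the class SuspendResumeSketch and
its methods (submit, suspend, resume, allocate, completeContainer) are
hypothetical names invented for illustration and are not part of the YARN API.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of job suspend/resume. Hypothetical; not the YARN API. */
class SuspendResumeSketch {
    enum JobState { RUNNING, SUSPENDED }

    private final Map<String, JobState> states = new HashMap<>();
    private final Map<String, Integer> running = new HashMap<>();
    private final Map<String, Integer> pending = new HashMap<>();

    /** Registers a job with the given number of tasks still to be scheduled. */
    void submit(String jobId, int tasks) {
        states.put(jobId, JobState.RUNNING);
        running.put(jobId, 0);
        pending.put(jobId, tasks);
    }

    /** Suspending only changes the job state; running containers are untouched. */
    void suspend(String jobId) { states.put(jobId, JobState.SUSPENDED); }

    /** Resuming re-enables allocation; pending work is kept from before. */
    void resume(String jobId) { states.put(jobId, JobState.RUNNING); }

    /** Tries to start one container; refused while the job is suspended. */
    boolean allocate(String jobId) {
        if (states.get(jobId) != JobState.RUNNING || pending.get(jobId) == 0) {
            return false;
        }
        pending.merge(jobId, -1, Integer::sum);
        running.merge(jobId, 1, Integer::sum);
        return true;
    }

    /** A running container finishes, regardless of the job state. */
    void completeContainer(String jobId) {
        if (running.get(jobId) > 0) {
            running.merge(jobId, -1, Integer::sum);
        }
    }

    int running(String jobId) { return running.get(jobId); }
    int pending(String jobId) { return pending.get(jobId); }

    public static void main(String[] args) {
        SuspendResumeSketch s = new SuspendResumeSketch();
        s.submit("job_1", 3);
        s.allocate("job_1");            // one container starts
        s.suspend("job_1");
        System.out.println(s.allocate("job_1")); // no new container while suspended
        s.completeContainer("job_1");   // the running container still finishes
        s.resume("job_1");
        System.out.println(s.allocate("job_1")); // allocation works again
    }
}
```

In this sketch, suspension is modeled purely as a gate on new allocations,
which is why completeContainer does not check the job state.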
--
This message was sent by Atlassian JIRA
(v6.2#6252)