[jira] [Updated] (MAPREDUCE-6258) add support to back up JHS files from application master

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ https://issues.apache.org/jira/browse/MAPREDUCE-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-6258:

Labels: BB2015-05-TBR  (was: )

 add support to back up JHS files from application master
 

 Key: MAPREDUCE-6258
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6258
 Project: Hadoop Map/Reduce
 Issue Type: New Feature
 Components: applicationmaster
 Affects Versions: 2.4.1
 Reporter: Jian Fang
 Labels: BB2015-05-TBR
 Attachments: MAPREDUCE-6258.patch


 In Hadoop 2, job history files are stored on HDFS with a default retention
 period of one week. In a cloud environment, these HDFS files actually reside
 on the disks of ephemeral instances and are lost once the instances are
 terminated. Users may want to back up the job history files for issue
 investigation and performance analysis before and after the cluster is
 terminated.
 A centralized backup mechanism could become a scalability bottleneck on big,
 busy Hadoop clusters that may run tens of thousands of jobs every day, so a
 distributed way to back up the job history files is preferable in this case.
 To achieve this, we could add a feature that backs up the job history files
 in the application master. More specifically, the application master could
 copy the job history files to a backup path when it moves them from the
 temporary staging directory to the intermediate_done path. Since application
 masters can run on any slave node of a Hadoop cluster, backing up the job
 history files this way spreads the work across the cluster and scales better.
 Note that the backup path should be managed by Hadoop users according to
 their needs. For example, some users may copy the job history files directly
 to cloud storage and keep them there indefinitely, while others may store
 them on local disks and clean them up from time to time.
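
 The copy-on-move step described above could be sketched roughly as follows.
 This is only an illustration on a local file system, not the code in the
 attached patch: the function name, the optional backup_dir argument, and the
 choice to log and continue when the backup fails are all assumptions made
 for the example, and plain Python file operations stand in for the Hadoop
 file system calls the application master would actually use.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def move_to_done_with_backup(staged_file: Path,
                             intermediate_done_dir: Path,
                             backup_dir: Optional[Path] = None) -> Path:
    """Move a history file out of staging, copying it to backup_dir first.

    Hypothetical helper: backup_dir stands in for a user-managed backup path.
    """
    if backup_dir is not None:
        try:
            backup_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(staged_file, backup_dir / staged_file.name)
        except OSError as err:
            # Assumption: a failed backup is reported but does not block the move.
            print(f"backup of {staged_file.name} failed: {err}")

    intermediate_done_dir.mkdir(parents=True, exist_ok=True)
    target = intermediate_done_dir / staged_file.name
    shutil.move(str(staged_file), str(target))
    return target


if __name__ == "__main__":
    # Demo with throwaway directories; the file name is made up.
    root = Path(tempfile.mkdtemp())
    staging = root / "staging"
    staging.mkdir()
    history = staging / "job_example.jhist"
    history.write_text("job events")

    moved = move_to_done_with_backup(history, root / "intermediate_done",
                                     root / "backup")
    print(moved.exists(), (root / "backup" / history.name).exists())
```

 Because each application master would run this step only for its own job,
 the copy work is spread over whichever nodes host the application masters
 rather than concentrated in one backup service.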



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6258) add support to back up JHS files from application master

2015-02-12 Thread Jian Fang (JIRA)

Jian Fang updated MAPREDUCE-6258:
Attachment: MAPREDUCE-6258.patch

 add support to back up JHS files from application master
 

 Key: MAPREDUCE-6258
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6258
 Project: Hadoop Map/Reduce
 Issue Type: New Feature
 Components: applicationmaster
 Affects Versions: 2.4.1
 Reporter: Jian Fang
 Attachments: MAPREDUCE-6258.patch




[jira] [Updated] (MAPREDUCE-6258) add support to back up JHS files from application master

2015-02-12 Thread Jian Fang (JIRA)

Jian Fang updated MAPREDUCE-6258:
Status: Patch Available  (was: Open)

 add support to back up JHS files from application master
 

 Key: MAPREDUCE-6258
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6258
 Project: Hadoop Map/Reduce
 Issue Type: New Feature
 Components: applicationmaster
 Affects Versions: 2.4.1
 Reporter: Jian Fang
 Attachments: MAPREDUCE-6258.patch

