[jira] [Commented] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496605#comment-14496605 ]

Jian Fang commented on MAPREDUCE-6304:

I created JIRA YARN-3490 for the application decorator proposal.

Specifying node labels when submitting MR jobs

Key: MAPREDUCE-6304
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6304
Project: Hadoop Map/Reduce
Issue Type: New Feature
Reporter: Jian Fang
Assignee: Naganarasimha G R
Fix For: 2.8.0
Attachments: MAPREDUCE-6304.20150410-1.patch, MAPREDUCE-6304.20150411-1.patch

Per the discussion on YARN-796, we need a mechanism in MAPREDUCE to specify node labels when submitting MR jobs.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492818#comment-14492818 ]

Jian Fang commented on MAPREDUCE-6304:

I mean Hadoop could provide a new mechanism, such as a decorator for the ApplicationSubmissionContext. When submitApplication() in class ClientRMService is called, Hadoop would decorate the ApplicationSubmissionContext (for example, by manipulating the amLabelExpression) before it calls the following line:

    rmAppManager.submitApplication(submissionContext, System.currentTimeMillis(), user);

Hadoop could provide a default decorator that does nothing, and users could override it in yarn-site.xml through a new configuration parameter, for example yarn.app.submission.context.decorator.class.

This new mechanism is not directly related to the change you are making, but it is more generic, so platform providers could update the ApplicationSubmissionContext in their own ways. Once such a mechanism is in place, you do not need to add anything new to your label code for my use case. Instead, the custom logic would live in the custom decorator supplied by the platform provider. For example, we could provide a decorator to update amLabelExpression in ApplicationSubmissionContext. Other fields of ApplicationSubmissionContext could be changed as well to meet users' needs.
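The decorator idea in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not Hadoop code: SubmissionContextSketch is a stand-in for the real ApplicationSubmissionContext, and SubmissionContextDecorator, NoOpDecorator, OnDemandAmDecorator, DecoratorLoader, and the "on-demand" label are all hypothetical names. The class name passed to the loader plays the role of the value that the proposed yarn.app.submission.context.decorator.class parameter would carry.

```java
/** Stand-in for YARN's ApplicationSubmissionContext; only the AM label is modeled. */
final class SubmissionContextSketch {
    private String amLabelExpression;

    String getAmLabelExpression() { return amLabelExpression; }

    void setAmLabelExpression(String expression) { this.amLabelExpression = expression; }
}

/** Hypothetical extension point; it is not part of Hadoop. */
interface SubmissionContextDecorator {
    SubmissionContextSketch decorate(SubmissionContextSketch context);
}

/** Default decorator: returns the context unchanged. */
final class NoOpDecorator implements SubmissionContextDecorator {
    @Override
    public SubmissionContextSketch decorate(SubmissionContextSketch context) {
        return context;
    }
}

/** Example provider decorator: pins the AM to nodes carrying a hypothetical "on-demand" label. */
final class OnDemandAmDecorator implements SubmissionContextDecorator {
    @Override
    public SubmissionContextSketch decorate(SubmissionContextSketch context) {
        context.setAmLabelExpression("on-demand");
        return context;
    }
}

/** Instantiates the configured decorator, falling back to the no-op default. */
final class DecoratorLoader {
    static SubmissionContextDecorator load(String className) {
        if (className == null || className.isEmpty()) {
            return new NoOpDecorator();
        }
        try {
            return (SubmissionContextDecorator)
                    Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load decorator " + className, e);
        }
    }
}
```

In this sketch the server side would call DecoratorLoader.load(...).decorate(context) just before handing the context to the submission call quoted above, so a provider only has to ship one class and set one configuration value.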
[jira] [Commented] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492608#comment-14492608 ]

Jian Fang commented on MAPREDUCE-6304:

A more generic way could be to add a decorator for ApplicationSubmissionContext in class ClientRMService so that people can change the ApplicationSubmissionContext in the method submitApplication(). The default decorator from Apache would do nothing, but Hadoop would allow users to plug in a custom decorator through the Hadoop configuration.
[jira] [Commented] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14485817#comment-14485817 ]

Jian Fang commented on MAPREDUCE-6304:

YarnRunner is the right place to set the labels for an MR job. However, I have one concern. If I understand correctly, YarnRunner runs on the job client side, right? There should also be a way to hook in the labels on the server side, i.e., the resource manager side. The reason is that many Hadoop users do not understand labels or set them themselves; they simply rely on the Hadoop platform provider (or system admins for on-premise clusters) to set up the labels for them. I am not sure whether this is a general use case, but it is definitely a feature that we need.
[jira] [Commented] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14485532#comment-14485532 ]

Jian Fang commented on MAPREDUCE-6304:

mapreduce.job.label may not be enough. There should be at least one more parameter, such as mapreduce.job.am.label, for the application master. For example, on EC2 we don't want to run an application master on a spot instance, but we do allow MR tasks to run on spot instances (otherwise, what would be the purpose of using spot instances?). Furthermore, the application master is a special YARN container, and MRAppMaster does not run as a YarnChild, right?
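To make the two-parameter idea concrete, here is a minimal sketch that resolves separate labels for tasks and for the application master. It uses java.util.Properties as a stand-in for the Hadoop configuration. The property names are the ones floated in this comment and are not confirmed Hadoop settings; the LabelResolver class and the fall-back of the AM label to the job label are assumptions made for illustration.

```java
import java.util.Properties;

/** Illustrative resolver for the two label parameters discussed in the comment. */
final class LabelResolver {
    // Names as proposed in the comment; not confirmed Hadoop property names.
    static final String JOB_LABEL = "mapreduce.job.label";
    static final String AM_LABEL = "mapreduce.job.am.label";

    /** Label expression for map and reduce task containers ("" means no label). */
    static String taskLabel(Properties conf) {
        return conf.getProperty(JOB_LABEL, "");
    }

    /**
     * Label expression for the application master container.
     * Assumption: when no AM label is set, the job label applies to the AM too.
     */
    static String amLabel(Properties conf) {
        String am = conf.getProperty(AM_LABEL);
        return (am == null || am.isEmpty()) ? taskLabel(conf) : am;
    }
}
```

With mapreduce.job.label set to a spot-instance label and mapreduce.job.am.label set to an on-demand label, tasks and the application master would be steered to different node groups, which is the EC2 scenario described above.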
[jira] [Commented] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14485884#comment-14485884 ]

Jian Fang commented on MAPREDUCE-6304:

I understand your point from the on-premise cluster perspective. However, it is not practical to manage mapred-site.xml or queue files for users when Hadoop is offered as a cloud service. As a Hadoop developer, you should consider both on-premise Hadoop clusters and Hadoop in the cloud.

A Hadoop cloud service has a great many users. Usually they launch their own Hadoop clusters in the cloud and control their own queue files or mapred-site.xml. Some of them even run their Hadoop jobs on their own gateways, which the Hadoop platform provider cannot access. But the Hadoop service provider may still want a mechanism to set up some global labels for all users to improve their experience. For example, the failure of an application master because its spot instance was terminated causes more trouble than the failure of a single MR task. Settings of this type can most likely be made only by Hadoop cloud service providers, based on their deep knowledge of their own cloud services.

Alternatively, could Hadoop provide a mechanism for Hadoop providers to extend, so that you only need to specify the labels in YarnRunner in vanilla Hadoop?
[jira] [Commented] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14485918#comment-14485918 ]

Jian Fang commented on MAPREDUCE-6304:

Thanks, Naganarasimha, for your understanding. However, user/group mapping may not work for us, since as a Hadoop service provider we have no control over it. I would prefer a plugin mechanism rather than a fixed solution here, so that we can extend it for our service. But I think the YarnRunner change is still needed for Hadoop users anyway.
[jira] [Commented] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391724#comment-14391724 ]

Jian Fang commented on MAPREDUCE-6304:

Link related JIRAs
[jira] [Created] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs
Jian Fang created MAPREDUCE-6304:

Summary: Specifying node labels when submitting MR jobs
Key: MAPREDUCE-6304
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6304
Project: Hadoop Map/Reduce
Issue Type: New Feature
Reporter: Jian Fang

Per the discussion on YARN-796, we need a mechanism in MAPREDUCE to specify node labels when submitting MR jobs.
[jira] [Commented] (MAPREDUCE-6258) add support to back up JHS files from application master
[ https://issues.apache.org/jira/browse/MAPREDUCE-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320647#comment-14320647 ]

Jian Fang commented on MAPREDUCE-6258:

It is not uncommon; all users who run Hadoop clusters in the cloud should face the same issue. For example, we have to write a dedicated process that continuously dumps the JHS files to local disks and then uploads them to multiple places such as S3. As we observed, the single process did not scale well for a big and busy cluster, and the overhead of synchronizing the local JHS files with the files on HDFS is nontrivial. Furthermore, we need the JHS files to be available as soon as a job finishes, which rules out distcp.

As far as I understand from looking at the internal implementation, the JHS files are currently stored on HDFS only. I think the reason is that they have to be remotely accessible if the job history server runs on another node, now that the JHS server has been separated out from the job tracker of Hadoop 1 and the job tracker has been split into multiple distributed components in Hadoop 2. You cannot just dump the JHS files anywhere; the location must be reliable and accessible by the JHS server.

As a result, I think this feature is the easiest way to achieve our goal. Furthermore, this feature is off by default; users turn it on only when they need it.
[jira] [Created] (MAPREDUCE-6258) add support to back up JHS files from application master
Jian Fang created MAPREDUCE-6258:

Summary: add support to back up JHS files from application master
Key: MAPREDUCE-6258
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6258
Project: Hadoop Map/Reduce
Issue Type: New Feature
Components: applicationmaster
Affects Versions: 2.4.1
Reporter: Jian Fang

In Hadoop 2, job history files are stored on HDFS with a default retention period of one week. In a cloud environment, these HDFS files are actually stored on the disks of ephemeral instances, which go away once the instances are terminated. Users may want to back up the job history files for issue investigation and performance analysis before and after the cluster is terminated.

A centralized backup mechanism could have a scalability issue on big and busy Hadoop clusters, which may run tens of thousands of jobs every day. It is therefore preferable to back up the job history files in a distributed way.

To achieve this goal, we could add a new feature to back up the job history files in the application master. More specifically, we could copy the job history files to a backup path when they are moved from the temporary staging directory to the intermediate_done path in the application master. Since application masters can run on any slave node of a Hadoop cluster, backing up the job history files in a distributed fashion would scale better.

Please be aware that the backup path should be managed by the Hadoop users based on their needs. For example, some users may copy the job history files directly to cloud storage and keep them there forever, while others may want to store them on local disks and clean them up from time to time.
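The copy-on-move step proposed in this issue could look roughly like the sketch below. It is not the attached patch: it uses java.nio.file on the local file system as a stand-in for the Hadoop file system API, and HistoryFileBackup, moveWithBackup, and demo are hypothetical names. A null backup directory models the feature being switched off.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Illustrative only: back up a history file while moving it to the done directory. */
final class HistoryFileBackup {

    /**
     * Copies historyFile into backupDir (when backupDir is not null), then moves it
     * into intermediateDone. Returns the path of the moved file.
     */
    static Path moveWithBackup(Path historyFile, Path intermediateDone, Path backupDir) {
        try {
            if (backupDir != null) {
                Files.createDirectories(backupDir);
                Files.copy(historyFile, backupDir.resolve(historyFile.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
            }
            Files.createDirectories(intermediateDone);
            return Files.move(historyFile, intermediateDone.resolve(historyFile.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Small self-check in a temporary directory; returns true when both copies exist. */
    static boolean demo() {
        try {
            Path root = Files.createTempDirectory("jhs-backup-sketch");
            Path staging = Files.createDirectories(root.resolve("staging"));
            Path history = Files.write(staging.resolve("job_0001.jhist"),
                    "events".getBytes(StandardCharsets.UTF_8));
            Path backup = root.resolve("backup");
            Path moved = moveWithBackup(history, root.resolve("intermediate_done"), backup);
            return Files.exists(moved)
                    && Files.exists(backup.resolve("job_0001.jhist"))
                    && Files.notExists(history);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because every application master would run this step for its own job, the backup work is spread across the cluster instead of being funneled through one central process, which is the scalability argument made in the description.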
[jira] [Updated] (MAPREDUCE-6258) add support to back up JHS files from application master
[ https://issues.apache.org/jira/browse/MAPREDUCE-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian Fang updated MAPREDUCE-6258:

Attachment: MAPREDUCE-6258.patch
[jira] [Updated] (MAPREDUCE-6258) add support to back up JHS files from application master
[ https://issues.apache.org/jira/browse/MAPREDUCE-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian Fang updated MAPREDUCE-6258:

Status: Patch Available (was: Open)
[jira] [Commented] (MAPREDUCE-6242) Progress report log is incredibly excessive in application master
[ https://issues.apache.org/jira/browse/MAPREDUCE-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308060#comment-14308060 ]

Jian Fang commented on MAPREDUCE-6242:

Thanks for your quick fix.

Progress report log is incredibly excessive in application master

Key: MAPREDUCE-6242
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6242
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster
Affects Versions: 2.4.0
Reporter: Jian Fang
Assignee: Varun Saxena
Attachments: MAPREDUCE-6242.001.patch

We saw incredibly excessive logs in the application master for a long-running application with many task attempts. The log write rate is around 1MB/sec in some cases. Most of the log entries were from progress reports, such as the following:

2015-02-03 17:46:14,321 INFO [IPC Server handler 56 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_00_0 is : 0.15605757
2015-02-03 17:46:17,581 INFO [IPC Server handler 2 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_00_0 is : 0.4108217
2015-02-03 17:46:20,426 INFO [IPC Server handler 0 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_02_0 is : 0.06634143
2015-02-03 17:46:20,807 INFO [IPC Server handler 4 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_00_0 is : 0.6506
2015-02-03 17:46:21,013 INFO [IPC Server handler 6 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_01_0 is : 0.21723115

It looks like the report interval is controlled by a hard-coded variable, PROGRESS_INTERVAL, set to 3 seconds in class org.apache.hadoop.mapred.Task. We should allow users to set an appropriate progress interval for their applications.
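As an illustration of the request above, the sketch below reads the progress interval from configuration and falls back to the 3-second value mentioned in the description. It is not the committed fix: ProgressIntervalSketch is a hypothetical class, java.util.Properties stands in for the Hadoop configuration, and the property name example.task.progress.interval.ms is invented for this example.

```java
import java.util.Properties;

/** Illustrative only: a configurable replacement for a hard-coded progress interval. */
final class ProgressIntervalSketch {
    /** The 3-second value that the description says is hard-coded today. */
    static final long DEFAULT_INTERVAL_MS = 3000L;

    /** Hypothetical property name, chosen for this example only. */
    static final String INTERVAL_KEY = "example.task.progress.interval.ms";

    /** Returns the configured interval, or the default when unset, non-numeric, or not positive. */
    static long resolve(Properties conf) {
        String raw = conf.getProperty(INTERVAL_KEY);
        if (raw == null) {
            return DEFAULT_INTERVAL_MS;
        }
        try {
            long value = Long.parseLong(raw.trim());
            return value > 0 ? value : DEFAULT_INTERVAL_MS;
        } catch (NumberFormatException e) {
            return DEFAULT_INTERVAL_MS;
        }
    }
}
```

Raising the interval from 3 seconds to, say, 30 seconds would cut the number of progress log entries per task attempt by roughly a factor of ten, at the cost of coarser progress updates.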
[jira] [Commented] (MAPREDUCE-6242) Progress report log is incredibly excessive in application master
[ https://issues.apache.org/jira/browse/MAPREDUCE-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305738#comment-14305738 ]

Jian Fang commented on MAPREDUCE-6242:

This seems to be easy to fix. Could you please let me know when the patch will be available? We have production issues due to this.
[jira] [Updated] (MAPREDUCE-6111) Hadoop users' staging directories should be under a user folder
[ https://issues.apache.org/jira/browse/MAPREDUCE-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian Fang updated MAPREDUCE-6111:

Attachment: MAPREDUCE-6111.2.patch

Fixed the user parent path permission issue.
[jira] [Updated] (MAPREDUCE-6111) Hadoop users' staging directories should be under a user folder
[ https://issues.apache.org/jira/browse/MAPREDUCE-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian Fang updated MAPREDUCE-6111:

Status: Open (was: Patch Available)

Need to fix the user permission.
[jira] [Updated] (MAPREDUCE-6111) Hadoop users' staging directories should be under a user folder
[ https://issues.apache.org/jira/browse/MAPREDUCE-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian Fang updated MAPREDUCE-6111:

Status: Patch Available (was: Open)

Fixed the user parent path permission issue.
[jira] [Updated] (MAPREDUCE-6111) Hadoop users' staging directories should be under a user folder
[ https://issues.apache.org/jira/browse/MAPREDUCE-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian Fang updated MAPREDUCE-6111:

Attachment: (was: MAPREDUCE-6111.patch)
[jira] [Created] (MAPREDUCE-6111) Hadoop users' staging directories should be under a user folder
Jian Fang created MAPREDUCE-6111:

Summary: Hadoop users' staging directories should be under a user folder
Key: MAPREDUCE-6111
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6111
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.4.1, 2.5.0
Reporter: Jian Fang

Right now, Hadoop puts all users' staging directories under /tmp/hadoop-yarn/staging/, for example /tmp/hadoop-yarn/staging/hadoop for user hadoop, but the directory /tmp/hadoop-yarn/staging is also used for other purposes. For example, /tmp/hadoop-yarn/staging/history/ is used to hold finished JHS files. The shared parent /tmp/hadoop-yarn/staging makes it difficult to track all users' folders without adding extra logic to exclude other known folders.

As a result, we should move all users' folders to a user sub-folder, i.e., /tmp/hadoop-yarn/staging/user/. In this case, user hadoop's staging folder becomes /tmp/hadoop-yarn/staging/user/hadoop/.staging.
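The layout change can be illustrated with a few path helpers. This is a sketch rather than the attached patch: StagingPaths and its method names are hypothetical, and the paths are the default locations quoted in the description.

```java
/** Illustrative only: current versus proposed staging directory layout. */
final class StagingPaths {
    static final String STAGING_ROOT = "/tmp/hadoop-yarn/staging";

    /** Current layout: each user's folder sits directly under the shared staging root. */
    static String currentUserDir(String user) {
        return STAGING_ROOT + "/" + user;
    }

    /** Proposed layout: all users' folders live under a dedicated "user" sub-folder. */
    static String proposedStagingDir(String user) {
        return STAGING_ROOT + "/user/" + user + "/.staging";
    }

    /** With the proposed layout, one prefix check identifies a user's folder. */
    static boolean isUserPath(String path) {
        return path.startsWith(STAGING_ROOT + "/user/");
    }
}
```

Under the current layout, currentUserDir("history") yields the same path as the JHS directory /tmp/hadoop-yarn/staging/history, which is why tracking users' folders needs exclusion logic today; under the proposed layout, isUserPath separates the two cases with a single prefix test.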
[jira] [Updated] (MAPREDUCE-6111) Hadoop users' staging directories should be under a user folder
[ https://issues.apache.org/jira/browse/MAPREDUCE-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian Fang updated MAPREDUCE-6111:

Attachment: MAPREDUCE-6111.patch
[jira] [Updated] (MAPREDUCE-6111) Hadoop users' staging directories should be under a user folder
[ https://issues.apache.org/jira/browse/MAPREDUCE-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian Fang updated MAPREDUCE-6111:

Status: Patch Available (was: Open)
[jira] [Updated] (MAPREDUCE-6111) Hadoop users' staging directories should be under a user folder
[ https://issues.apache.org/jira/browse/MAPREDUCE-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Fang updated MAPREDUCE-6111: - Attachment: MAPREDUCE-6111.patch Upload patch with fix for unit test. Hadoop users' staging directories should be under a user folder --- Key: MAPREDUCE-6111 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6111 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.5.0, 2.4.1 Reporter: Jian Fang Attachments: MAPREDUCE-6111.patch, MAPREDUCE-6111.patch
[jira] [Updated] (MAPREDUCE-6111) Hadoop users' staging directories should be under a user folder
[ https://issues.apache.org/jira/browse/MAPREDUCE-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Fang updated MAPREDUCE-6111: - Status: Open (was: Patch Available) Hadoop users' staging directories should be under a user folder --- Key: MAPREDUCE-6111 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6111 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.4.1, 2.5.0 Reporter: Jian Fang Attachments: MAPREDUCE-6111.patch
[jira] [Updated] (MAPREDUCE-6111) Hadoop users' staging directories should be under a user folder
[ https://issues.apache.org/jira/browse/MAPREDUCE-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Fang updated MAPREDUCE-6111: - Attachment: (was: MAPREDUCE-6111.patch) Hadoop users' staging directories should be under a user folder --- Key: MAPREDUCE-6111 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6111 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.5.0, 2.4.1 Reporter: Jian Fang Attachments: MAPREDUCE-6111.patch
[jira] [Updated] (MAPREDUCE-6111) Hadoop users' staging directories should be under a user folder
[ https://issues.apache.org/jira/browse/MAPREDUCE-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Fang updated MAPREDUCE-6111: - Status: Patch Available (was: Open) Resubmit patch with updated unit test. Hadoop users' staging directories should be under a user folder --- Key: MAPREDUCE-6111 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6111 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.4.1, 2.5.0 Reporter: Jian Fang Attachments: MAPREDUCE-6111.patch
[jira] [Updated] (MAPREDUCE-5703) Job client gets failure though RM side job execution result is FINISHED and SUCCEEDED
[ https://issues.apache.org/jira/browse/MAPREDUCE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Fang updated MAPREDUCE-5703: - Priority: Critical (was: Major) Job client gets failure though RM side job execution result is FINISHED and SUCCEEDED - Key: MAPREDUCE-5703 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5703 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Reporter: Ashutosh Jindal Priority: Critical

1) Run an MR job.
2) After the reduce phase has completed, and while the JHS file is being written, restart the DN.

On the RM side the job is shown as successful, but the JHS doesn't have any information about the job. The job client gets an NPE and exits with code 255.

java.io.IOException: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
    at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getTaskAttemptCompletionEvents(HistoryClientService.java:269)
    at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBServiceImpl.java:173)
    at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:283)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:929)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2080)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2076)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2074)
    at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:330)
    at org.apache.hadoop.mapred.ClientServiceDelegate.getTaskCompletionEvents(ClientServiceDelegate.java:382)
    at org.apache.hadoop.mapred.YARNRunner.getTaskCompletionEvents(YARNRunner.java:529)
    at org.apache.hadoop.mapreduce.Job$5.run(Job.java:668)
    at org.apache.hadoop.mapreduce.Job$5.run(Job.java:665)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapreduce.Job.getTaskCompletionEvents(Job.java:665)
    at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1349)
    at org.apache.hadoop.mapred.JobClient$NetworkedJob.monitorAndPrintJob(JobClient.java:407)
    at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:855)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:835)

-- This message was sent by Atlassian JIRA (v6.2#6252)
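The trace above ends in a NullPointerException inside the history server's getTaskAttemptCompletionEvents handler, and the report says the JHS has no information about the job. The following is a minimal sketch, assuming the handler's job lookup can come back empty, of reporting that case as a descriptive IOException instead of dereferencing null. HistoryLookupSketch and its methods are hypothetical stand-ins, not the Hadoop HistoryClientService API, and this is not a proposed patch.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a history lookup; not a Hadoop class.
final class HistoryLookupSketch {
    // Completion events keyed by job id; a missing key models a job
    // whose history file was never written.
    private final Map<String, String[]> eventsByJob = new HashMap<>();

    void record(String jobId, String... events) {
        eventsByJob.put(jobId, events);
    }

    // Returns the job's completion events, or fails with a descriptive
    // IOException when the job is unknown, rather than hitting an NPE.
    String[] getCompletionEvents(String jobId) throws IOException {
        String[] events = eventsByJob.get(jobId);
        if (events == null) {
            throw new IOException("Unknown job " + jobId + ": no history found");
        }
        return events;
    }

    public static void main(String[] args) {
        HistoryLookupSketch history = new HistoryLookupSketch();
        history.record("job_1", "attempt_1 SUCCEEDED");
        try {
            System.out.println(history.getCompletionEvents("job_1").length);
            history.getCompletionEvents("job_missing");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A guard like this only changes how the failure is reported. It does not address why the history was missing, and the client would still need to decide how to treat a job the RM reports as succeeded.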
[jira] [Commented] (MAPREDUCE-5703) Job client gets failure though RM side job execution result is FINISHED and SUCCEEDED
[ https://issues.apache.org/jira/browse/MAPREDUCE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099376#comment-14099376 ] Jian Fang commented on MAPREDUCE-5703: --

We have a cluster with 3 data nodes, but for some reason the job history was not persisted successfully, as shown in the AM log.

-LOG--
ERROR [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Error writing History Event: org.apache.hadoop.mapreduce.jobhistory.MapAttemptFinishedEvent@1f2cfc93
java.io.IOException: All datanodes 10.253.21.212:9200 are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1140)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:936)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:491)

As a result, the job failed with an NPE. Worse, the job was marked as failed even though it actually finished successfully. On a small cluster this NPE could occur frequently, so it is not an edge case.

The method getTaskAttemptCompletionEvents() fetches a Job by calling verifyAndGetJob(), but it never checks whether the job is null, which is the root cause of this issue.

    public GetTaskAttemptCompletionEventsResponse getTaskAttemptCompletionEvents(
        GetTaskAttemptCompletionEventsRequest request) throws IOException {
      JobId jobId = request.getJobId();
      int fromEventId = request.getFromEventId();
      int maxEvents = request.getMaxEvents();
      Job job = verifyAndGetJob(jobId);
      GetTaskAttemptCompletionEventsResponse response =
          recordFactory.newRecordInstance(GetTaskAttemptCompletionEventsResponse.class);
      response.addAllCompletionEvents(Arrays.asList(job.getTaskAttemptCompletionEvents(fromEventId, maxEvents)));
      return response;
    }

Since people may hit this problem often on a small cluster, what would be the best way to fix it? Retry when saving the job history to HDFS? Or something else?

Job client gets failure though RM side job execution result is FINISHED and SUCCEEDED - Key: MAPREDUCE-5703 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5703 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Reporter: Ashutosh Jindal 1) Run MR job 2) After reduce completed and while JHS file writing, restart DN. RM side job is shown as successful. JHS doesn't have info about the job. Job client gets NPE and exit code as 255.
java.io.IOException: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getTaskAttemptCompletionEvents(HistoryClientService.java:269) at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBServiceImpl.java:173) at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:283) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:929) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2080) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2076) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2074) at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:330) at org.apache.hadoop.mapred.ClientServiceDelegate.getTaskCompletionEvents(ClientServiceDelegate.java:382) at org.apache.hadoop.mapred.YARNRunner.getTaskCompletionEvents(YARNRunner.java:529) at org.apache.hadoop.mapreduce.Job$5.run(Job.java:668) at org.apache.hadoop.mapreduce.Job$5.run(Job.java:665) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at