[jira] [Commented] (MAPREDUCE-6607) Enable regex pattern matching when mapreduce.task.files.preserve.filepattern is set
[ https://issues.apache.org/jira/browse/MAPREDUCE-6607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296444#comment-15296444 ] Maysam Yabandeh commented on MAPREDUCE-6607: Thank you [~lewuathe] and [~ajisakaa].

> Enable regex pattern matching when mapreduce.task.files.preserve.filepattern
> is set
> ---
>
> Key: MAPREDUCE-6607
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6607
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster
> Affects Versions: 2.7.1
> Reporter: Maysam Yabandeh
> Assignee: Kai Sasaki
> Priority: Minor
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-6607-branch-2.01.patch,
> MAPREDUCE-6607-branch-2.02.patch, MAPREDUCE-6607.01.patch,
> MAPREDUCE-6607.02.patch, MAPREDUCE-6607.03.patch, MAPREDUCE-6607.04.patch,
> MAPREDUCE-6607.05.patch, MAPREDUCE-6607.06.patch
>
> If either of the following configs is set, then the .staging dir is not
> cleaned up:
> * mapreduce.task.files.preserve.failedtask
> * mapreduce.task.files.preserve.filepattern
> The former was supposed to keep the .staging of failed tasks only, and the
> latter was supposed to be used only if the task name matches the specified
> regular expression.
> {code}
> protected boolean keepJobFiles(JobConf conf) {
>   return (conf.getKeepTaskFilesPattern() != null || conf
>       .getKeepFailedTaskFiles());
> }
> {code}
> {code}
> public void cleanupStagingDir() throws IOException {
>   /* make sure we clean the staging files */
>   String jobTempDir = null;
>   FileSystem fs = getFileSystem(getConfig());
>   try {
>     if (!keepJobFiles(new JobConf(getConfig()))) {
>       jobTempDir = getConfig().get(MRJobConfig.MAPREDUCE_JOB_DIR);
>       if (jobTempDir == null) {
>         LOG.warn("Job Staging directory is null");
>         return;
>       }
>       Path jobTempDirPath = new Path(jobTempDir);
>       LOG.info("Deleting staging directory " +
>           FileSystem.getDefaultUri(getConfig()) + " " + jobTempDir);
>       fs.delete(jobTempDirPath, true);
>     }
>   } catch (IOException io) {
>     LOG.error("Failed to cleanup staging dir " + jobTempDir, io);
>   }
> }
> {code}
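For illustration, a pattern-aware check in the spirit of this fix could look like the following minimal sketch. This is a hypothetical signature, not the committed patch: it preserves the staging dir only when the regex actually matches, instead of whenever the pattern is merely set.
{code}
// Hypothetical sketch only -- not the committed MAPREDUCE-6607 change.
// Preserve staging files when preserve-failed-task is on, or when the
// configured pattern actually matches the given name.
protected boolean keepJobFiles(JobConf conf, String jobStagingDir) {
  if (conf.getKeepFailedTaskFiles()) {
    return true;
  }
  String pattern = conf.getKeepTaskFilesPattern();
  return pattern != null && jobStagingDir != null
      && java.util.regex.Pattern.matches(pattern, jobStagingDir);
}
{code}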
[jira] [Updated] (MAPREDUCE-6607) .staging dir is not cleaned up if mapreduce.task.files.preserve.failedtask or mapreduce.task.files.preserve.filepattern are set
[ https://issues.apache.org/jira/browse/MAPREDUCE-6607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-6607: --- Summary: .staging dir is not cleaned up if mapreduce.task.files.preserve.failedtask or mapreduce.task.files.preserve.filepattern are set (was: .staging dir is not cleanup if mapreduce.task.files.preserve.failedtask or mapreduce.task.files.preserve.filepattern are set)

> .staging dir is not cleaned up if mapreduce.task.files.preserve.failedtask or
> mapreduce.task.files.preserve.filepattern are set
> ---
>
> Key: MAPREDUCE-6607
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6607
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster
> Affects Versions: 2.7.1
> Reporter: Maysam Yabandeh
> Assignee: Kai Sasaki
> Priority: Minor
> Attachments: MAPREDUCE-6607.01.patch
>
> If either of the following configs is set, then the .staging dir is not
> cleaned up:
> * mapreduce.task.files.preserve.failedtask
> * mapreduce.task.files.preserve.filepattern
> The former was supposed to keep the .staging of failed tasks only, and the
> latter was supposed to be used only if the task name matches the specified
> regular expression.
> {code}
> protected boolean keepJobFiles(JobConf conf) {
>   return (conf.getKeepTaskFilesPattern() != null || conf
>       .getKeepFailedTaskFiles());
> }
> {code}
> {code}
> public void cleanupStagingDir() throws IOException {
>   /* make sure we clean the staging files */
>   String jobTempDir = null;
>   FileSystem fs = getFileSystem(getConfig());
>   try {
>     if (!keepJobFiles(new JobConf(getConfig()))) {
>       jobTempDir = getConfig().get(MRJobConfig.MAPREDUCE_JOB_DIR);
>       if (jobTempDir == null) {
>         LOG.warn("Job Staging directory is null");
>         return;
>       }
>       Path jobTempDirPath = new Path(jobTempDir);
>       LOG.info("Deleting staging directory " +
>           FileSystem.getDefaultUri(getConfig()) + " " + jobTempDir);
>       fs.delete(jobTempDirPath, true);
>     }
>   } catch (IOException io) {
>     LOG.error("Failed to cleanup staging dir " + jobTempDir, io);
>   }
> }
> {code}
[jira] [Created] (MAPREDUCE-6607) .staging dir is not cleanup if mapreduce.task.files.preserve.failedtask or mapreduce.task.files.preserve.filepattern are set
Maysam Yabandeh created MAPREDUCE-6607:
--
Summary: .staging dir is not cleanup if mapreduce.task.files.preserve.failedtask or mapreduce.task.files.preserve.filepattern are set
Key: MAPREDUCE-6607
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6607
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster
Affects Versions: 2.7.1
Reporter: Maysam Yabandeh
Priority: Minor

If either of the following configs is set, then the .staging dir is not cleaned up:
* mapreduce.task.files.preserve.failedtask
* mapreduce.task.files.preserve.filepattern

The former was supposed to keep the .staging of failed tasks only, and the latter was supposed to be used only if the task name matches the specified regular expression.
{code}
protected boolean keepJobFiles(JobConf conf) {
  return (conf.getKeepTaskFilesPattern() != null || conf
      .getKeepFailedTaskFiles());
}
{code}
{code}
public void cleanupStagingDir() throws IOException {
  /* make sure we clean the staging files */
  String jobTempDir = null;
  FileSystem fs = getFileSystem(getConfig());
  try {
    if (!keepJobFiles(new JobConf(getConfig()))) {
      jobTempDir = getConfig().get(MRJobConfig.MAPREDUCE_JOB_DIR);
      if (jobTempDir == null) {
        LOG.warn("Job Staging directory is null");
        return;
      }
      Path jobTempDirPath = new Path(jobTempDir);
      LOG.info("Deleting staging directory " +
          FileSystem.getDefaultUri(getConfig()) + " " + jobTempDir);
      fs.delete(jobTempDirPath, true);
    }
  } catch (IOException io) {
    LOG.error("Failed to cleanup staging dir " + jobTempDir, io);
  }
}
{code}
[jira] [Updated] (MAPREDUCE-6489) Fail fast rogue tasks that write too much to local disk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-6489: --- Attachment: MAPREDUCE-6489-branch-2.003.patch

Thanks [~jlowe] for the review. I am attaching MAPREDUCE-6489-branch-2.003.patch for branch-2.

Fail fast rogue tasks that write too much to local disk
---
Key: MAPREDUCE-6489
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6489
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Affects Versions: 2.7.1
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh
Attachments: MAPREDUCE-6489-branch-2.003.patch, MAPREDUCE-6489.001.patch, MAPREDUCE-6489.002.patch, MAPREDUCE-6489.003.patch

Tasks of rogue jobs can write too much to local disk, negatively affecting the jobs running in collocated containers. Ideally YARN will be able to limit the amount of local disk used by each task: YARN-4011. Until then, the mapreduce task can fail fast if the task is writing too much (above a configured threshold) to local disk. As we discussed [here|https://issues.apache.org/jira/browse/YARN-4011?focusedCommentId=14902750&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14902750], the suggested approach is that the MapReduce task checks the BYTES_WRITTEN counter for the local disk and throws an exception when it goes beyond a configured value. It is true that written bytes is larger than the actual used disk space, but to detect a rogue task the exact value is not required, and a very large value for bytes written to local disk is a good indication that the task is misbehaving.
[jira] [Updated] (MAPREDUCE-6489) Fail fast rogue tasks that write too much to local disk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-6489: --- Attachment: MAPREDUCE-6489.001.patch

Uploading the first patch. The patch updates the TaskReporter thread to also check the limits on the counters each time they are updated. If the BYTES_WRITTEN counter exceeds the configured limit, it fails fast with ExitUtil.terminate(). Reviews are appreciated.

Fail fast rogue tasks that write too much to local disk
---
Key: MAPREDUCE-6489
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6489
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Affects Versions: 2.7.1
Reporter: Maysam Yabandeh
Attachments: MAPREDUCE-6489.001.patch

Tasks of rogue jobs can write too much to local disk, negatively affecting the jobs running in collocated containers. Ideally YARN will be able to limit the amount of local disk used by each task: YARN-4011. Until then, the mapreduce task can fail fast if the task is writing too much (above a configured threshold) to local disk. As we discussed [here|https://issues.apache.org/jira/browse/YARN-4011?focusedCommentId=14902750&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14902750], the suggested approach is that the MapReduce task checks the BYTES_WRITTEN counter for the local disk and throws an exception when it goes beyond a configured value. It is true that written bytes is larger than the actual used disk space, but to detect a rogue task the exact value is not required, and a very large value for bytes written to local disk is a good indication that the task is misbehaving.
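A rough illustration of the check described above follows. This is a sketch under assumptions: the config key and the counter lookup are assumptions for illustration, not necessarily what the attached patch does; ExitUtil.terminate is the real Hadoop utility named in the comment.
{code}
// Sketch of the fail-fast idea; the key name and counter lookup below are
// assumptions, not necessarily the patch's exact code.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.FileSystemCounter;
import org.apache.hadoop.util.ExitUtil;

class LocalWriteLimitCheck {
  // Hypothetical config key for the threshold; a negative value disables it.
  static final String LIMIT_KEY = "mapreduce.task.local-fs.write-limit.bytes";

  static void check(Configuration conf, Counters counters) {
    long limit = conf.getLong(LIMIT_KEY, -1L);
    if (limit < 0) {
      return; // feature disabled
    }
    // Bytes written to the local ("file") filesystem so far.
    long written =
        counters.findCounter("file", FileSystemCounter.BYTES_WRITTEN).getValue();
    if (written > limit) {
      // Fail fast: terminate the task JVM with a descriptive message.
      ExitUtil.terminate(-1, "Local FS bytes written " + written
          + " exceeded configured limit " + limit);
    }
  }
}
{code}
Hooking such a check into the TaskReporter loop, as the comment describes, means the limit is evaluated on every counter update rather than only at task end.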
[jira] [Updated] (MAPREDUCE-6489) Fail fast rogue tasks that write too much to local disk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-6489: --- Assignee: Maysam Yabandeh Status: Patch Available (was: Open)

Fail fast rogue tasks that write too much to local disk
---
Key: MAPREDUCE-6489
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6489
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Affects Versions: 2.7.1
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh
Attachments: MAPREDUCE-6489.001.patch

Tasks of rogue jobs can write too much to local disk, negatively affecting the jobs running in collocated containers. Ideally YARN will be able to limit the amount of local disk used by each task: YARN-4011. Until then, the mapreduce task can fail fast if the task is writing too much (above a configured threshold) to local disk. As we discussed [here|https://issues.apache.org/jira/browse/YARN-4011?focusedCommentId=14902750&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14902750], the suggested approach is that the MapReduce task checks the BYTES_WRITTEN counter for the local disk and throws an exception when it goes beyond a configured value. It is true that written bytes is larger than the actual used disk space, but to detect a rogue task the exact value is not required, and a very large value for bytes written to local disk is a good indication that the task is misbehaving.
[jira] [Updated] (MAPREDUCE-6489) Fail fast rogue tasks that write too much to local disk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-6489: --- Attachment: MAPREDUCE-6489.002.patch

Fail fast rogue tasks that write too much to local disk
---
Key: MAPREDUCE-6489
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6489
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Affects Versions: 2.7.1
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh
Attachments: MAPREDUCE-6489.001.patch, MAPREDUCE-6489.002.patch

Tasks of rogue jobs can write too much to local disk, negatively affecting the jobs running in collocated containers. Ideally YARN will be able to limit the amount of local disk used by each task: YARN-4011. Until then, the mapreduce task can fail fast if the task is writing too much (above a configured threshold) to local disk. As we discussed [here|https://issues.apache.org/jira/browse/YARN-4011?focusedCommentId=14902750&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14902750], the suggested approach is that the MapReduce task checks the BYTES_WRITTEN counter for the local disk and throws an exception when it goes beyond a configured value. It is true that written bytes is larger than the actual used disk space, but to detect a rogue task the exact value is not required, and a very large value for bytes written to local disk is a good indication that the task is misbehaving.
[jira] [Created] (MAPREDUCE-6489) Fail fast rogue tasks that write too much to local disk
Maysam Yabandeh created MAPREDUCE-6489:
--
Summary: Fail fast rogue tasks that write too much to local disk
Key: MAPREDUCE-6489
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6489
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Affects Versions: 2.7.1
Reporter: Maysam Yabandeh

Tasks of rogue jobs can write too much to local disk, negatively affecting the jobs running in collocated containers. Ideally YARN will be able to limit the amount of local disk used by each task: YARN-4011. Until then, the mapreduce task can fail fast if the task is writing too much (above a configured threshold) to local disk. As we discussed [here|https://issues.apache.org/jira/browse/YARN-4011?focusedCommentId=14902750&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14902750], the suggested approach is that the MapReduce task checks the BYTES_WRITTEN counter for the local disk and throws an exception when it goes beyond a configured value. It is true that written bytes is larger than the actual used disk space, but to detect a rogue task the exact value is not required, and a very large value for bytes written to local disk is a good indication that the task is misbehaving.
[jira] [Updated] (MAPREDUCE-5871) Estimate Job Endtime
[ https://issues.apache.org/jira/browse/MAPREDUCE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5871: --- Resolution: Later Status: Resolved (was: Patch Available)

So far the only application for estimating job end-time was YARN-1969, which is currently canceled. Canceling this jira as well. We can resume it later when there is another specific application for it.

Estimate Job Endtime
---
Key: MAPREDUCE-5871
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5871
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh
Labels: BB2015-05-TBR
Attachments: MAPREDUCE-5871.patch

YARN-1969 adds a new earliest-endtime-first policy to the fair scheduler. As a prerequisite step, the AppMaster should estimate its end time and send it to the RM via the heartbeat. This jira focuses on how the AppMaster performs this estimation.
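For intuition only, an end-time estimate in the spirit of this description could be as simple as the following hypothetical sketch (this is not the attached patch, which would need a more careful per-phase model):
{code}
// Hypothetical back-of-envelope estimate: project the remaining work from
// the job's progress fraction and elapsed wall-clock time.
static long estimateJobEndTime(long nowMs, long jobStartMs, float progress) {
  if (progress <= 0f) {
    return Long.MAX_VALUE; // no signal yet
  }
  long elapsed = nowMs - jobStartMs;
  // If `progress` fraction took `elapsed` ms, the rest takes elapsed*(1-p)/p.
  return nowMs + (long) (elapsed * (1 - progress) / progress);
}
{code}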
[jira] [Created] (MAPREDUCE-6118) Uber-mode decision does not consider -Xmx
Maysam Yabandeh created MAPREDUCE-6118:
--
Summary: Uber-mode decision does not consider -Xmx
Key: MAPREDUCE-6118
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6118
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Maysam Yabandeh
Priority: Minor

Currently the decision on using uber-mode is based on the mapper container size and the AM container size. However, the actual memory at the AM is limited by the -Xmx option passed via yarn.app.mapreduce.am.command-opts. For example: AM memory: 4G, yarn.app.mapreduce.am.command-opts=-Xmx2048GB, mapper memory: 3GB. Here the uber job could face OOM whereas a non-uber execution would not. We faced this problem recently and were forced to disable uber-mode to circumvent the problem. One idea, although a bit ugly, is to parse the jvm opts, extract Xmx, and take it into account for the uber-mode decision.
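A sketch of the "parse the jvm opts" idea follows; the helper is hypothetical and only illustrates the extraction step, not where the real uber-mode check would consume it:
{code}
// Hypothetical helper illustrating the -Xmx parsing idea from the description.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XmxParser {
  private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([gGmMkK]?)");

  /** Returns the last -Xmx value in bytes, or -1 if none is present. */
  static long parseXmxBytes(String jvmOpts) {
    long bytes = -1;
    Matcher m = XMX.matcher(jvmOpts == null ? "" : jvmOpts);
    while (m.find()) { // the JVM honors the last -Xmx, so keep overwriting
      long v = Long.parseLong(m.group(1));
      String unit = m.group(2).toLowerCase();
      if (unit.equals("g")) v <<= 30;
      else if (unit.equals("m")) v <<= 20;
      else if (unit.equals("k")) v <<= 10;
      bytes = v;
    }
    return bytes;
  }
}
{code}
The uber-mode decision could then compare the smaller of the parsed -Xmx and the AM container size against the mapper and reducer memory requirements.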
[jira] [Commented] (MAPREDUCE-6118) Uber-mode decision does not consider -Xmx
[ https://issues.apache.org/jira/browse/MAPREDUCE-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157635#comment-14157635 ] Maysam Yabandeh commented on MAPREDUCE-6118: Not every detail is covered in examples; it is there just to communicate the idea.

Uber-mode decision does not consider -Xmx
---
Key: MAPREDUCE-6118
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6118
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Maysam Yabandeh
Priority: Minor

Currently the decision on using uber-mode is based on the mapper container size and the AM container size. However, the actual memory at the AM is limited by the -Xmx option passed via yarn.app.mapreduce.am.command-opts. For example: AM memory: 4G, yarn.app.mapreduce.am.command-opts=-Xmx2048GB, mapper memory: 3GB. Here the uber job could face OOM whereas a non-uber execution would not. We faced this problem recently and were forced to disable uber-mode to circumvent the problem. One idea, although a bit ugly, is to parse the jvm opts, extract Xmx, and take it into account for the uber-mode decision.
[jira] [Commented] (MAPREDUCE-5954) Optional exclusion of counters from getTaskReports
[ https://issues.apache.org/jira/browse/MAPREDUCE-5954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131052#comment-14131052 ] Maysam Yabandeh commented on MAPREDUCE-5954: Thanks [~jira.shegalov]. Currently the use case is that the monitoring tool can be recompiled to benefit from the API added to the MRAppMaster. The MR jobs themselves should not need a recompile. Can you explain a bit more why an existing MR job would need to be recompiled with the current patch?

Optional exclusion of counters from getTaskReports
---
Key: MAPREDUCE-5954
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5954
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh
Attachments: MAPREDUCE-5954.patch

MRClientService#getTaskReports returns the set of map or reduce tasks along with their counters, which are quite large. For big jobs, the response could be as large as 0.5 GB. This has a negative impact both on the MRAppMaster and on the monitoring tool that invokes getTaskReports. This problem has led Pig users to entirely disable getTaskReports for big jobs: https://issues.apache.org/jira/browse/PIG-4043 Many monitoring tools, including ours, do not need the task counters when invoking getTaskReports. Pig also does not make any use of task counters. Here are the usages of TaskReport in pig:
{code}
protected void getErrorMessages(TaskReport reports[], String type,
  String msgs[] = reports[i].getDiagnostics();
  if (HadoopShims.isJobFailed(reports[i])) {
{code}
and
{code}
protected long computeTimeSpent(TaskReport[] taskReports) {
  long timeSpent = 0;
  for (TaskReport r : taskReports) {
    timeSpent += (r.getFinishTime() - r.getStartTime());
  }
  return timeSpent;
}
{code}
GetTaskReportsRequest can be augmented with an optional boolean with which the monitoring tool can request excluding the counters from the response. This minor change is very simple and yet makes many existing monitoring tools more efficient.
[jira] [Commented] (MAPREDUCE-6043) Reducer-preemption does not kick in
[ https://issues.apache.org/jira/browse/MAPREDUCE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112428#comment-14112428 ] Maysam Yabandeh commented on MAPREDUCE-6043: Thanks [~jlowe] for the comments. Further investigation suggests that the cause of the lost messages, as you correctly mentioned, was a bug introduced by one of our recent internal patches (the container finishes successfully but its status is not successfully transmitted to the MRAppMaster through the RM). Nevertheless, I was under the impression that there were double standards in the MRAppMaster, sometimes relying on the RM to successfully transmit statuses and sometimes not. But after your comment, I double-checked and noticed that the logic for the exceptional case described in case 1 is only available in our internal repository and has not made it to trunk yet. So I now understand that the container statuses reported by the NM are guaranteed to be received by the MRAppMaster through the RM, and the MRAppMaster's logic need not be resilient against such cases.

Reducer-preemption does not kick in
---
Key: MAPREDUCE-6043
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6043
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Maysam Yabandeh

We have seen various cases in which reducer-preemption does not kick in and the scheduled mappers wait behind running reducers forever. Each time there seems to be a different scenario. So far we have tracked down two such cases, and the common element between them is that the variables in RMContainerAllocator go out of sync, since they only get updated when a completed container is reported by the RM. However, there are many corner cases in which such a report is not received from the RM and yet the MapReduce app moves forward. Perhaps one possible fix would be to update such variables also after exceptional cases. The logic for triggering preemption is at RMContainerAllocator::preemptReducesIfNeeded. The preemption is triggered if the following is true:
{code}
headroom + am * |m| + pr * |r| < mapResourceRequest
{code}
where am is the number of assigned mappers, |m| is the mapper size, pr is the number of reducers being preempted, and |r| is the reducer size. Any of these variables going out of sync will cause the preemption not to kick in. In the following comment, we explain two such cases.
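Rendered as code, the quoted trigger amounts to the condition below. This is a paraphrase for illustration with simplified names, not the actual RMContainerAllocator code:
{code}
// Paraphrase of the quoted preemption trigger -- illustration only.
// headroom:          memory the RM reports as still available
// assignedMaps:      am, number of assigned mappers
// mapSize:           |m|, memory of one mapper container
// preemptingReduces: pr, reducers already being preempted
// reduceSize:        |r|, memory of one reducer container
static boolean shouldPreemptReducer(long headroom, int assignedMaps,
    long mapSize, int preemptingReduces, long reduceSize,
    long mapResourceRequest) {
  long reclaimable = headroom + assignedMaps * mapSize
      + preemptingReduces * reduceSize;
  // Preempt another reducer only if even the reclaimable memory cannot
  // fit one pending mapper request.
  return reclaimable < mapResourceRequest;
}
{code}
If any of these inputs is stale (for example, assignedMaps counting mappers whose containers actually finished), the condition never becomes true and preemption never fires, which is exactly the failure described here.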
[jira] [Updated] (MAPREDUCE-6043) Lost messages from RM to MRAppMaster
[ https://issues.apache.org/jira/browse/MAPREDUCE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-6043: --- Summary: Lost messages from RM to MRAppMaster (was: Reducer-preemption does not kick in)

Lost messages from RM to MRAppMaster
---
Key: MAPREDUCE-6043
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6043
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Maysam Yabandeh

We have seen various cases in which reducer-preemption does not kick in and the scheduled mappers wait behind running reducers forever. Each time there seems to be a different scenario. So far we have tracked down two such cases, and the common element between them is that the variables in RMContainerAllocator go out of sync, since they only get updated when a completed container is reported by the RM. However, there are many corner cases in which such a report is not received from the RM and yet the MapReduce app moves forward. Perhaps one possible fix would be to update such variables also after exceptional cases. The logic for triggering preemption is at RMContainerAllocator::preemptReducesIfNeeded. The preemption is triggered if the following is true:
{code}
headroom + am * |m| + pr * |r| < mapResourceRequest
{code}
where am is the number of assigned mappers, |m| is the mapper size, pr is the number of reducers being preempted, and |r| is the reducer size. Any of these variables going out of sync will cause the preemption not to kick in. In the following comment, we explain two such cases.
[jira] [Resolved] (MAPREDUCE-6043) Lost messages from RM to MRAppMaster
[ https://issues.apache.org/jira/browse/MAPREDUCE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh resolved MAPREDUCE-6043. Resolution: Invalid

Lost messages from RM to MRAppMaster
---
Key: MAPREDUCE-6043
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6043
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Maysam Yabandeh

We have seen various cases in which reducer-preemption does not kick in and the scheduled mappers wait behind running reducers forever. Each time there seems to be a different scenario. So far we have tracked down two such cases, and the common element between them is that the variables in RMContainerAllocator go out of sync, since they only get updated when a completed container is reported by the RM. However, there are many corner cases in which such a report is not received from the RM and yet the MapReduce app moves forward. Perhaps one possible fix would be to update such variables also after exceptional cases. The logic for triggering preemption is at RMContainerAllocator::preemptReducesIfNeeded. The preemption is triggered if the following is true:
{code}
headroom + am * |m| + pr * |r| < mapResourceRequest
{code}
where am is the number of assigned mappers, |m| is the mapper size, pr is the number of reducers being preempted, and |r| is the reducer size. Any of these variables going out of sync will cause the preemption not to kick in. In the following comment, we explain two such cases.
[jira] [Commented] (MAPREDUCE-6043) Reducer-preemption does not kick in
[ https://issues.apache.org/jira/browse/MAPREDUCE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111622#comment-14111622 ] Maysam Yabandeh commented on MAPREDUCE-6043: I was planning to work on a patch that maintains those variables also in exceptional corner cases. Comments are highly appreciated.

Reducer-preemption does not kick in
---
Key: MAPREDUCE-6043
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6043
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Maysam Yabandeh

We have seen various cases in which reducer-preemption does not kick in and the scheduled mappers wait behind running reducers forever. Each time there seems to be a different scenario. So far we have tracked down two such cases, and the common element between them is that the variables in RMContainerAllocator go out of sync, since they only get updated when a completed container is reported by the RM. However, there are many corner cases in which such a report is not received from the RM and yet the MapReduce app moves forward. Perhaps one possible fix would be to update such variables also after exceptional cases. The logic for triggering preemption is at RMContainerAllocator::preemptReducesIfNeeded. The preemption is triggered if the following is true:
{code}
headroom + am * |m| + pr * |r| < mapResourceRequest
{code}
where am is the number of assigned mappers, |m| is the mapper size, pr is the number of reducers being preempted, and |r| is the reducer size. Any of these variables going out of sync will cause the preemption not to kick in. In the following comment, we explain two such cases.
[jira] [Created] (MAPREDUCE-6043) Reducer-preemption does not kick in
Maysam Yabandeh created MAPREDUCE-6043:
--
Summary: Reducer-preemption does not kick in
Key: MAPREDUCE-6043
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6043
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Maysam Yabandeh

We have seen various cases in which reducer-preemption does not kick in and the scheduled mappers wait behind running reducers forever. Each time there seems to be a different scenario. So far we have tracked down two such cases, and the common element between them is that the variables in RMContainerAllocator go out of sync, since they only get updated when a completed container is reported by the RM. However, there are many corner cases in which such a report is not received from the RM and yet the MapReduce app moves forward. Perhaps one possible fix would be to update such variables also after exceptional cases. The logic for triggering preemption is at RMContainerAllocator::preemptReducesIfNeeded. The preemption is triggered if the following is true:
{code}
headroom + am * |m| + pr * |r| < mapResourceRequest
{code}
where am is the number of assigned mappers, |m| is the mapper size, pr is the number of reducers being preempted, and |r| is the reducer size. Any of these variables going out of sync will cause the preemption not to kick in. In the following comment, we explain two such cases.
[jira] [Commented] (MAPREDUCE-6043) Reducer-preemption does not kick in
[ https://issues.apache.org/jira/browse/MAPREDUCE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105664#comment-14105664 ] Maysam Yabandeh commented on MAPREDUCE-6043: The values of these variables in the first case were as follows:
{code}
headroom = 0;
pr = 0; // reducer preemption was never called in this app
{code}
so the triggering condition for reducer preemption reduces to:
{code}
am * |m| < |m|
{code}
or
{code}
am < 1
{code}
In this erroneous case we had two assigned mappers that were not successfully removed from the list and hence prevented preemption from kicking in. Those mappers had finished, but the MRAppMaster did not hear anything about them afterwards, so it declared them successful after a one-minute timeout:
{code}
2014-08-20 04:25:21,665 INFO [Ping Checker] org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:attempt_xxx_yyy_m_000288_0 Timed out after 60 secs
2014-08-20 04:25:21,665 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Task attempt attempt_xxx_yyy_m_000288_0 is done from TaskUmbilicalProtocol's point of view. However, it stays in finishing state for too long
2014-08-20 04:25:21,665 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_xxx_yyy_m_000288_0 TaskAttempt Transitioned from FINISHING_CONTAINER to SUCCESS_CONTAINER_CLEANUP
{code}

Reducer-preemption does not kick in
---
Key: MAPREDUCE-6043
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6043
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Maysam Yabandeh

We have seen various cases in which reducer-preemption does not kick in and the scheduled mappers wait behind running reducers forever. Each time there seems to be a different scenario. So far we have tracked down two such cases, and the common element between them is that the variables in RMContainerAllocator go out of sync, since they only get updated when a completed container is reported by the RM. However, there are many corner cases in which such a report is not received from the RM and yet the MapReduce app moves forward. Perhaps one possible fix would be to update such variables also after exceptional cases. The logic for triggering preemption is at RMContainerAllocator::preemptReducesIfNeeded. The preemption is triggered if the following is true:
{code}
headroom + am * |m| + pr * |r| < mapResourceRequest
{code}
where am is the number of assigned mappers, |m| is the mapper size, pr is the number of reducers being preempted, and |r| is the reducer size. Any of these variables going out of sync will cause the preemption not to kick in. In the following comment, we explain two such cases.
[jira] [Updated] (MAPREDUCE-5954) Optional exclusion of counters from getTaskReports
[ https://issues.apache.org/jira/browse/MAPREDUCE-5954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5954: --- Attachment: MAPREDUCE-5954.patch

Attaching the patch.

Optional exclusion of counters from getTaskReports
---
Key: MAPREDUCE-5954
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5954
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh
Attachments: MAPREDUCE-5954.patch

MRClientService#getTaskReports returns the set of map or reduce tasks along with their counters, which are quite large. For big jobs, the response could be as large as 0.5 GB. This has a negative impact both on the MRAppMaster and on the monitoring tool that invokes getTaskReports. This problem has led Pig users to entirely disable getTaskReports for big jobs: https://issues.apache.org/jira/browse/PIG-4043 Many monitoring tools, including ours, do not need the task counters when invoking getTaskReports. Pig also does not make any use of task counters. Here are the usages of TaskReport in pig:
{code}
protected void getErrorMessages(TaskReport reports[], String type,
  String msgs[] = reports[i].getDiagnostics();
  if (HadoopShims.isJobFailed(reports[i])) {
{code}
and
{code}
protected long computeTimeSpent(TaskReport[] taskReports) {
  long timeSpent = 0;
  for (TaskReport r : taskReports) {
    timeSpent += (r.getFinishTime() - r.getStartTime());
  }
  return timeSpent;
}
{code}
GetTaskReportsRequest can be augmented with an optional boolean with which the monitoring tool can request excluding the counters from the response. This minor change is very simple and yet makes many existing monitoring tools more efficient.
[jira] [Updated] (MAPREDUCE-5954) Optional exclusion of counters from getTaskReports
[ https://issues.apache.org/jira/browse/MAPREDUCE-5954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5954: --- Status: Patch Available (was: Open)

Optional exclusion of counters from getTaskReports
---
Key: MAPREDUCE-5954
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5954
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh
Attachments: MAPREDUCE-5954.patch

MRClientService#getTaskReports returns the set of map or reduce tasks along with their counters, which are quite large. For big jobs, the response could be as large as 0.5 GB. This has a negative impact both on the MRAppMaster and on the monitoring tool that invokes getTaskReports. This problem has led Pig users to entirely disable getTaskReports for big jobs: https://issues.apache.org/jira/browse/PIG-4043 Many monitoring tools, including ours, do not need the task counters when invoking getTaskReports. Pig also does not make any use of task counters. Here are the usages of TaskReport in pig:
{code}
protected void getErrorMessages(TaskReport reports[], String type,
  String msgs[] = reports[i].getDiagnostics();
  if (HadoopShims.isJobFailed(reports[i])) {
{code}
and
{code}
protected long computeTimeSpent(TaskReport[] taskReports) {
  long timeSpent = 0;
  for (TaskReport r : taskReports) {
    timeSpent += (r.getFinishTime() - r.getStartTime());
  }
  return timeSpent;
}
{code}
GetTaskReportsRequest can be augmented with an optional boolean with which the monitoring tool can request excluding the counters from the response. This minor change is very simple and yet makes many existing monitoring tools more efficient.
[jira] [Commented] (MAPREDUCE-5954) Optional exclusion of counters from getTaskReports
[ https://issues.apache.org/jira/browse/MAPREDUCE-5954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053010#comment-14053010 ] Maysam Yabandeh commented on MAPREDUCE-5954: The timed-out unit test seems irrelevant.

Optional exclusion of counters from getTaskReports
---
Key: MAPREDUCE-5954
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5954
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh
Attachments: MAPREDUCE-5954.patch

MRClientService#getTaskReports returns the set of map or reduce tasks along with their counters, which are quite large. For big jobs, the response could be as large as 0.5 GB. This has a negative impact both on the MRAppMaster and on the monitoring tool that invokes getTaskReports. This problem has led Pig users to entirely disable getTaskReports for big jobs: https://issues.apache.org/jira/browse/PIG-4043 Many monitoring tools, including ours, do not need the task counters when invoking getTaskReports. Pig also does not make any use of task counters. Here are the usages of TaskReport in pig:
{code}
protected void getErrorMessages(TaskReport reports[], String type,
  String msgs[] = reports[i].getDiagnostics();
  if (HadoopShims.isJobFailed(reports[i])) {
{code}
and
{code}
protected long computeTimeSpent(TaskReport[] taskReports) {
  long timeSpent = 0;
  for (TaskReport r : taskReports) {
    timeSpent += (r.getFinishTime() - r.getStartTime());
  }
  return timeSpent;
}
{code}
GetTaskReportsRequest can be augmented with an optional boolean with which the monitoring tool can request excluding the counters from the response. This minor change is very simple and yet makes many existing monitoring tools more efficient.
[jira] [Updated] (MAPREDUCE-5954) Optional exclusion of counters from getTaskReports
[ https://issues.apache.org/jira/browse/MAPREDUCE-5954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5954: --- Description:
MRClientService#getTaskReports returns the set of map or reduce tasks along with their counters, which are quite large. For big jobs, the response could be as large as 0.5 GB. This has a negative impact both on the MRAppMaster and on the monitoring tool that invokes getTaskReports. This problem has led Pig users to entirely disable getTaskReports for big jobs: https://issues.apache.org/jira/browse/PIG-4043 Many monitoring tools, including ours, do not need the task counters when invoking getTaskReports. Pig also does not make any use of task counters. Here are the usages of TaskReport in pig:
{code}
protected void getErrorMessages(TaskReport reports[], String type,
  String msgs[] = reports[i].getDiagnostics();
  if (HadoopShims.isJobFailed(reports[i])) {
{code}
and
{code}
protected long computeTimeSpent(TaskReport[] taskReports) {
  long timeSpent = 0;
  for (TaskReport r : taskReports) {
    timeSpent += (r.getFinishTime() - r.getStartTime());
  }
  return timeSpent;
}
{code}
GetTaskReportsRequest can be augmented with an optional boolean with which the monitoring tool can request excluding the counters from the response. This minor change is very simple and yet makes many existing monitoring tools more efficient.

was:
MRClientService.getTaskReport returns the set of map or reduce tasks along with their counters, which are quite large. For big jobs, the response could be as large as 0.5 GB. This has a negative impact both on the MRAppMaster and on the monitoring tool that invokes getTaskReports. This problem has led Pig users to entirely disable getTaskReports for big jobs: https://issues.apache.org/jira/browse/PIG-4043 Many monitoring tools, including ours, do not need the task counters when invoking getTaskReports. Pig also does not make any use of task counters. Here are the usages of TaskReport in pig:
{code}
protected void getErrorMessages(TaskReport reports[], String type,
  String msgs[] = reports[i].getDiagnostics();
  if (HadoopShims.isJobFailed(reports[i])) {
{code}
and
{code}
protected long computeTimeSpent(TaskReport[] taskReports) {
  long timeSpent = 0;
  for (TaskReport r : taskReports) {
    timeSpent += (r.getFinishTime() - r.getStartTime());
  }
  return timeSpent;
}
{code}
GetTaskReportsRequest can be augmented with an optional boolean with which the monitoring tool can request excluding the counters from the response. This minor change is very simple and yet makes many existing monitoring tools more efficient.

Optional exclusion of counters from getTaskReports
---
Key: MAPREDUCE-5954
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5954
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh

MRClientService#getTaskReports returns the set of map or reduce tasks along with their counters, which are quite large. For big jobs, the response could be as large as 0.5 GB. This has a negative impact both on the MRAppMaster and on the monitoring tool that invokes getTaskReports. This problem has led Pig users to entirely disable getTaskReports for big jobs: https://issues.apache.org/jira/browse/PIG-4043 Many monitoring tools, including ours, do not need the task counters when invoking getTaskReports. Pig also does not make any use of task counters. Here are the usages of TaskReport in pig:
{code}
protected void getErrorMessages(TaskReport reports[], String type,
  String msgs[] = reports[i].getDiagnostics();
  if (HadoopShims.isJobFailed(reports[i])) {
{code}
and
{code}
protected long computeTimeSpent(TaskReport[] taskReports) {
  long timeSpent = 0;
  for (TaskReport r : taskReports) {
    timeSpent += (r.getFinishTime() - r.getStartTime());
  }
  return timeSpent;
}
{code}
GetTaskReportsRequest can be augmented with an optional boolean with which the monitoring tool can request excluding the counters from the response. This minor change is very simple and yet makes many existing monitoring tools more efficient.
[jira] [Created] (MAPREDUCE-5954) Optional exclusion of counters from getTaskReports
Maysam Yabandeh created MAPREDUCE-5954:
--
Summary: Optional exclusion of counters from getTaskReports
Key: MAPREDUCE-5954
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5954
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh

MRClientService.getTaskReport returns the set of map or reduce tasks along with their counters, which are quite large. For big jobs, the response could be as large as 0.5 GB. This has a negative impact both on the MRAppMaster and on the monitoring tool that invokes getTaskReports. This problem has led Pig users to entirely disable getTaskReports for big jobs: https://issues.apache.org/jira/browse/PIG-4043 Many monitoring tools, including ours, do not need the task counters when invoking getTaskReports. Pig also does not make any use of task counters. Here are the usages of TaskReport in pig:
{code}
protected void getErrorMessages(TaskReport reports[], String type,
  String msgs[] = reports[i].getDiagnostics();
  if (HadoopShims.isJobFailed(reports[i])) {
{code}
and
{code}
protected long computeTimeSpent(TaskReport[] taskReports) {
  long timeSpent = 0;
  for (TaskReport r : taskReports) {
    timeSpent += (r.getFinishTime() - r.getStartTime());
  }
  return timeSpent;
}
{code}
GetTaskReportsRequest can be augmented with an optional boolean with which the monitoring tool can request excluding the counters from the response. This minor change is very simple and yet makes many existing monitoring tools more efficient.
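The proposed API change might be exercised like the sketch below; setIncludeCounters is the hypothetical addition, while the other calls exist in the MR client protocol today:
{code}
// Sketch of the proposed flag; setIncludeCounters is the hypothetical addition.
import org.apache.hadoop.mapreduce.v2.api.protocolrecords.GetTaskReportsRequest;
import org.apache.hadoop.mapreduce.v2.api.records.JobId;
import org.apache.hadoop.mapreduce.v2.api.records.TaskType;
import org.apache.hadoop.yarn.util.Records;

class CounterFreeReports {
  static GetTaskReportsRequest newRequest(JobId jobId) {
    GetTaskReportsRequest request = Records.newRecord(GetTaskReportsRequest.class);
    request.setJobId(jobId);            // the job being monitored
    request.setTaskType(TaskType.MAP);  // map-side reports
    // Hypothetical flag added by this proposal: omit the bulky counters.
    request.setIncludeCounters(false);
    return request;
  }
}
{code}
Defaulting the flag to true would keep existing callers, which expect counters in the response, unaffected.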
[jira] [Updated] (MAPREDUCE-5844) Add a configurable delay to reducer-preemption
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5844: --- Attachment: MAPREDUCE-5844-branch-2.patch

[~kasha], attaching the patch for branch-2.

Add a configurable delay to reducer-preemption
---
Key: MAPREDUCE-5844
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5844
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh
Attachments: MAPREDUCE-5844-branch-2.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch

We observed cases where the reducer preemption makes the job finish much later, and the preemption does not seem to be necessary, since after preemption both the preempted reducer and the mapper are assigned immediately--meaning that there was already enough space for the mapper. The logic for triggering preemption is at RMContainerAllocator::preemptReducesIfNeeded. The preemption is triggered if the following is true:
{code}
headroom + am * |m| + pr * |r| < mapResourceRequest
{code}
where am is the number of assigned mappers, |m| is the mapper size, pr is the number of reducers being preempted, and |r| is the reducer size. The original idea apparently was that if the headroom is not big enough for the new mapper requests, reducers should be preempted. This would work if the job were alone in the cluster. Once we have queues, the headroom calculation becomes more complicated and would require a separate headroom calculation per queue/job. As a result, the headroom variable is kind of given up on currently: *headroom is always set to 0*. What this implies is that preemption becomes very aggressive, not considering whether there is enough space for the mappers or not.
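A minimal sketch of the configurable-delay idea in this JIRA's title follows; the key name and the starvation bookkeeping are assumptions for illustration, not necessarily the committed patch:
{code}
// Sketch of the configurable delay (key name and inputs are assumptions):
// only preempt reducers when pending maps have been starved longer than the
// configured threshold, instead of preempting on the first shortfall.
static boolean shouldPreemptNow(org.apache.hadoop.conf.Configuration conf,
    long nowMs, long pendingMapsSinceMs, long memoryShortfall) {
  long delayMs =
      1000L * conf.getInt("mapreduce.job.reducer.preempt.delay.sec", 0);
  boolean starvedLongEnough = nowMs - pendingMapsSinceMs >= delayMs;
  return memoryShortfall > 0 && starvedLongEnough;
}
{code}
With the default of 0 the behavior is unchanged, so clusters that want less aggressive preemption opt in by raising the delay.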
[jira] [Updated] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5844: --- Attachment: MAPREDUCE-5844.patch

I reverted the changes to the visibility of existing methods and added synchronization to address the findbugs warnings. I went through the code with [~sjlee0], and it seems to us that the current synchronization is enough to protect the variables. Therefore, making the variables AtomicInteger would incur the extra sync cost with no clear benefit. It could also add confusion about the sync policy in the code. I am submitting the patch with reverted visibilities, and if findbugs complains again I would suggest adding it to the exclude list. (Sorry, findbugs on my laptop seems not to be working; will work on that.) About the new location of TestRMContainerAllocator.java, I see that it is updated by the patch:
{code}
diff --git hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
similarity index 93%
rename from hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java
rename to hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
{code}
I generated the patch with git diff trunk --no-prefix. It might be an inconsistency issue of git with patch -p0.

Reducer Preemption is too aggressive
---
Key: MAPREDUCE-5844
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5844
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh
Attachments: MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch

We observed cases where the reducer preemption makes the job finish much later, and the preemption does not seem to be necessary, since after preemption both the preempted reducer and the mapper are assigned immediately--meaning that there was already enough space for the mapper. The logic for triggering preemption is at RMContainerAllocator::preemptReducesIfNeeded. The preemption is triggered if the following is true:
{code}
headroom + am * |m| + pr * |r| < mapResourceRequest
{code}
where am is the number of assigned mappers, |m| is the mapper size, pr is the number of reducers being preempted, and |r| is the reducer size. The original idea apparently was that if the headroom is not big enough for the new mapper requests, reducers should be preempted. This would work if the job were alone in the cluster. Once we have queues, the headroom calculation becomes more complicated and would require a separate headroom calculation per queue/job. As a result, the headroom variable is kind of given up on currently: *headroom is always set to 0*. What this implies is that preemption becomes very aggressive, not considering whether there is enough space for the mappers or not.
[jira] [Commented] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036518#comment-14036518 ] Maysam Yabandeh commented on MAPREDUCE-5844: Just noticed that the findbugs warnings are actually old and have been suppressed before. They came up again since we changed the variable names:
{code}
<!-- The below fields are accessed locally and only via methods that are synchronized. -->
<Match>
  <Class name="org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator" />
  <Or>
    <Field name="mapResourceReqt" />
    <Field name="reduceResourceReqt" />
    <Field name="maxReduceRampupLimit" />
    <Field name="reduceSlowStart" />
  </Or>
  <Bug pattern="IS2_INCONSISTENT_SYNC" />
</Match>
{code}

Reducer Preemption is too aggressive
---
Key: MAPREDUCE-5844
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5844
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh
Attachments: MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch

We observed cases where the reducer preemption makes the job finish much later, and the preemption does not seem to be necessary, since after preemption both the preempted reducer and the mapper are assigned immediately--meaning that there was already enough space for the mapper. The logic for triggering preemption is at RMContainerAllocator::preemptReducesIfNeeded. The preemption is triggered if the following is true:
{code}
headroom + am * |m| + pr * |r| < mapResourceRequest
{code}
where am is the number of assigned mappers, |m| is the mapper size, pr is the number of reducers being preempted, and |r| is the reducer size. The original idea apparently was that if the headroom is not big enough for the new mapper requests, reducers should be preempted. This would work if the job were alone in the cluster. Once we have queues, the headroom calculation becomes more complicated and would require a separate headroom calculation per queue/job. As a result, the headroom variable is kind of given up on currently: *headroom is always set to 0*. What this implies is that preemption becomes very aggressive, not considering whether there is enough space for the mappers or not.
[jira] [Updated] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5844: --- Attachment: MAPREDUCE-5844.patch

Uploading a patch that propagates the variable-name update into findbugs-exclude.xml.

Reducer Preemption is too aggressive
---
Key: MAPREDUCE-5844
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5844
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh
Attachments: MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch

We observed cases where the reducer preemption makes the job finish much later, and the preemption does not seem to be necessary, since after preemption both the preempted reducer and the mapper are assigned immediately--meaning that there was already enough space for the mapper. The logic for triggering preemption is at RMContainerAllocator::preemptReducesIfNeeded. The preemption is triggered if the following is true:
{code}
headroom + am * |m| + pr * |r| < mapResourceRequest
{code}
where am is the number of assigned mappers, |m| is the mapper size, pr is the number of reducers being preempted, and |r| is the reducer size. The original idea apparently was that if the headroom is not big enough for the new mapper requests, reducers should be preempted. This would work if the job were alone in the cluster. Once we have queues, the headroom calculation becomes more complicated and would require a separate headroom calculation per queue/job. As a result, the headroom variable is kind of given up on currently: *headroom is always set to 0*. What this implies is that preemption becomes very aggressive, not considering whether there is enough space for the mappers or not.
[jira] [Commented] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034455#comment-14034455 ] Maysam Yabandeh commented on MAPREDUCE-5844: Sorry, I did the moving but forgot to update the visibilities. Will do.

Reducer Preemption is too aggressive
---
Key: MAPREDUCE-5844
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5844
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh
Attachments: MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch

We observed cases where the reducer preemption makes the job finish much later, and the preemption does not seem to be necessary, since after preemption both the preempted reducer and the mapper are assigned immediately--meaning that there was already enough space for the mapper. The logic for triggering preemption is at RMContainerAllocator::preemptReducesIfNeeded. The preemption is triggered if the following is true:
{code}
headroom + am * |m| + pr * |r| < mapResourceRequest
{code}
where am is the number of assigned mappers, |m| is the mapper size, pr is the number of reducers being preempted, and |r| is the reducer size. The original idea apparently was that if the headroom is not big enough for the new mapper requests, reducers should be preempted. This would work if the job were alone in the cluster. Once we have queues, the headroom calculation becomes more complicated and would require a separate headroom calculation per queue/job. As a result, the headroom variable is kind of given up on currently: *headroom is always set to 0*. What this implies is that preemption becomes very aggressive, not considering whether there is enough space for the mappers or not.
[jira] [Updated] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5844: --- Attachment: MAPREDUCE-5844.patch

Attaching the new patch that also restricts the visibilities to package level. I took the liberty of applying the same pattern to the already existing public methods (not previously touched by the patch) whose visibilities were relaxed for testing purposes.

Reducer Preemption is too aggressive
---
Key: MAPREDUCE-5844
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5844
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh
Attachments: MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch, MAPREDUCE-5844.patch

We observed cases where the reducer preemption makes the job finish much later, and the preemption does not seem to be necessary, since after preemption both the preempted reducer and the mapper are assigned immediately--meaning that there was already enough space for the mapper. The logic for triggering preemption is at RMContainerAllocator::preemptReducesIfNeeded. The preemption is triggered if the following is true:
{code}
headroom + am * |m| + pr * |r| < mapResourceRequest
{code}
where am is the number of assigned mappers, |m| is the mapper size, pr is the number of reducers being preempted, and |r| is the reducer size. The original idea apparently was that if the headroom is not big enough for the new mapper requests, reducers should be preempted. This would work if the job were alone in the cluster. Once we have queues, the headroom calculation becomes more complicated and would require a separate headroom calculation per queue/job. As a result, the headroom variable is kind of given up on currently: *headroom is always set to 0*. What this implies is that preemption becomes very aggressive, not considering whether there is enough space for the mappers or not.
[jira] [Updated] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5844: --- Attachment: MAPREDUCE-5844.patch Attaching an updated patch that synchronizes the newly added methods to address the findbugs concern.
[jira] [Updated] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5844: --- Attachment: MAPREDUCE-5844.patch The findbugs warnings seem to be a false alarm. The method it complains about is not touched by the patch, and it is accessed via a synchronized method: heartbeat -> assign. Still, I am submitting a new patch that makes #assign synchronized to avoid findbugs false alarms.
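A small sketch of the kind of inconsistent-synchronization pattern findbugs flags and the fix described here; class and member names are illustrative only, not the real allocator's:
{code:java}
// Hypothetical sketch: if a field is mostly accessed under a lock but one
// path reads it without, findbugs reports inconsistent synchronization.
// Declaring the remaining accessor synchronized (as the patch does for
// #assign) removes the unsynchronized access path.
class AllocatorSketch {
  private int scheduledRequests; // intended to be guarded by "this"

  synchronized void heartbeat() {
    scheduledRequests++; // already a synchronized access
  }

  // Before: an unsynchronized read -> findbugs warning.
  // After: declared synchronized, silencing the false alarm.
  synchronized int assign() {
    return scheduledRequests;
  }
}
{code}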
[jira] [Commented] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034757#comment-14034757 ] Maysam Yabandeh commented on MAPREDUCE-5844: The findbugs warnings seem like a false alarm to me.
[jira] [Updated] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5844: --- Attachment: MAPREDUCE-5844.patch Thanks [~kasha] for the comments. I am attaching a new patch that has them applied. I was thinking about a proper name for setReduceResourceReqt. On one hand, changing it to setReduceResourceRequest makes it more readable. On the other hand, by using setReduceResourceReqt we adhere to the Java standard for naming getters and setters (here, for the field reduceResourceReqt). I am more inclined towards the latter and was wondering if you are OK with that.
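For context, the JavaBeans-style convention being referenced, with the field in question; a sketch, not code from the patch:
{code:java}
// The Java getter/setter convention derives accessor names mechanically
// from the field name; with the existing field reduceResourceReqt that
// yields:
class NamingSketch {
  private int reduceResourceReqt;

  int getReduceResourceReqt() { return reduceResourceReqt; }

  void setReduceResourceReqt(int v) { reduceResourceReqt = v; }

  // Renaming only the setter to setReduceResourceRequest would read better
  // in isolation but would break the field/accessor symmetry above.
}
{code}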
[jira] [Updated] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5844: --- Attachment: MAPREDUCE-5844.patch Attaching the patch that also updates the variable names: reduceResourceRequest and mapResourceRequest.
[jira] [Updated] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5844: --- Attachment: MAPREDUCE-5844.patch Attaching the new patch that also contains the unit test and an updated name for the conf param. [~kasha], as per your suggestion, quite a few visibilities in the source code are relaxed (tagged with @VisibleForTesting) to allow testing with reasonable complexity. The patch includes a test of preemptReducesIfNeeded for both before and after the changes made by this jira. [~jlowe], as per your suggestion, the conf param name is updated and documented in mapred-default.xml.
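A sketch of the visibility-relaxation pattern just described; the class and method here are stand-ins for the real allocator code:
{code:java}
import com.google.common.annotations.VisibleForTesting;

// The method becomes package-private (instead of private) so a test in
// the same package can call it directly, and the Guava annotation
// documents that testability is the only reason for the wider visibility.
class VisibilitySketch {
  @VisibleForTesting
  void preemptReducesIfNeeded() {
    // production logic, now directly invokable from tests in this package
  }
}
{code}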
[jira] [Commented] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027123#comment-14027123 ] Maysam Yabandeh commented on MAPREDUCE-5844: Thanks [~kasha] for reviewing it. About the unit test, I looked into it and it seems non-trivial to me: on one hand, preemptReducesIfNeeded uses local fields and cannot feasibly be tested separately via mocking. The alternative would be to test the entire RMContainerAllocator object; however, to make sure that preemptReducesIfNeeded is exercised in the test, the RMContainerAllocator object would have to be fed a complicated sequence of events: some mappers not finished, but enough finished to trigger the reducer start, and finally a mapper failure. The complexity of the unit test done this way would be much more than that of the minor change introduced by the patch. I guess it would be possible to come up with unit tests of reasonable complexity if we made changes to RMContainerAllocator to make it more testable, but I am not sure whether such changes are desirable as part of this jira.
[jira] [Updated] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5844: --- Attachment: (was: namenode-gc.2014-05-26-23-29.log.0)
[jira] [Commented] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011249#comment-14011249 ] Maysam Yabandeh commented on MAPREDUCE-5844: Thanks [~wangda]. I guess there is no unanimity about the default value of the threshold, as some have also suggested making its default zero. We are planning to have it enabled in our config settings anyway, but I will let the community decide on the default value in the source code. I think using the timestamp of the last allocated mapper is interesting, since we would keep only one timestamp versus one per mapper request. The challenge, however, is that it would not capture how recent each map request is. We could of course keep another single timestamp for the earliest received map request, but maintaining it after one of the many in-flight map requests gets allocated would add a bit more complexity to the patch and its logic. I figured that keeping the logic in the patch as simple as possible justifies a new timestamp field per container request.
[jira] [Updated] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5844: --- Attachment: namenode-gc.2014-05-26-23-29.log.0 attaching gc log when trying g1
[jira] [Updated] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5844: --- Status: Patch Available (was: In Progress)
[jira] [Updated] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5844: --- Attachment: MAPREDUCE-5844.patch Attaching the patch that delays the preemption for a configurable threshold. The patch simply adds a timestamp to ContainerRequest and checks it against the current time. Reviews are highly appreciated.
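A rough sketch, under assumed names, of the timestamp-plus-threshold idea the patch describes; the actual field and conf-param names in the patch may differ:
{code:java}
// Hypothetical sketch of delayed preemption: each container request
// remembers when it was made, and reducers are only preempted once a
// pending map request has waited past a configured delay.
class ContainerRequestSketch {
  final long requestTimeMs = System.currentTimeMillis();
}

class DelayedPreemptionSketch {
  // Stand-in for the configurable threshold read from the job conf.
  private final long preemptionDelayMs;

  DelayedPreemptionSketch(long preemptionDelayMs) {
    this.preemptionDelayMs = preemptionDelayMs;
  }

  boolean preemptionAllowed(ContainerRequestSketch pendingMapRequest) {
    long waitedMs =
        System.currentTimeMillis() - pendingMapRequest.requestTimeMs;
    // Only preempt reducers if the map request has been starving longer
    // than the configured threshold.
    return waitedMs >= preemptionDelayMs;
  }
}
{code}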
[jira] [Work started] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on MAPREDUCE-5844 started by Maysam Yabandeh.
[jira] [Commented] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003454#comment-14003454 ] Maysam Yabandeh commented on MAPREDUCE-5844: Thanks [~jlowe] and [~kasha]. Sounds great! I will submit a patch soon. The patch adds a timestamp to each scheduled mapper, and triggers preemption only when a configurable threshold has passed since the timestamp.
[jira] [Commented] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000413#comment-14000413 ] Maysam Yabandeh commented on MAPREDUCE-5844: [~jlowe], we observed more of these cases where the queue was actually full and a fix for proper headroom calculation would not help either. The thing is that although the queue might be full at any point in time, there is a constant flow of containers completing in the queue and new containers being assigned. Therefore, if the MRAppMaster does not aggressively preempt its reducer and simply asks for a container for its failed mapper, it is actually quite likely to get the mapper in a timely manner. I was chatting offline with [~kkambatl] and it came up that perhaps delayed preemption could be a more reasonable reaction in such cases. I was wondering what your take on that is?
[jira] [Updated] (MAPREDUCE-5871) Estimate Job Endtime
[ https://issues.apache.org/jira/browse/MAPREDUCE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5871: --- Status: Patch Available (was: In Progress)
> Estimate Job Endtime
> ---
>
> Key: MAPREDUCE-5871
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5871
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Maysam Yabandeh
> Assignee: Maysam Yabandeh
> Attachments: MAPREDUCE-5871.patch
>
> YARN-1969 adds a new earliest-endtime-first policy to the fair scheduler. As a prerequisite step, the AppMaster should estimate its end time and send it to the RM via the heartbeat. This jira focuses on how the AppMaster performs this estimation.
[jira] [Work started] (MAPREDUCE-5871) Estimate Job Endtime
[ https://issues.apache.org/jira/browse/MAPREDUCE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on MAPREDUCE-5871 started by Maysam Yabandeh.
[jira] [Updated] (MAPREDUCE-5871) Estimate Job Endtime
[ https://issues.apache.org/jira/browse/MAPREDUCE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5871: --- Status: Open (was: Patch Available)
[jira] [Commented] (MAPREDUCE-5871) Estimate Job Endtime
[ https://issues.apache.org/jira/browse/MAPREDUCE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989216#comment-13989216 ] Maysam Yabandeh commented on MAPREDUCE-5871: Seems like Hadoop QA is no longer run automatically!
[jira] [Updated] (MAPREDUCE-5871) Estimate Job Endtime
[ https://issues.apache.org/jira/browse/MAPREDUCE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5871: --- Status: Patch Available (was: Open)
[jira] [Updated] (MAPREDUCE-5871) Estimate Job Endtime
[ https://issues.apache.org/jira/browse/MAPREDUCE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5871: --- Status: Open (was: Patch Available)
[jira] [Updated] (MAPREDUCE-5871) Estimate Job Endtime
[ https://issues.apache.org/jira/browse/MAPREDUCE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5871: --- Attachment: (was: YARN-1969.patch)
[jira] [Moved] (MAPREDUCE-5871) Estimate Job Endtime
[ https://issues.apache.org/jira/browse/MAPREDUCE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh moved YARN-2006 to MAPREDUCE-5871: -- Key: MAPREDUCE-5871 (was: YARN-2006) Project: Hadoop Map/Reduce (was: Hadoop YARN)
[jira] [Updated] (MAPREDUCE-5871) Estimate Job Endtime
[ https://issues.apache.org/jira/browse/MAPREDUCE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5871: --- Attachment: MAPREDUCE-5871.patch Submitting the patch (MAPREDUCE-5871.patch) that resolves the issues raised by findbugs.
[jira] [Commented] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973430#comment-13973430 ] Maysam Yabandeh commented on MAPREDUCE-5844: Thanks [~jlowe] for your detailed comment.
# As I explained in the description of the jira, the headroom printed in the logs is always zero, e.g.:
{code}
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_x: ask=8 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:0, vCores:0> knownNMs=x
{code}
And this is not because there is no headroom (I know that by checking the available resources while the job was running).
# I actually was not surprised by the headroom always being set to zero, since I found the headroom field abandoned in the source code of the fair scheduler: SchedulerApplicationAttempt#getHeadroom() is what sets the headroom field in the response, while SchedulerApplicationAttempt#setHeadroom() is never invoked in FairScheduler (it is invoked in the capacity and fifo schedulers, though).
# I assumed that not invoking setHeadroom in the fair scheduler was intentional, perhaps due to the complications of computing the headroom when fair share is taken into account. But based on your comment, I understand that this could be a forgotten case rather than an abandoned one.
# At least in the observed case where we suffered from this problem, the headroom was available and both the preempted reducer and the mapper were assigned immediately (within a few seconds). So, delaying the preemption even for a period as short as 1 minute could prevent this problem, while not having a tangible negative impact in cases where the preemption was actually required. I agree that there are tradeoffs with this preemption delay (especially when it is high), but even a short value will suffice to cover this special case where the headroom is already available.
# Whether or not we end up with a fix for the headroom calculation in FairScheduler, it seems to me that allowing the user to configure the preemption to be postponed for a short delay would not be hurtful, if not beneficial.
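For concreteness, a hedged sketch of the kind of call the comment says is missing from the fair scheduler's allocate path; the headroom subtraction below is an assumed placeholder, not a real fair-share-aware calculation:
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt;
import org.apache.hadoop.yarn.util.resource.Resources;

// Hypothetical sketch: on each allocate, compute a per-app headroom and
// push it into the attempt via SchedulerApplicationAttempt#setHeadroom,
// as the capacity and fifo schedulers do. Without such a call, the AM
// always sees resourcelimit=<memory:0, vCores:0>.
class HeadroomSketch {
  static void updateHeadroom(SchedulerApplicationAttempt app,
                             Resource queueLimit, Resource queueUsed) {
    // Placeholder calculation; a real fix would account for fair shares.
    Resource headroom = Resources.subtract(queueLimit, queueUsed);
    app.setHeadroom(headroom);
  }
}
{code}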
[jira] [Commented] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973572#comment-13973572 ] Maysam Yabandeh commented on MAPREDUCE-5844: Thanks [~jlowe]. [~sandyr], [~tucu00], could you please comment on the plan for setting headroom in the fair scheduler's responses to apps? Or perhaps I am misreading the code and it is already there but not working! Should I open a jira for that?
[jira] [Created] (MAPREDUCE-5448) MapFileOutputFormat#getReaders bug with invisible files/folders
Maysam Yabandeh created MAPREDUCE-5448: -- Summary: MapFileOutputFormat#getReaders bug with invisible files/folders
Key: MAPREDUCE-5448
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5448
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh
Priority: Minor
MapReduce jobs also produce some invisible files, such as _SUCCESS, even when the output format is MapFileOutputFormat. MapFileOutputFormat#getReaders, however, reads the entire content of the job output, assuming that they are MapFiles.
{code}
Path[] names = FileUtil.stat2Paths(fs.listStatus(dir));
{code}
It should use a filter to skip the files that start with . or _.
[jira] [Updated] (MAPREDUCE-5448) MapFileOutputFormat#getReaders bug with invisible files/folders
[ https://issues.apache.org/jira/browse/MAPREDUCE-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5448: --- Attachment: MAPREDUCE-5448.patch The attached patch adds a filter to skip the files that start with . or _. It also updates the related unit test to demonstrate the problem.
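A minimal sketch of such a filter, in the spirit of the hidden-file filters used elsewhere in Hadoop; the actual patch may differ in details:
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

// Skip "invisible" output entries such as _SUCCESS and dot-files, so only
// actual MapFile directories are handed to MapFile.Reader.
class HiddenFileFilterSketch {
  static final PathFilter VISIBLE_ONLY = new PathFilter() {
    @Override
    public boolean accept(Path p) {
      String name = p.getName();
      return !name.startsWith("_") && !name.startsWith(".");
    }
  };

  static Path[] listVisible(FileSystem fs, Path dir) throws IOException {
    // Same call as the snippet quoted in the jira, with the filter applied.
    return FileUtil.stat2Paths(fs.listStatus(dir, VISIBLE_ONLY));
  }
}
{code}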
[jira] [Updated] (MAPREDUCE-5267) History server should be more robust when cleaning old jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5267: --- Attachment: MAPREDUCE-5267.patch The attached patch fixes the problem and also adds a unit test to demonstrate that.
> History server should be more robust when cleaning old jobs
> ---
>
> Key: MAPREDUCE-5267
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5267
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: jobhistoryserver
> Affects Versions: 0.23.7, 2.0.4-alpha
> Reporter: Jason Lowe
> Assignee: Maysam Yabandeh
> Attachments: MAPREDUCE-5267.patch, MAPREDUCE-5267.patch
>
> Ran across a situation where an admin user had accidentally created a directory in one of the date directories under /mapred/history/done/ that was not readable by the historyserver user. That effectively prevented the history server from cleaning any jobs from that date forward, as it hit an IOException trying to scan the directory and that aborted the entire clean process. The history server should localize IOException handling to the directory/file being processed and move on to the next entry in the list rather than aborting the entire cleaning process.
[jira] [Updated] (MAPREDUCE-5267) History server should be more robust when cleaning old jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated MAPREDUCE-5267: --- Attachment: MAPREDUCE-5267.patch The attached patch puts two try/catch clauses in place to allow the scan in the clean method to continue despite probable IO failures. I am working on a proper unit test.
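A sketch of the localized error handling being described, with hypothetical names; the real patch operates on HistoryFileManager's clean path:
{code:java}
import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.fs.FileStatus;

// Each entry gets its own try/catch so one unreadable directory no longer
// aborts the whole cleaning pass; the loop logs and moves on.
class CleanSketch {
  private static final Log LOG = LogFactory.getLog(CleanSketch.class);

  void clean(FileStatus[] dateDirs) {
    for (FileStatus dir : dateDirs) {
      try {
        scanAndDelete(dir); // assumed helper that may throw on bad entries
      } catch (IOException e) {
        LOG.warn("Skipping " + dir.getPath() + " during history cleaning", e);
        // fall through to the next entry instead of propagating
      }
    }
  }

  private void scanAndDelete(FileStatus dir) throws IOException {
    // placeholder for the real per-directory scan/delete logic
  }
}
{code}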
[jira] [Commented] (MAPREDUCE-5267) History server should be more robust when cleaning old jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673993#comment-13673993 ] Maysam Yabandeh commented on MAPREDUCE-5267: I believe the particular bug reported in this JIRA is rooted in the implementation of listFiles. I attempted to reproduce the reported scenario by creating a directory DIR under /mapred/history/done/ with only root access. On my local machine, the current unit tests smoothly pass over the DIR by returning an empty list upon invocation of listFiles(). I guess this is not the case for hdfs, and, similarly to what this jira reports, an exception will be raised (although I have not managed to run a unit test that exercises this). Nevertheless, I agree with you that this problem should be addressed at a higher level, since we do not know what the next unpredictable scenario is that raises an exception in the clean procedure. I would like to pick up this jira, but I do not know how to write a unit test that exercises a method by raising (general) exceptions in the middle of it.
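One common way to raise exceptions mid-method in a test is mock-based fault injection; a hedged sketch (an assumption about the approach, not necessarily what this jira ended up doing):
{code:java}
import static org.mockito.Mockito.any;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

// Hand the class under test a mocked file-system handle whose calls throw
// on demand, so the exception fires in the middle of the method under test.
public class FaultInjectionSketch {
  @Test
  public void cleanSurvivesUnreadableDir() throws Exception {
    FileSystem fs = mock(FileSystem.class);
    when(fs.listStatus(any(Path.class)))
        .thenThrow(new IOException("injected: permission denied"));
    // Inject fs into the history manager under test (constructor or setter
    // assumed), run the clean method, and assert it completes without
    // propagating the injected IOException.
  }
}
{code}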
[jira] [Commented] (MAPREDUCE-5267) History server should be more robust when cleaning old jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672184#comment-13672184 ] Maysam Yabandeh commented on MAPREDUCE-5267: One possible fix is to change FileContext.Util#listStatus(Path) to skip the files/directories that it cannot access.
{code:java}
public FileStatus[] next(final AbstractFileSystem fs, final Path p)
    throws IOException, UnresolvedLinkException {
  return fs.listStatus(p);
}
{code}
This would be consistent with the behavior of the local file system in RawLocalFileSystem#listStatus:
{code:java}
File[] names = localf.listFiles();
{code}
which returns only accessible items. Also, I was wondering if there is already a standard way of testing HistoryFileManager on top of hdfs. Currently, the tests in TestJobHistoryParsing.java run on top of the local file system and hence do not reveal the kind of bugs reported in this jira. I made a first attempt at using MiniDFSCluster and setting its URI in the remoteFS variable in conf, but it does not seem to be picked up by HistoryFileManager.