[jira] [Commented] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017641#comment-14017641 ] Remus Rusanu commented on MAPREDUCE-5196: - Hi [~curino], Can you shed some light on the rationale of this change:
{code}
@@ -1098,8 +1120,8 @@ private long calculateOutputSize() throws IOException {
     if (isMapTask() && conf.getNumReduceTasks() > 0) {
       try {
         Path mapOutput = mapOutputFile.getOutputFile();
-        FileSystem localFS = FileSystem.getLocal(conf);
-        return localFS.getFileStatus(mapOutput).getLen();
+        FileSystem fs = mapOutput.getFileSystem(conf);
+        return fs.getFileStatus(mapOutput).getLen();
       } catch (IOException e) {
         LOG.warn("Could not find output size ", e);
       }
{code}
This breaks Windows deployments, as the local files get routed through HDFS:
{code}
c:/Hadoop/Data/Hadoop/local/usercache/HadoopUser/appcache/application_1401693085139_0001/output/attempt_1401693085139_0001_m_00_0/file.out is not a valid DFS filename.
	at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:187)
	at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:101)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1024)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1020)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1020)
	at org.apache.hadoop.mapred.Task.calculateOutputSize(Task.java:1124)
	at org.apache.hadoop.mapred.Task.sendLastUpdate(Task.java:1102)
{code}
CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing -- Key: MAPREDUCE-5196 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5196 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mr-am, mrv2 Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 3.0.0 Attachments: MAPREDUCE-5196.1.patch, MAPREDUCE-5196.2.patch, MAPREDUCE-5196.3.patch, MAPREDUCE-5196.patch, MAPREDUCE-5196.patch This JIRA tracks a checkpoint-based AM preemption policy. The policy handles propagation of the preemption requests received from the RM to the appropriate tasks, and bookkeeping of checkpoints. Actual checkpointing of the task state is handled in upcoming JIRAs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017761#comment-14017761 ] Carlo Curino commented on MAPREDUCE-5196: - Answering Wangda first: For preemption we had to chop a very large set of changes into patches which are being slowly pushed in (and churn in trunk made some of this problematic). I think your summary is correct. This patch only propagates the information from the AM to the tasks. The tasks in turn only log the information. The actual checkpointing and release of resources is part of MAPREDUCE-5269. That patch used to work on trunk, but now has some issues, and Augusto Souza is looking to get it back into shape; if you have cycles to look at it, help on this is welcome. TaskStatus.State.PREEMPTED is set in MAPREDUCE-5269. Again, these oddities are a by-product of separating a very large chunk of changes into more digestible-sized patches. The idea of this patch is to fix the wiring so that the tasks know about preemption, but not change the behavior quite yet.
[jira] [Commented] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017771#comment-14017771 ] Carlo Curino commented on MAPREDUCE-5196: - Answering Remus: (I am not 100% sure, as I wrote this code over a year ago, but let me try to recall.) As part of the preemption work we explored doing HDFS-based shuffling. The benefits of this were: 1) performance enhancements on certain data size ranges (stream-merge on the reducers), and 2) a much smaller reducer checkpoint state (no data, just the offset of the last read key from each map). That was an initial experimentation, but making it robust was non-trivial (missing map outputs were hard to recover), so we didn't push it yet. In that context, the mapOutput was not on the local FS but on HDFS, and the change you pointed out was fixing that. But this clearly does not work for Windows. My guess is that reverting that part should be fine here.
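For readers following along: the failure mode discussed above comes from how a scheme-less path resolves against the configured default filesystem, which is why a plain local Windows spill path ends up handed to the DFS client. A minimal plain-Java sketch of that fallback rule (the helper name and default-scheme handling are illustrative, not the Hadoop API):

```java
import java.net.URI;

public class SchemeResolution {
    // Hypothetical stand-in for Path.getFileSystem(conf): a path that
    // carries no scheme falls back to the cluster's default filesystem.
    public static String resolveScheme(String path, String defaultScheme) {
        URI uri = URI.create(path.replace('\\', '/'));
        return uri.getScheme() != null ? uri.getScheme() : defaultScheme;
    }

    public static void main(String[] args) {
        // An explicit scheme wins regardless of the default.
        System.out.println(resolveScheme("file:///tmp/file.out", "hdfs")); // file
        // A scheme-less local path inherits the default: on a cluster whose
        // default FS is HDFS, this is how a local spill file reaches the
        // DFS client and triggers "is not a valid DFS filename".
        System.out.println(resolveScheme("/c:/Hadoop/local/file.out", "hdfs")); // hdfs
    }
}
```

This is also why `FileSystem.getLocal(conf)` was the safe choice for a path that is known to be local: it never consults the default-filesystem setting.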
[jira] [Created] (MAPREDUCE-5912) Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196
Remus Rusanu created MAPREDUCE-5912: --- Summary: Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196 Key: MAPREDUCE-5912 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5912 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0 Reporter: Remus Rusanu Assignee: Remus Rusanu
[jira] [Commented] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017779#comment-14017779 ] Remus Rusanu commented on MAPREDUCE-5196: - MAPREDUCE-5912
[jira] [Updated] (MAPREDUCE-5912) Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196
[ https://issues.apache.org/jira/browse/MAPREDUCE-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated MAPREDUCE-5912: Description:
{code}
@@ -1098,8 +1120,8 @@ private long calculateOutputSize() throws IOException {
     if (isMapTask() && conf.getNumReduceTasks() > 0) {
       try {
         Path mapOutput = mapOutputFile.getOutputFile();
-        FileSystem localFS = FileSystem.getLocal(conf);
-        return localFS.getFileStatus(mapOutput).getLen();
+        FileSystem fs = mapOutput.getFileSystem(conf);
+        return fs.getFileStatus(mapOutput).getLen();
       } catch (IOException e) {
         LOG.warn("Could not find output size ", e);
       }
{code}
causes Windows local output files to be routed through HDFS:
{code}
2014-06-02 00:14:53,891 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IllegalArgumentException: Pathname /c:/Hadoop/Data/Hadoop/local/usercache/HadoopUser/appcache/application_1401693085139_0001/output/attempt_1401693085139_0001_m_00_0/file.out from c:/Hadoop/Data/Hadoop/local/usercache/HadoopUser/appcache/application_1401693085139_0001/output/attempt_1401693085139_0001_m_00_0/file.out is not a valid DFS filename.
	at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:187)
	at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:101)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1024)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1020)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1020)
	at org.apache.hadoop.mapred.Task.calculateOutputSize(Task.java:1124)
	at org.apache.hadoop.mapred.Task.sendLastUpdate(Task.java:1102)
	at org.apache.hadoop.mapred.Task.done(Task.java:1048)
{code}
[jira] [Commented] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1401#comment-1401 ] Remus Rusanu commented on MAPREDUCE-5196: - Thanks [~curino]. I will open an issue and upload a patch shortly.
[jira] [Updated] (MAPREDUCE-5912) Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196
[ https://issues.apache.org/jira/browse/MAPREDUCE-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated MAPREDUCE-5912: Fix Version/s: 3.0.0 Status: Patch Available (was: Open)
[jira] [Updated] (MAPREDUCE-5912) Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196
[ https://issues.apache.org/jira/browse/MAPREDUCE-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated MAPREDUCE-5912: Attachment: MAPREDUCE-5912.1.patch
[jira] [Commented] (MAPREDUCE-5912) Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196
[ https://issues.apache.org/jira/browse/MAPREDUCE-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017828#comment-14017828 ] Hadoop QA commented on MAPREDUCE-5912: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12648337/MAPREDUCE-5912.1.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4642//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4642//console
This message is automatically generated.
[jira] [Commented] (MAPREDUCE-5912) Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196
[ https://issues.apache.org/jira/browse/MAPREDUCE-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017852#comment-14017852 ] Remus Rusanu commented on MAPREDUCE-5912: - No new tests included because this is a revert of an earlier breaking change. Manually validated the change on Windows.
[jira] [Commented] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018092#comment-14018092 ] Karthik Kambatla commented on MAPREDUCE-5777: - skipByteOrderMark seems to be duplicated. Can we have a single implementation of it and reuse it at the other place? Support utf-8 text with BOM (byte order marker) --- Key: MAPREDUCE-5777 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5777 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.22.0, 2.2.0 Reporter: bc Wong Assignee: zhihai xu Attachments: MAPREDUCE-5777.000.patch, MAPREDUCE-5777.001.patch, MAPREDUCE-5777.002.patch, MAPREDUCE-5777.003.patch, MAPREDUCE-5777.004.patch UTF-8 text may have a BOM. TextInputFormat, KeyValueTextInputFormat and friends should recognize the BOM and not treat it as actual data.
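For context, the BOM handling under discussion boils down to peeking at the first three bytes of the input stream. A plain-Java sketch of the idea (the method name is borrowed from the patch discussion above; the implementation here is illustrative, not the actual MAPREDUCE-5777 code):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

public class BomSkip {
    // Consume a leading UTF-8 BOM (0xEF 0xBB 0xBF) so it is not treated
    // as record data; push the bytes back if they are not a BOM.
    public static InputStream skipByteOrderMark(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = pb.read(head, 0, 3);
        boolean bom = n == 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            pb.unread(head, 0, n);   // not a BOM: restore the bytes we read
        }
        return pb;
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'k', 'e', 'y'};
        InputStream in = skipByteOrderMark(new ByteArrayInputStream(withBom));
        // The BOM is gone; only the record data remains.
        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8)); // key
    }
}
```

Sharing one such helper between `org.apache.hadoop.mapred.LineRecordReader` and `org.apache.hadoop.mapreduce.lib.input.LineRecordReader` is exactly the deduplication being requested here.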
[jira] [Commented] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018184#comment-14018184 ] zhihai xu commented on MAPREDUCE-5777: -- Hi Karthik, Thanks for the comment. It looks like all the other methods in LineRecordReader are also duplicated between MapRed (old API) and MapReduce (new API). Can we create another JIRA to handle the duplication between MapRed (old API) and MapReduce (new API)? Thanks, zhihai
[jira] [Created] (MAPREDUCE-5913) Unify MapRed(old API) and MapReduce(new API) implementation to remove duplicate functions.
zhihai xu created MAPREDUCE-5913: Summary: Unify MapRed(old API) and MapReduce(new API) implementation to remove duplicate functions. Key: MAPREDUCE-5913 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5913 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: zhihai xu Priority: Minor
[jira] [Commented] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018194#comment-14018194 ] zhihai xu commented on MAPREDUCE-5777: -- I just created a JIRA at https://issues.apache.org/jira/browse/MAPREDUCE-5913 to address this duplication issue between MapRed and MapReduce.
[jira] [Commented] (MAPREDUCE-5912) Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196
[ https://issues.apache.org/jira/browse/MAPREDUCE-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018246#comment-14018246 ] Chris Douglas commented on MAPREDUCE-5912: -- As you identified in HADOOP-10663, returning the default filesystem for local paths is not correct.
[jira] [Updated] (MAPREDUCE-5913) Unify MapRed(old API) and MapReduce(new API) implementation to remove duplicate functions.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-5913: - Description: Unify MapRed(old API) and MapReduce(new API) implementation to remove duplicate functions. For example, org.apache.hadoop.mapred.LineRecordReader and org.apache.hadoop.mapreduce.lib.input.LineRecordReader have many duplicate functions.
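One possible shape for such a unification (a sketch only, not the actual MAPREDUCE-5913 design): keep a single real implementation and have the old-API reader delegate to it, so fixes like the BOM handling land in exactly one place. The class names below are illustrative:

```java
import java.util.Iterator;
import java.util.List;

// Illustrative stand-in for org.apache.hadoop.mapreduce.lib.input.LineRecordReader:
// the single shared implementation of record reading.
class NewApiReader {
    private final Iterator<String> lines =
        List.of("first", "second").iterator();

    public String readLine() {
        return lines.hasNext() ? lines.next() : null;
    }
}

// Illustrative stand-in for org.apache.hadoop.mapred.LineRecordReader:
// a thin adapter that delegates instead of duplicating the logic.
public class OldApiReader {
    private final NewApiReader delegate = new NewApiReader();

    public String readLine() {
        return delegate.readLine();   // one implementation, two API surfaces
    }

    public static void main(String[] args) {
        OldApiReader r = new OldApiReader();
        System.out.println(r.readLine()); // first
        System.out.println(r.readLine()); // second
    }
}
```

The trade-off is that the old-API class gains a dependency on the new-API package, which is one reason such unifications are tracked as a separate JIRA rather than folded into a bug fix.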
[jira] [Commented] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018357#comment-14018357 ] Wangda Tan commented on MAPREDUCE-5196: --- Hi [~curino], Thanks for your clarifications on my question, it's clear to me now. For MAPREDUCE-5269, please feel free to let me know if I can help with review. Wangda