[jira] [Commented] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing

2014-06-04 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017641#comment-14017641
 ] 

Remus Rusanu commented on MAPREDUCE-5196:
-

Hi [~curino],

Can you shed some light on the rationale of this change:
{code}
@@ -1098,8 +1120,8 @@ private long calculateOutputSize() throws IOException {
 if (isMapTask() && conf.getNumReduceTasks() > 0) {
   try {
     Path mapOutput = mapOutputFile.getOutputFile();
-    FileSystem localFS = FileSystem.getLocal(conf);
-    return localFS.getFileStatus(mapOutput).getLen();
+    FileSystem fs = mapOutput.getFileSystem(conf);
+    return fs.getFileStatus(mapOutput).getLen();
   } catch (IOException e) {
     LOG.warn("Could not find output size ", e);
   }
{code}
This breaks Windows deployments, as the local files get routed through HDFS:
{code}
c:/Hadoop/Data/Hadoop/local/usercache/HadoopUser/appcache/application_1401693085139_0001/output/attempt_1401693085139_0001_m_00_0/file.out is not a valid DFS filename.
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:187)
    at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:101)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1024)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1020)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1020)
    at org.apache.hadoop.mapred.Task.calculateOutputSize(Task.java:1124)
    at org.apache.hadoop.mapred.Task.sendLastUpdate(Task.java:1102)
{code}
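To make the failure mode concrete: {{mapOutput.getFileSystem(conf)}} qualifies an unqualified path against the default filesystem (HDFS on a cluster), and HDFS rejects path names containing a drive-letter colon. Below is a dependency-free Java sketch of the two ingredients; note that {{isValidDfsStyleName}} is a simplified, hypothetical stand-in for the real check in DistributedFileSystem.getPathName, not actual Hadoop code.

```java
import java.net.URI;

// Dependency-free illustration of why a Windows local path blows up when
// it is resolved against HDFS instead of the local filesystem.
public class DrivePathDemo {

    // A Windows drive letter followed by ':' parses as a URI scheme,
    // so "c:/..." is not treated as a plain relative path.
    public static String schemeOf(String pathString) {
        return URI.create(pathString).getScheme();
    }

    // Hypothetical, simplified stand-in for the HDFS path-name check:
    // DFS path names must be absolute and must not contain ':',
    // which is exactly what a Windows drive-letter path contains.
    public static boolean isValidDfsStyleName(String path) {
        return path.startsWith("/") && !path.contains(":");
    }

    public static void main(String[] args) {
        System.out.println(schemeOf("c:/Hadoop/Data/file.out"));             // "c"
        System.out.println(isValidDfsStyleName("/user/hadoop/file.out"));    // true
        System.out.println(isValidDfsStyleName("/c:/Hadoop/Data/file.out")); // false
    }
}
```

This is why the original {{FileSystem.getLocal(conf)}} form is safe on Windows: it never sends the drive-letter path through the DFS path validator.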





 CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing 
 --

 Key: MAPREDUCE-5196
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5196
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mr-am, mrv2
Reporter: Carlo Curino
Assignee: Carlo Curino
 Fix For: 3.0.0

 Attachments: MAPREDUCE-5196.1.patch, MAPREDUCE-5196.2.patch, 
 MAPREDUCE-5196.3.patch, MAPREDUCE-5196.patch, MAPREDUCE-5196.patch


 This JIRA tracks a checkpoint-based AM preemption policy. The policy handles 
 propagation of the preemption requests received from the RM to the 
 appropriate tasks, and bookkeeping of checkpoints. Actual checkpointing of the 
 task state is handled in upcoming JIRAs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing

2014-06-04 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017761#comment-14017761
 ] 

Carlo Curino commented on MAPREDUCE-5196:
-

Answering Wangda first:

For preemption we had to chop a very large set of changes into patches, which are 
being slowly pushed in (and churn in trunk has made some of this problematic).

I think your summary is correct. This patch only propagates the information 
from the AM to the tasks, and the tasks in turn only log it. 
The actual checkpointing and release of resources is part of MAPREDUCE-5269. 
That patch used to work on trunk but now has some issues; 
Augusto Souza is looking to get it back into shape. If you have cycles to look 
at it, help would be welcome. 

TaskStatus.State.PREEMPTED is set in MAPREDUCE-5269. Again, these oddities are 
a by-product of splitting a very large chunk of changes into more 
digestible-sized patches.

The idea of this patch is to fix the wiring so that the tasks know about 
preemption, but not to change the behavior quite yet.





[jira] [Commented] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing

2014-06-04 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017771#comment-14017771
 ] 

Carlo Curino commented on MAPREDUCE-5196:
-

Answering Remus:

(I am not 100% sure, as I wrote this code over a year ago, but let me try to 
recall.) 
As part of the preemption work we explored doing HDFS-based shuffling. 
The benefits of this were:
1) performance enhancements on certain data size ranges (stream-merge on the 
reducers)
2) a much smaller reducer checkpoint state (no data, just the offset of the 
last read key from each map)

That was an initial experimentation, but making it robust was non-trivial 
(missing map outputs were hard to recover), so we didn't push it yet. 
In that context, the mapOutput was not on the local FS but on HDFS, and 
the change you pointed out was fixing that. But this clearly does not work for 
Windows. My guess is that reverting that part should be fine here. 





[jira] [Created] (MAPREDUCE-5912) Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196

2014-06-04 Thread Remus Rusanu (JIRA)
Remus Rusanu created MAPREDUCE-5912:
---

 Summary: Task.calculateOutputSize does not handle Windows files 
after MAPREDUCE-5196
 Key: MAPREDUCE-5912
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5912
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Remus Rusanu
Assignee: Remus Rusanu








[jira] [Commented] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing

2014-06-04 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017779#comment-14017779
 ] 

Remus Rusanu commented on MAPREDUCE-5196:
-

MAPREDUCE-5912



[jira] [Updated] (MAPREDUCE-5912) Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196

2014-06-04 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated MAPREDUCE-5912:


Description: 
{code}
@@ -1098,8 +1120,8 @@ private long calculateOutputSize() throws IOException {
 if (isMapTask() && conf.getNumReduceTasks() > 0) {
   try {
     Path mapOutput = mapOutputFile.getOutputFile();
-    FileSystem localFS = FileSystem.getLocal(conf);
-    return localFS.getFileStatus(mapOutput).getLen();
+    FileSystem fs = mapOutput.getFileSystem(conf);
+    return fs.getFileStatus(mapOutput).getLen();
   } catch (IOException e) {
     LOG.warn("Could not find output size ", e);
   }
{code}

causes Windows local output files to be routed through HDFS:

{code}
2014-06-02 00:14:53,891 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IllegalArgumentException: Pathname /c:/Hadoop/Data/Hadoop/local/usercache/HadoopUser/appcache/application_1401693085139_0001/output/attempt_1401693085139_0001_m_00_0/file.out from c:/Hadoop/Data/Hadoop/local/usercache/HadoopUser/appcache/application_1401693085139_0001/output/attempt_1401693085139_0001_m_00_0/file.out is not a valid DFS filename.
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:187)
    at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:101)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1024)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1020)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1020)
    at org.apache.hadoop.mapred.Task.calculateOutputSize(Task.java:1124)
    at org.apache.hadoop.mapred.Task.sendLastUpdate(Task.java:1102)
    at org.apache.hadoop.mapred.Task.done(Task.java:1048)
{code}




[jira] [Commented] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing

2014-06-04 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1401#comment-1401
 ] 

Remus Rusanu commented on MAPREDUCE-5196:
-

Thanks [~curino]. I will open an issue and upload a patch shortly.



[jira] [Updated] (MAPREDUCE-5912) Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196

2014-06-04 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated MAPREDUCE-5912:


Fix Version/s: 3.0.0
   Status: Patch Available  (was: Open)



[jira] [Updated] (MAPREDUCE-5912) Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196

2014-06-04 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated MAPREDUCE-5912:


Attachment: MAPREDUCE-5912.1.patch



[jira] [Commented] (MAPREDUCE-5912) Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196

2014-06-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017828#comment-14017828
 ] 

Hadoop QA commented on MAPREDUCE-5912:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12648337/MAPREDUCE-5912.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4642//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4642//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-5912) Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196

2014-06-04 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017852#comment-14017852
 ] 

Remus Rusanu commented on MAPREDUCE-5912:
-

No new tests included because this is a revert of an earlier breaking change. 
Manually validated the change on Windows.



[jira] [Commented] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)

2014-06-04 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018092#comment-14018092
 ] 

Karthik Kambatla commented on MAPREDUCE-5777:
-

skipByteOrderMark seems to be duplicated. Can we have a single implementation 
of it and reuse it in the other place? 
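For reference, a single shared skipByteOrderMark could look roughly like the following. This is only a hypothetical sketch of the technique (the class and method names are illustrative, not the patch's actual code):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Illustrative shared helper that both the mapred and mapreduce
// LineRecordReaders could delegate to instead of duplicating the logic.
public class BomUtil {

    // The UTF-8 byte order mark: EF BB BF.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Wraps the stream so a leading UTF-8 BOM, if present, is consumed;
    // otherwise the peeked bytes are pushed back untouched.
    public static InputStream skipByteOrderMark(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = pb.read(head, 0, head.length);
        boolean isBom = n == UTF8_BOM.length
                && head[0] == UTF8_BOM[0]
                && head[1] == UTF8_BOM[1]
                && head[2] == UTF8_BOM[2];
        if (!isBom && n > 0) {
            pb.unread(head, 0, n);  // not a BOM: restore what we read
        }
        return pb;
    }

    // Small helper for the demo: drain an ASCII stream into a String.
    public static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int c = in.read(); c != -1; c = in.read()) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'k', 'e', 'y'};
        // BOM is stripped, only "key" remains
        System.out.println(readAll(skipByteOrderMark(new ByteArrayInputStream(withBom))));
        byte[] plain = "key".getBytes(StandardCharsets.US_ASCII);
        // no BOM: data passes through unchanged
        System.out.println(readAll(skipByteOrderMark(new ByteArrayInputStream(plain))));
    }
}
```

Keeping the BOM check in one place like this would let both API flavors call it, which is the kind of reuse suggested above.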

 Support utf-8 text with BOM (byte order marker)
 ---

 Key: MAPREDUCE-5777
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5777
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0, 2.2.0
Reporter: bc Wong
Assignee: zhihai xu
 Attachments: MAPREDUCE-5777.000.patch, MAPREDUCE-5777.001.patch, 
 MAPREDUCE-5777.002.patch, MAPREDUCE-5777.003.patch, MAPREDUCE-5777.004.patch


 UTF-8 text may have a BOM. TextInputFormat, KeyValueTextInputFormat and 
 friends should recognize the BOM and not treat it as actual data.





[jira] [Commented] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)

2014-06-04 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018184#comment-14018184
 ] 

zhihai xu commented on MAPREDUCE-5777:
--

Hi Karthik,
Thanks for the comment. It looks like all the other methods in LineRecordReader 
are also duplicated between MapRed (old API) and MapReduce (new API). Can we 
create another JIRA to handle the duplication between MapRed (old API) and 
MapReduce (new API)?
thanks,
zhihai



[jira] [Created] (MAPREDUCE-5913) Unify MapRed(old API) and MapReduce(new API) implementation to remove duplicate functions.

2014-06-04 Thread zhihai xu (JIRA)
zhihai xu created MAPREDUCE-5913:


 Summary: Unify MapRed(old API) and MapReduce(new API) 
implementation to remove duplicate functions.
 Key: MAPREDUCE-5913
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5913
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: zhihai xu
Priority: Minor








[jira] [Commented] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)

2014-06-04 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018194#comment-14018194
 ] 

zhihai xu commented on MAPREDUCE-5777:
--

I just created a JIRA at https://issues.apache.org/jira/browse/MAPREDUCE-5913 to 
address this duplication issue between MapRed and MapReduce.



[jira] [Commented] (MAPREDUCE-5912) Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196

2014-06-04 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018246#comment-14018246
 ] 

Chris Douglas commented on MAPREDUCE-5912:
--

As you identified in HADOOP-10663, returning the default filesystem for local 
paths is not correct.



[jira] [Updated] (MAPREDUCE-5913) Unify MapRed(old API) and MapReduce(new API) implementation to remove duplicate functions.

2014-06-04 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5913:
-

Description: 
Unify MapRed(old API) and MapReduce(new API) implementation to remove duplicate 
functions.
For example, org.apache.hadoop.mapred.LineRecordReader and 
org.apache.hadoop.mapreduce.lib.input.LineRecordReader have many duplicate 
functions.
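A minimal sketch of the delegation pattern such a unification could use: keep one shared implementation and have both API flavors forward to it. The names SharedLineCore, OldApiLineReader, and NewApiLineReader below are purely illustrative, not actual Hadoop classes:

```java
// Illustrative sketch of the unification MAPREDUCE-5913 proposes: one
// shared implementation, two thin per-API wrappers that delegate to it.
public class SharedLineCore {

    // Stand-in for logic that today is duplicated between
    // org.apache.hadoop.mapred.LineRecordReader and
    // org.apache.hadoop.mapreduce.lib.input.LineRecordReader:
    // find the index of the line terminator starting from 'start'.
    public static int findLineEnd(byte[] buf, int start) {
        for (int i = start; i < buf.length; i++) {
            if (buf[i] == '\n') {
                return i;
            }
        }
        return buf.length;  // no terminator: line runs to end of buffer
    }

    // Thin old-API-style wrapper: delegates instead of duplicating.
    static class OldApiLineReader {
        int lineEnd(byte[] buf, int start) {
            return findLineEnd(buf, start);
        }
    }

    // Thin new-API-style wrapper: same delegation, no copied logic.
    static class NewApiLineReader {
        int lineEnd(byte[] buf, int start) {
            return findLineEnd(buf, start);
        }
    }

    public static void main(String[] args) {
        byte[] data = "key\tvalue\nnext".getBytes();
        // Both wrappers agree because there is only one implementation.
        System.out.println(new OldApiLineReader().lineEnd(data, 0)); // 9
        System.out.println(new NewApiLineReader().lineEnd(data, 0)); // 9
    }
}
```

With this shape, a fix such as the BOM handling in MAPREDUCE-5777 would only need to land in the shared core once.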



[jira] [Commented] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing

2014-06-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018357#comment-14018357
 ] 

Wangda Tan commented on MAPREDUCE-5196:
---

Hi [~curino],
Thanks for your clarifications on my question; it's clear to me now.
For the MAPREDUCE-5269, please feel free to let me know if I can help with 
review.

Wangda

