[jira] [Commented] (MAPREDUCE-5912) Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196

2014-06-05 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018694#comment-14018694
 ] 

Remus Rusanu commented on MAPREDUCE-5912:
-

I have also posted a patch that solves HADOOP-10663. If that one is accepted, 
this issue becomes obsolete.

 Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196
 ---

 Key: MAPREDUCE-5912
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5912
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Remus Rusanu
Assignee: Remus Rusanu
 Fix For: 3.0.0

 Attachments: MAPREDUCE-5912.1.patch


 {code}
 @@ -1098,8 +1120,8 @@ private long calculateOutputSize() throws IOException {
  if (isMapTask() && conf.getNumReduceTasks() > 0) {
try {
  Path mapOutput =  mapOutputFile.getOutputFile();
 -FileSystem localFS = FileSystem.getLocal(conf);
 -return localFS.getFileStatus(mapOutput).getLen();
 +FileSystem fs = mapOutput.getFileSystem(conf);
 +return fs.getFileStatus(mapOutput).getLen();
} catch (IOException e) {
  LOG.warn("Could not find output size ", e);
}
 {code}
 causes Windows local output files to be routed through HDFS (the scheme-less 
 local path now resolves to the default filesystem):
 {code}
 2014-06-02 00:14:53,891 WARN [main] org.apache.hadoop.mapred.YarnChild: 
 Exception running child : java.lang.IllegalArgumentException: Pathname 
 /c:/Hadoop/Data/Hadoop/local/usercache/HadoopUser/appcache/application_1401693085139_0001/output/attempt_1401693085139_0001_m_00_0/file.out
  from 
 c:/Hadoop/Data/Hadoop/local/usercache/HadoopUser/appcache/application_1401693085139_0001/output/attempt_1401693085139_0001_m_00_0/file.out
  is not a valid DFS filename.
at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:187)
at 
 org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:101)
at 
 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1024)
at 
 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1020)
at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1020)
at org.apache.hadoop.mapred.Task.calculateOutputSize(Task.java:1124)
at org.apache.hadoop.mapred.Task.sendLastUpdate(Task.java:1102)
at org.apache.hadoop.mapred.Task.done(Task.java:1048)
 {code}
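The failure is easier to see in isolation. Below is a small plain-Java sketch (no Hadoop on the classpath; the method names are illustrative, not Hadoop's) of the two behaviors involved: a DFS-style path-name check that rejects Windows drive paths like /c:/..., and scheme-based filesystem resolution where a scheme-less path falls back to the default filesystem:

```java
import java.net.URI;

public class SchemeResolution {

    // Mimics the validity check behind DistributedFileSystem.getPathName:
    // a path handed to HDFS must be absolute and must not carry a Windows
    // drive component such as "/c:/...".
    static boolean isValidDfsPathName(String path) {
        if (!path.startsWith("/")) {
            return false;
        }
        // "/c:/..." style names (slash, drive letter, colon) are rejected.
        return !(path.length() >= 3
                && Character.isLetter(path.charAt(1))
                && path.charAt(2) == ':');
    }

    // Mimics Path.getFileSystem(conf): the URI scheme of the path decides
    // which FileSystem handles it; a scheme-less path falls back to the
    // configured default (HDFS in the reported cluster), which is how a
    // Windows local path ends up in DistributedFileSystem.
    static String resolveScheme(String path, String defaultScheme) {
        String scheme = URI.create(path).getScheme();
        return scheme != null ? scheme : defaultScheme;
    }
}
```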



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAPREDUCE-5914) Writables are not configured by framework

2014-06-05 Thread Abraham Elmahrek (JIRA)
Abraham Elmahrek created MAPREDUCE-5914:
---

 Summary: Writables are not configured by framework
 Key: MAPREDUCE-5914
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5914
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Abraham Elmahrek


Seeing the following exception:
{noformat}
java.lang.Exception: java.lang.NullPointerException
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
Caused by: java.lang.NullPointerException
at 
org.apache.sqoop.job.io.SqoopWritable.readFields(SqoopWritable.java:59)
at 
org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:129)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1248)
at org.apache.hadoop.util.QuickSort.fix(QuickSort.java:35)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:87)
at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:63)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1582)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1467)
at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:769)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
{noformat}

It turns out that WritableComparator does not configure Writable objects 
before comparing them: 
https://github.com/apache/hadoop-common/blob/branch-2.3.0/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/WritableComparator.java. 
This happens during the sort phase of an MR job.
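A hedged sketch of the fix the report implies: when the comparator instantiates a key, it should check whether the object is Configurable and inject the job configuration before readFields() runs. This is plain Java with stand-in interfaces (the real types are org.apache.hadoop.conf.Configurable and org.apache.hadoop.io.Writable; everything here is redefined locally for illustration):

```java
import java.util.Map;

public class ConfiguredComparator {

    interface Configurable { void setConf(Map<String, String> conf); }
    interface Writable { void readFields(String raw); }

    // Stand-in for SqoopWritable: readFields() dereferences the
    // configuration and NPEs if setConf() was never called.
    static class ConfDependentWritable implements Writable, Configurable {
        Map<String, String> conf;
        String value;
        public void setConf(Map<String, String> conf) { this.conf = conf; }
        public void readFields(String raw) {
            value = conf.get("prefix") + raw;  // NPE without configuration
        }
    }

    // The missing step: configure the instance if it is Configurable,
    // analogous to what ReflectionUtils.newInstance does elsewhere in Hadoop.
    static Writable newConfiguredInstance(Map<String, String> conf) {
        Writable w = new ConfDependentWritable();  // as newKey() would create
        if (w instanceof Configurable) {
            ((Configurable) w).setConf(conf);
        }
        return w;
    }
}
```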





[jira] [Updated] (MAPREDUCE-1362) Pipes should be ported to the new mapreduce API

2014-06-05 Thread Joe Mudd (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Mudd updated MAPREDUCE-1362:


Attachment: MAPREDUCE-1362.patch

Resync'd trunk patch

 Pipes should be ported to the new mapreduce API
 ---

 Key: MAPREDUCE-1362
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1362
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: pipes
Reporter: Bassam Tabbara
 Attachments: MAPREDUCE-1362-trunk.patch, MAPREDUCE-1362.patch, 
 MAPREDUCE-1362.patch


 Pipes is still currently using the old mapred API. This prevents us from 
 using pipes with HBase's TableInputFormat, HRegionPartitioner, etc. 
 Here is a rough proposal for how to accomplish this:
 * Add a new package org.apache.hadoop.mapreduce.pipes that uses the new 
 mapreduce API.
 * The new pipes package will run side by side with the old one; the old one 
 should be deprecated at some point.
 * The wire protocol used between PipesMapper, PipesReducer, and the C++ 
 programs must not change.
 * bin/hadoop should support both pipes (old API) and pipes2 (new API).
 Does this sound reasonable?





[jira] [Commented] (MAPREDUCE-1362) Pipes should be ported to the new mapreduce API

2014-06-05 Thread Joe Mudd (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019168#comment-14019168
 ] 

Joe Mudd commented on MAPREDUCE-1362:
-

I've rebuilt the patch against the latest trunk.  The latest 
MAPREDUCE-1362.patch is ready for a code review.

 Pipes should be ported to the new mapreduce API
 ---

 Key: MAPREDUCE-1362
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1362
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: pipes
Reporter: Bassam Tabbara
 Attachments: MAPREDUCE-1362-trunk.patch, MAPREDUCE-1362.patch, 
 MAPREDUCE-1362.patch


 Pipes is still currently using the old mapred API. This prevents us from 
 using pipes with HBase's TableInputFormat, HRegionPartitioner, etc. 
 Here is a rough proposal for how to accomplish this:
 * Add a new package org.apache.hadoop.mapreduce.pipes that uses the new 
 mapreduce API.
 * The new pipes package will run side by side with the old one; the old one 
 should be deprecated at some point.
 * The wire protocol used between PipesMapper, PipesReducer, and the C++ 
 programs must not change.
 * bin/hadoop should support both pipes (old API) and pipes2 (new API).
 Does this sound reasonable?





[jira] [Commented] (MAPREDUCE-5907) Improve getSplits() performance for fs implementations that can utilize performance gains from recursive listing

2014-06-05 Thread Sumit Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019332#comment-14019332
 ] 

Sumit Kumar commented on MAPREDUCE-5907:


[~ste...@apache.org] Seeking your attention to this JIRA. Thanks!

 Improve getSplits() performance for fs implementations that can utilize 
 performance gains from recursive listing
 

 Key: MAPREDUCE-5907
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5907
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Affects Versions: 2.4.0
Reporter: Sumit Kumar
Assignee: Sumit Kumar
 Attachments: MAPREDUCE-5907-2.patch, MAPREDUCE-5907-3.patch, 
 MAPREDUCE-5907.patch


 FileInputFormat (both the mapreduce and mapred implementations) uses 
 recursive listing while calculating splits, but it does so level by level: 
 to discover the files under /foo/bar it first lists /foo/bar to get the 
 immediate children, then issues the same call on each immediate child, and 
 so on. This doesn't scale well for object-store-based fs implementations 
 like s3 and swift, because every listStatus call ends up being a webservice 
 call to the backend. When a large number of files are considered for input, 
 this makes the getSplits() call slow. 
 This patch adds a new set of recursive list APIs that gives fs 
 implementations an opportunity to optimize. The behavior remains the same 
 for other implementations (a default implementation is provided, so they 
 don't have to implement anything new), but for object-store-based fs 
 implementations a simple change to pass the recursive flag as true (as 
 shown in the patch) improves listing performance.
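The call-count difference described above can be sketched with java.nio.file as an illustrative stand-in for the Hadoop FileSystem API (this is not the patch itself; the call counts model remote round trips under the assumption that the store can answer a recursive listing in one request):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.stream.Stream;

public class ListingStrategies {

    // Level-by-level BFS, as FileInputFormat effectively does today:
    // one listing call per directory discovered. Returns the call count.
    static int listLevelByLevel(Path root, List<Path> filesOut) throws IOException {
        int listCalls = 0;
        Deque<Path> dirs = new ArrayDeque<>();
        dirs.add(root);
        while (!dirs.isEmpty()) {
            Path dir = dirs.poll();
            listCalls++;  // one "webservice call" per directory
            try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
                for (Path p : ds) {
                    if (Files.isDirectory(p)) dirs.add(p);
                    else filesOut.add(p);
                }
            }
        }
        return listCalls;
    }

    // Recursive listing: the store walks the whole subtree for us,
    // so the whole discovery costs a single round trip.
    static int listRecursively(Path root, List<Path> filesOut) throws IOException {
        try (Stream<Path> s = Files.walk(root)) {
            s.filter(Files::isRegularFile).forEach(filesOut::add);
        }
        return 1;
    }
}
```

For a tree with d directories, the level-by-level strategy issues d listing calls while the recursive one issues a constant number, which is the gain the patch targets for s3 and swift.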


