[jira] [Created] (MAPREDUCE-5661) ShuffleHandler using yarn.nodemanager.local-dirs instead of mapreduce.cluster.local.dir

2013-12-02 Thread Eric Sirianni (JIRA)
Eric Sirianni created MAPREDUCE-5661:


 Summary: ShuffleHandler using yarn.nodemanager.local-dirs instead 
of mapreduce.cluster.local.dir
 Key: MAPREDUCE-5661
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5661
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Eric Sirianni
Priority: Trivial


While debugging an issue where a MapReduce job is failing due to running out of 
disk space, I noticed that the {{ShuffleHandler}} uses 
{{yarn.nodemanager.local-dirs}} for its {{LocalDirAllocator}} whereas all of 
the other MapReduce classes use {{mapreduce.cluster.local.dir}}:

{noformat}
$ find hadoop-mapreduce-project/hadoop-mapreduce-client/*/src/main/java/ -name '*.java' | xargs grep 'new LocalDirAllocator('
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java:
LocalDirAllocator lDirAlloc = new LocalDirAllocator(MRConfig.LOCAL_DIR);
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnOutputFiles.java:
new LocalDirAllocator(MRConfig.LOCAL_DIR);
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalDistributedCacheManager.java:
  new LocalDirAllocator(MRConfig.LOCAL_DIR);
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/BackupStore.java:
  this.lDirAlloc = new LocalDirAllocator(MRConfig.LOCAL_DIR);
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MROutputFiles.java:
new LocalDirAllocator(MRConfig.LOCAL_DIR);
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Merger.java:
new LocalDirAllocator(MRConfig.LOCAL_DIR);
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java:
this.lDirAlloc = new LocalDirAllocator(MRConfig.LOCAL_DIR);

*hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java:
  new LocalDirAllocator(YarnConfiguration.NM_LOCAL_DIRS);
{noformat}

This inconsistency feels like something that is likely to confuse admins.  



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5656) bzip2 codec can drop records when reading data in splits

2013-12-02 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836600#comment-13836600
 ] 

Jason Lowe commented on MAPREDUCE-5656:
---

That case is what the {{finished}} flag in CompressedSplitLineReader is 
intended to catch.  Here's the scenario:

# LineRecordReader calls readLine
# The line processing causes us to fetch the next compressed block beyond the 
split (i.e.: fillBuffer is called).  Let's say this causes us to set 
needAdditionalRecord=true.
# LineRecordReader will process another iteration of the loop and call readLine 
again
# readLine will notice that we are starting at a position past the end of the 
split and set finished=true.
# At that point the needAdditionalRecordAfterSplit method will always return 
false and LineRecordReader should not read more than at most one record beyond 
the end of the split.

The key is needAdditionalRecordAfterSplit() will always return false once 
readLine() is invoked at a position after the split ends.
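
The scenario above can be sketched as a small state machine. This is a toy model for illustration only, not the Hadoop source: the class name, the splitEnd field, and the exact position comparison are assumptions; only the finished and needAdditionalRecord flags and the method names readLine, fillBuffer, and needAdditionalRecordAfterSplit come from the comment.

```java
/** Toy model of the split-boundary bookkeeping; hypothetical, not the Hadoop class. */
final class SplitBoundarySketch {
    private final long splitEnd;          // assumed: byte offset where the split ends
    private boolean needAdditionalRecord; // set when a block beyond the split was fetched
    private boolean finished;             // set once readLine starts past the split end

    SplitBoundarySketch(long splitEnd) {
        this.splitEnd = splitEnd;
    }

    /** Models fillBuffer fetching the next compressed block beyond the split. */
    void fetchedBlockBeyondSplit() {
        needAdditionalRecord = true;
    }

    /** Models readLine being invoked with the reader at startPos. */
    void readLine(long startPos) {
        // Assumption: "past the end of the split" is a strict comparison.
        if (startPos > splitEnd) {
            finished = true;
        }
    }

    /** Once finished is set, this returns false no matter what fillBuffer did. */
    boolean needAdditionalRecordAfterSplit() {
        return !finished && needAdditionalRecord;
    }
}
```

In this model the caller may read one record beyond the split after a block fetch, but the next readLine call that starts past the split end latches finished and stops further reads.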

 bzip2 codec can drop records when reading data in splits
 

 Key: MAPREDUCE-5656
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5656
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.4-alpha, 0.23.8
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
 Attachments: HADOOP-9622-2.patch, HADOOP-9622-testcase.patch, 
 HADOOP-9622.patch, MAPREDUCE-5656.patch, blockEndingInCR.txt.bz2, 
 blockEndingInCRThenLF.txt.bz2


 Bzip2Codec.BZip2CompressionInputStream can cause records to be dropped when 
 reading them in splits based on where record delimiters occur relative to 
 compression block boundaries.
 Thanks to [~knoguchi] for discovering this problem while working on PIG-3251.





[jira] [Updated] (MAPREDUCE-5656) bzip2 codec can drop records when reading data in splits

2013-12-02 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5656:
--

Attachment: MAPREDUCE-5656-2.patch

Slightly updated patch to fix the spacing issue in SplitLineReader.

 bzip2 codec can drop records when reading data in splits
 

 Key: MAPREDUCE-5656
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5656
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.4-alpha, 0.23.8
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
 Attachments: HADOOP-9622-2.patch, HADOOP-9622-testcase.patch, 
 HADOOP-9622.patch, MAPREDUCE-5656-2.patch, MAPREDUCE-5656.patch, 
 blockEndingInCR.txt.bz2, blockEndingInCRThenLF.txt.bz2


 Bzip2Codec.BZip2CompressionInputStream can cause records to be dropped when 
 reading them in splits based on where record delimiters occur relative to 
 compression block boundaries.
 Thanks to [~knoguchi] for discovering this problem while working on PIG-3251.





[jira] [Commented] (MAPREDUCE-5656) bzip2 codec can drop records when reading data in splits

2013-12-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836644#comment-13836644
 ] 

Hadoop QA commented on MAPREDUCE-5656:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12616562/MAPREDUCE-5656-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core:

  org.apache.hadoop.mapred.TestLineRecordReader
  org.apache.hadoop.mapreduce.lib.input.TestLineRecordReader

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4235//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4235//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-5661) ShuffleHandler using yarn.nodemanager.local-dirs instead of mapreduce.cluster.local.dir

2013-12-02 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836652#comment-13836652
 ] 

Jason Lowe commented on MAPREDUCE-5661:
---

Note that YarnChild.configureLocalDirs sets this property based on an 
environment variable, which is itself derived from yarn.nodemanager.local-dirs, 
and therefore most of the other references are really coming from what was 
specified in yarn.nodemanager.local-dirs and not what was configured by an 
admin.  The notable exception would be jobs run in local mode.

Also note that the shuffle handler is a bit special in that it is the one piece 
of MapReduce code that runs as part of the YARN nodemanager process and not as 
part of a job or a client.  It is more likely yarn.nodemanager.local-dirs is 
configured on a particular YARN node than mapreduce.cluster.local.dir, so I 
think it's appropriate that variable is used in the shuffle handler case.  I 
don't think mapreduce.cluster.local.dir is even set on some of our clusters, as 
the MapReduce framework configures this variable for tasks when running under 
YARN.  I wouldn't expect it to have to be configured by admins at all unless 
supporting jobs in local mode and for some reason the default isn't sufficient.
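
A minimal sketch of the override described above, assuming the configuration can be modeled as a plain map. The class and method below are hypothetical stand-ins, not the actual YarnChild code; the only names taken from the discussion are the two property keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of a task overriding its local dirs from the container environment. */
final class LocalDirsSketch {
    static final String MR_LOCAL_DIR = "mapreduce.cluster.local.dir";

    /**
     * Returns a copy of jobConf in which MR_LOCAL_DIR is replaced by the
     * directories the NodeManager handed to the container, if any were supplied.
     * containerLocalDirs stands in for the environment variable value, which is
     * assumed to be derived from yarn.nodemanager.local-dirs.
     */
    static Map<String, String> configureLocalDirs(Map<String, String> jobConf,
                                                  String containerLocalDirs) {
        Map<String, String> conf = new LinkedHashMap<>(jobConf);
        if (containerLocalDirs != null && !containerLocalDirs.trim().isEmpty()) {
            // Under YARN the admin-supplied value is overwritten here.
            conf.put(MR_LOCAL_DIR, containerLocalDirs.trim());
        }
        return conf;
    }
}
```

Passing null for containerLocalDirs models local mode, where the configured value (or its default) is left untouched.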



[jira] [Commented] (MAPREDUCE-5661) ShuffleHandler using yarn.nodemanager.local-dirs instead of mapreduce.cluster.local.dir

2013-12-02 Thread Eric Sirianni (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836758#comment-13836758
 ] 

Eric Sirianni commented on MAPREDUCE-5661:
--

The description for {{mapreduce.cluster.local.dir}} implies that that directory 
will receive significant load:

{code:xml}
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value>${hadoop.tmp.dir}/mapred/local</value>
  <description>
  The local directory where MapReduce stores intermediate
  data files.  May be a comma-separated list of
  directories on different devices in order to spread disk i/o.
  Directories that do not exist are ignored.
  </description>
</property>
{code}

Since you are suggesting that the default (typically in /tmp) is sufficient, 
perhaps that description should be altered?  I'm observing that the shuffle 
creates the majority of the disk I/O in my MapReduce jobs, and that I/O goes to 
the {{yarn.nodemanager.local-dirs}} directories.



[jira] [Resolved] (MAPREDUCE-5655) Remote job submit from windows to a linux hadoop cluster fails due to wrong classpath

2013-12-02 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved MAPREDUCE-5655.
--

Resolution: Duplicate

Hi, [~padisah].  Thanks for the bug report.

I do think this is a duplicate of MAPREDUCE-4052.  The 0.23.x code line is 
similar to the 2.2.x code line.  It's often the case that a bug in 2.2.x is 
also a bug in 0.23.x.  I've just updated MAPREDUCE-4052 to make the title 
clearer and indicate that it also affects version 2.2.0.

I recommend that you participate on MAPREDUCE-4052.  There is a patch attached 
to that issue, but it's a few months old, so it's likely to be out-of-date at 
this point.  Seeing your latest patch would be valuable.  You can upload your 
patch by clicking the More button at the top and then going through the Attach 
Files dialog.  The Submit Patch button is used to submit your patch to Jenkins 
for a test run against current trunk.

Thanks again!


 Remote job submit from windows to a linux hadoop cluster fails due to wrong 
 classpath
 -

 Key: MAPREDUCE-5655
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5655
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, job submission
Affects Versions: 2.2.0
 Environment: Client machine is a Windows 7 box, with Eclipse
 Remote: there is a multi node hadoop cluster, installed on Ubuntu boxes (any 
 linux)
Reporter: Attila Pados

 I was trying to run a Java class on my client, a Windows 7 developer 
 environment, which submits a job to the remote Hadoop cluster, initiates a 
 MapReduce job there, and then downloads the results back to the local machine.
 The general use case is to use Hadoop services from a web application 
 installed on a non-cluster computer, or as part of a developer environment.
 The problem was that the ApplicationMaster's startup shell script 
 (launch_container.sh) was generated with a wrong CLASSPATH entry. Together with 
 the java process call at the bottom of the file, these entries were generated 
 in Windows style, using % as the shell variable marker and ; as the CLASSPATH 
 delimiter.
 I tracked down the root cause and found that the MrApps.java and 
 YarnRunner.java classes create these entries and pass them on to the 
 ApplicationMaster, assuming that the OS that runs these classes will match 
 the one running the ApplicationMaster. That is not the case: the two run in 
 different JVMs, possibly on different operating systems, and the strings are 
 generated based on the client/submitter side's OS.
 I made some workaround changes to these two files so I could launch my job; 
 however, there may be more problems ahead.
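
The mismatch described above can be illustrated with a short sketch. It is not the Hadoop code: the class and method names are hypothetical, and the boolean flag for the target OS is an assumption made for the example. The only real API used is the standard path.separator system property.

```java
import java.util.List;

/** Hypothetical illustration of client-side vs. target-side classpath joining. */
final class ClasspathSeparatorSketch {
    /** Joins entries with the separator of the JVM running this code (the submitting client). */
    static String joinForLocalOs(List<String> entries) {
        // On a Windows client this yields ';', which a Linux shell script cannot use.
        return String.join(System.getProperty("path.separator"), entries);
    }

    /** Joins entries with the separator of the OS that will actually run the container. */
    static String joinForTargetOs(List<String> entries, boolean targetIsWindows) {
        return String.join(targetIsWindows ? ";" : ":", entries);
    }
}
```

The first method reproduces the reported behavior whenever client and cluster run different operating systems; the second shows one way to make the choice explicit instead of inheriting it from the submitter.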





[jira] [Commented] (MAPREDUCE-4052) Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop cluster.

2013-12-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836766#comment-13836766
 ] 

Hadoop QA commented on MAPREDUCE-4052:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12523800/MAPREDUCE-4052-0.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4236//console

This message is automatically generated.

 Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop 
 cluster.
 ---

 Key: MAPREDUCE-4052
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4052
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission
Affects Versions: 0.23.1, 2.2.0
 Environment: client on Windows, the cluster on SUSE
Reporter: xieguiming
Assignee: xieguiming
 Attachments: MAPREDUCE-4052-0.patch, MAPREDUCE-4052.patch


 When I use Eclipse on Windows to submit the job, the 
 ApplicationMaster throws this exception:
 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/hadoop/mapreduce/v2/app/MRAppMaster
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
 Could not find the main class: 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.  Program will exit.
 The reason is:
 the addToEnvironment function in class Apps uses
 private static final String SYSTEM_PATH_SEPARATOR =
   System.getProperty("path.separator");
 which results in the MRApplicationMaster classpath using the ; separator.
 I suggest that the NodeManager do the replacement.





[jira] [Commented] (MAPREDUCE-5661) ShuffleHandler using yarn.nodemanager.local-dirs instead of mapreduce.cluster.local.dir

2013-12-02 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836779#comment-13836779
 ] 

Jason Lowe commented on MAPREDUCE-5661:
---

Yes, in a YARN cluster the majority of the I/O on a node for MapReduce should 
be in the yarn.nodemanager.local-dirs directories.  Those settings are 
replicated to mapreduce.cluster.local.dir during task startup under YARN, so 
admins don't normally have to configure mapreduce.cluster.local.dir.  I would 
expect an explicit setting of mapreduce.cluster.local.dir to only take effect 
when running a job in local mode, which usually isn't a very big job and 
therefore the default of somewhere under /tmp is probably fine for most of 
those cases.

So to sum up, tasks are using mapreduce.cluster.local.dir but the directories 
listed there are derived from yarn.nodemanager.local-dirs in a YARN cluster.  
Setting mapreduce.cluster.local.dir in mapred-site.xml would have no effect for 
most MapReduce jobs in a YARN cluster.



[jira] [Commented] (MAPREDUCE-5661) ShuffleHandler using yarn.nodemanager.local-dirs instead of mapreduce.cluster.local.dir

2013-12-02 Thread Eric Sirianni (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836792#comment-13836792
 ] 

Eric Sirianni commented on MAPREDUCE-5661:
--

{quote}
Note that YarnChild.configureLocalDirs sets this property based on an 
environment variable
{quote}

Ah.  I see now.  I didn't realize that this property meant 
{{mapreduce.cluster.local.dir}}.  Got it, thanks.



[jira] [Commented] (MAPREDUCE-5640) Rename TestLineRecordReader in jobclient module

2013-12-02 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836800#comment-13836800
 ] 

Jonathan Eagles commented on MAPREDUCE-5640:


+1. Changes look good, Jason. Checking this in shortly.

 Rename TestLineRecordReader in jobclient module
 ---

 Key: MAPREDUCE-5640
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5640
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: test
Affects Versions: 0.23.9, 2.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Trivial
 Attachments: MAPREDUCE-5640.patch


 HADOOP-9622 proposes to add new unit tests for LineRecordReader in the 
 mapreduce-client-core module alongside the code.  The existing 
 LineRecordReader tests in the mapreduce-client-jobclient module should be 
 renamed to something like TestLineRecordReaderJobs to avoid a name conflict 
 and to better indicate these are integration tests using full jobs rather 
 than unit tests.





[jira] [Commented] (MAPREDUCE-5640) Rename TestLineRecordReader in jobclient module

2013-12-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836810#comment-13836810
 ] 

Hudson commented on MAPREDUCE-5640:
---

SUCCESS: Integrated in Hadoop-trunk-Commit #4814 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4814/])
MAPREDUCE-5640. Rename TestLineRecordReader in jobclient module (Jason Lowe via 
jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1547149)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestLineRecordReader.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestLineRecordReaderJobs.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestLineRecordReader.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestLineRecordReaderJobs.java




[jira] [Updated] (MAPREDUCE-5640) Rename TestLineRecordReader in jobclient module

2013-12-02 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated MAPREDUCE-5640:
---

   Resolution: Fixed
Fix Version/s: 0.23.10
   2.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

 Rename TestLineRecordReader in jobclient module
 ---

 Key: MAPREDUCE-5640
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5640
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: test
Affects Versions: 0.23.9, 2.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Trivial
 Fix For: 3.0.0, 2.4.0, 0.23.10

 Attachments: MAPREDUCE-5640.patch


 HADOOP-9622 proposes to add new unit tests for LineRecordReader in the 
 mapreduce-client-core module alongside the code.  The existing 
 LineRecordReader tests in the mapreduce-client-jobclient module should be 
 renamed to something like TestLineRecordReaderJobs to avoid a name conflict 
 and to better indicate these are integration tests using full jobs rather 
 than unit tests.





[jira] [Created] (MAPREDUCE-5662) implement the support for using the shared cache for the job jar and libjars

2013-12-02 Thread Sangjin Lee (JIRA)
Sangjin Lee created MAPREDUCE-5662:
--

 Summary: implement the support for using the shared cache for the 
job jar and libjars
 Key: MAPREDUCE-5662
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5662
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Sangjin Lee
Assignee: Sangjin Lee








[jira] [Created] (MAPREDUCE-5663) Add an interface to Input/Output Formats to obtain delegation tokens

2013-12-02 Thread Siddharth Seth (JIRA)
Siddharth Seth created MAPREDUCE-5663:
-

 Summary: Add an interface to Input/Output Formats to obtain 
delegation tokens
 Key: MAPREDUCE-5663
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Siddharth Seth


Currently, delegation tokens are obtained as part of the getSplits / 
checkOutputSpecs calls to the InputFormat / OutputFormat respectively.

This works as long as the splits are generated on a node with Kerberos 
credentials. For split generation elsewhere (AM for example), an explicit 
interface is required.





[jira] [Commented] (MAPREDUCE-5654) blacklist is not propagated from AM to RM

2013-12-02 Thread Robert Grandl (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837146#comment-13837146
 ] 

Robert Grandl commented on MAPREDUCE-5654:
--

Do you guys have any thoughts on this?

 blacklist is not propagated from AM to RM
 -

 Key: MAPREDUCE-5654
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5654
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster
Reporter: Robert Grandl

 I was trying to blacklist some nodes. I added a set of hosts as strings to 
 the blacklistAdditions list and passed it through 
 RMContainerRequestor#makeRemoteRequest to the RM. 
 However, the blacklist arrives empty at the RM. I logged the path the 
 blacklist takes in the AM and found that the list is lost in 
 ApplicationMasterProtocolPBClientImpl#allocate. 
 When I print 
 request.getResourceBlacklistRequest().getBlacklistAdditions().toString() at 
 the beginning of ApplicationMasterProtocolPBClientImpl#allocate, the 
 blacklist additions are there. 
 After the AllocateRequestProto requestProto is created from this request, 
 I print 
 requestProto.getBlacklistRequest().getBlacklistAdditionsList().toString() 
 and the blacklist additions are now empty.
 I tried to trace further by adding more logging, but in yarn-api my logging 
 was lost because that code is regenerated every time I recompile yarn-api. 
 Thanks,
 robert





[jira] [Updated] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory

2013-12-02 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated MAPREDUCE-5603:
-

Target Version/s: 3.0.0, 0.23.11, 2.3.0  (was: 3.0.0, 2.4.0, 0.23.11)

 Ability to disable FileInputFormat listLocatedStatus optimization to save 
 client memory
 ---

 Key: MAPREDUCE-5603
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client, mrv2
Affects Versions: 0.23.10, 2.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Minor
 Attachments: MAPREDUCE-5603.patch, MAPREDUCE-5603.patch


 It would be nice if users had the option to disable the listLocatedStatus 
 optimization in FileInputFormat to save client memory.





[jira] [Commented] (MAPREDUCE-4711) Append time elapsed since job-start-time for finished tasks

2013-12-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837325#comment-13837325
 ] 

Hadoop QA commented on MAPREDUCE-4711:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12548038/MAPREDUCE-4711.branch-0.23.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4237//console

This message is automatically generated.

 Append time elapsed since job-start-time for finished tasks
 ---

 Key: MAPREDUCE-4711
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4711
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 0.23.3
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Attachments: MAPREDUCE-4711.branch-0.23.patch


 In 0.20.x/1.x, the analyze job link gave this information
 bq. The last Map task task_sometask finished at (relative to the Job launch 
 time): 5/10 20:23:10 (1hrs, 27mins, 54sec)
 The time it took for the last task to finish needs to be calculated mentally 
 in 0.23. I believe we should print it next to the finish time.
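A minimal sketch of the calculation being requested, using only the JDK. The class and method names here are hypothetical and are not taken from the attached patch; the output format simply mirrors the 0.20.x/1.x example quoted above.

```java
import java.util.concurrent.TimeUnit;

public class ElapsedTimeSketch {

    // Formats the time between job start and task finish in the style
    // shown by the old "analyze job" page, e.g. "1hrs, 27mins, 54sec".
    // Hypothetical helper; not the method used by the history server.
    static String formatElapsed(long jobStartMillis, long taskFinishMillis) {
        // Guard against clock skew producing a negative interval.
        long elapsed = Math.max(0L, taskFinishMillis - jobStartMillis);
        long hours = TimeUnit.MILLISECONDS.toHours(elapsed);
        long mins = TimeUnit.MILLISECONDS.toMinutes(elapsed) % 60;
        long secs = TimeUnit.MILLISECONDS.toSeconds(elapsed) % 60;
        return hours + "hrs, " + mins + "mins, " + secs + "sec";
    }

    public static void main(String[] args) {
        long start = 0L;
        long finish = TimeUnit.HOURS.toMillis(1)
                + TimeUnit.MINUTES.toMillis(27)
                + TimeUnit.SECONDS.toMillis(54);
        // Printed next to the finish time, this saves the mental arithmetic.
        System.out.println(formatElapsed(start, finish));
    }
}
```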





[jira] [Commented] (MAPREDUCE-4941) Use of org.apache.hadoop.mapred.lib.CombineFileRecordReader requires casting

2013-12-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837326#comment-13837326
 ] 

Hadoop QA commented on MAPREDUCE-4941:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12578762/MAPREDUCE-4941.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4239//console

This message is automatically generated.

 Use of org.apache.hadoop.mapred.lib.CombineFileRecordReader requires casting
 

 Key: MAPREDUCE-4941
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4941
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Minor
 Attachments: MAPREDUCE-4941.patch, MAPREDUCE-4941.patch


 Unlike its counterpart in org.apache.hadoop.mapreduce.lib.input, the 
 CombineFileRecordReader in mapred requires a user to cast to a RecordReader 
 since the constructor specification says it must have the RecordReader<K,V> 
 class as a parameter.  It should use {{Class<? extends RecordReader<K,V>>}} 
 like its mapreduce counterpart to make it easier to use.
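The difference between the two parameter types can be illustrated with a self-contained sketch. The types below (Reader, MyReader, StrictCombiner, RelaxedCombiner) are hypothetical stand-ins, not the Hadoop classes; they only show why an exact Class parameter forces a cast while a bounded wildcard does not.

```java
public class WildcardCtorSketch {

    // Stand-in for a generic record reader interface (hypothetical).
    interface Reader<K, V> {}

    // A concrete reader a user might want to plug in (hypothetical).
    static class MyReader implements Reader<String, Long> {}

    // Exact-type parameter: Class<MyReader> is not a Class<Reader<K, V>>,
    // so callers are forced into an unchecked cast.
    static class StrictCombiner<K, V> {
        final Class<Reader<K, V>> readerClass;
        StrictCombiner(Class<Reader<K, V>> readerClass) {
            this.readerClass = readerClass;
        }
    }

    // Bounded-wildcard parameter: any subclass literal is accepted as-is.
    static class RelaxedCombiner<K, V> {
        final Class<? extends Reader<K, V>> readerClass;
        RelaxedCombiner(Class<? extends Reader<K, V>> readerClass) {
            this.readerClass = readerClass;
        }
    }

    @SuppressWarnings({"unchecked", "rawtypes"})
    static StrictCombiner<String, Long> strict() {
        // The raw cast is what the user currently has to write.
        return new StrictCombiner<String, Long>((Class) MyReader.class);
    }

    static RelaxedCombiner<String, Long> relaxed() {
        // No cast and no warning suppression needed.
        return new RelaxedCombiner<String, Long>(MyReader.class);
    }

    public static void main(String[] args) {
        System.out.println(strict().readerClass.getSimpleName());
        System.out.println(relaxed().readerClass.getSimpleName());
    }
}
```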





[jira] [Commented] (MAPREDUCE-4901) JobHistoryEventHandler errors should be fatal

2013-12-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837324#comment-13837324
 ] 

Hadoop QA commented on MAPREDUCE-4901:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12562403/MR-4901-trunk.txt
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4238//console

This message is automatically generated.

 JobHistoryEventHandler errors should be fatal
 -

 Key: MAPREDUCE-4901
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4901
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0, 2.0.0-alpha
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Attachments: MR-4901-trunk.txt


 To be able to truly fix issues like MAPREDUCE-4819 and MAPREDUCE-4832, we 
 need a 2 phase commit where a subsequent AM can be sure that at a specific 
 point in time it knows exactly if any tasks/jobs are committing.  The job 
 history log is already used for similar functionality so we would like to 
 reuse this, but we need to be sure that errors while writing out to the job 
 history log are now fatal.





[jira] [Commented] (MAPREDUCE-5267) History server should be more robust when cleaning old jobs

2013-12-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837331#comment-13837331
 ] 

Hadoop QA commented on MAPREDUCE-5267:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12586262/MAPREDUCE-5267.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4240//console

This message is automatically generated.

 History server should be more robust when cleaning old jobs
 ---

 Key: MAPREDUCE-5267
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5267
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver
Affects Versions: 0.23.7, 2.0.4-alpha
Reporter: Jason Lowe
Assignee: Maysam Yabandeh
 Attachments: MAPREDUCE-5267.patch, MAPREDUCE-5267.patch


 Ran across a situation where an admin user had accidentally created a 
 directory in one of the date directories under /mapred/history/done/ that was 
 not readable by the historyserver user.  That effectively prevented the 
 history server from cleaning any jobs from that date forward, as it hit an 
 IOException trying to scan the directory and that aborted the entire clean 
 process.
 The history server should localize IOException handling to the directory/file 
 being processed and move on to the next entry in the list rather than 
 aborting the entire cleaning process.
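The proposed behavior could look roughly like the sketch below: catch the IOException per entry, record the failure, and continue with the next entry. This is an illustration under assumptions, not the attached patch; DirCleaner, cleanAll, and the directory names are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CleanerSketch {

    // Hypothetical callback that cleans one date directory and may fail.
    interface DirCleaner {
        void clean(String dir) throws IOException;
    }

    // Cleans every directory in turn. A failure on one entry is logged and
    // recorded, and the loop moves on instead of aborting the whole pass.
    static List<String> cleanAll(List<String> dirs, DirCleaner cleaner) {
        List<String> failed = new ArrayList<String>();
        for (String dir : dirs) {
            try {
                cleaner.clean(dir);
            } catch (IOException e) {
                System.err.println("Skipping " + dir + ": " + e.getMessage());
                failed.add(dir);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Hypothetical date directories; the second one is unreadable.
        List<String> dirs = Arrays.asList("2013/11/29", "2013/11/30", "2013/12/01");
        List<String> failed = cleanAll(dirs, dir -> {
            if (dir.equals("2013/11/30")) {
                throw new IOException("permission denied");
            }
        });
        System.out.println("Failed entries: " + failed);
    }
}
```

Returning the failed entries lets the caller report or retry them on the next cleaning pass without blocking later dates.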


