[jira] [Created] (MAPREDUCE-5661) ShuffleHandler using yarn.nodemanager.local-dirs instead of mapreduce.cluster.local.dir
Eric Sirianni created MAPREDUCE-5661:

Summary: ShuffleHandler using yarn.nodemanager.local-dirs instead of mapreduce.cluster.local.dir
Key: MAPREDUCE-5661
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5661
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Eric Sirianni
Priority: Trivial

While debugging an issue where a MapReduce job is failing due to running out of disk space, I noticed that the {{ShuffleHandler}} uses {{yarn.nodemanager.local-dirs}} for its {{LocalDirAllocator}} whereas all of the other MapReduce classes use {{mapreduce.cluster.local.dir}}:

{noformat}
$ find hadoop-mapreduce-project/hadoop-mapreduce-client/*/src/main/java/ -name "*.java" | xargs grep "new LocalDirAllocator("
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java: LocalDirAllocator lDirAlloc = new LocalDirAllocator(MRConfig.LOCAL_DIR);
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnOutputFiles.java: new LocalDirAllocator(MRConfig.LOCAL_DIR);
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalDistributedCacheManager.java: new LocalDirAllocator(MRConfig.LOCAL_DIR);
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/BackupStore.java: this.lDirAlloc = new LocalDirAllocator(MRConfig.LOCAL_DIR);
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MROutputFiles.java: new LocalDirAllocator(MRConfig.LOCAL_DIR);
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Merger.java: new LocalDirAllocator(MRConfig.LOCAL_DIR);
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java: this.lDirAlloc = new LocalDirAllocator(MRConfig.LOCAL_DIR);
*hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java: new LocalDirAllocator(YarnConfiguration.NM_LOCAL_DIRS);
{noformat}

This inconsistency feels like something that is likely to confuse admins.

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5656) bzip2 codec can drop records when reading data in splits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836600#comment-13836600 ]

Jason Lowe commented on MAPREDUCE-5656:
---

That case is what the {{finished}} flag in CompressedSplitLineReader is intended to catch. Here's the scenario:
# LineRecordReader calls readLine.
# The line processing causes us to fetch the next compressed block beyond the split (i.e., fillBuffer is called). Let's say this causes us to set needAdditionalRecord=true.
# LineRecordReader will process another iteration of the loop and call readLine again.
# readLine will notice that we are starting at a position past the end of the split and set finished=true.
# At that point the needAdditionalRecordAfterSplit method will always return false, and LineRecordReader should read at most one record beyond the end of the split.

The key is that needAdditionalRecordAfterSplit() will always return false once readLine() is invoked at a position after the split ends.

bzip2 codec can drop records when reading data in splits
Key: MAPREDUCE-5656
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5656
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.0.4-alpha, 0.23.8
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
Attachments: HADOOP-9622-2.patch, HADOOP-9622-testcase.patch, HADOOP-9622.patch, MAPREDUCE-5656.patch, blockEndingInCR.txt.bz2, blockEndingInCRThenLF.txt.bz2

Bzip2Codec.BZip2CompressionInputStream can cause records to be dropped when reading them in splits based on where record delimiters occur relative to compression block boundaries. Thanks to [~knoguchi] for discovering this problem while working on PIG-3251.
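The scenario in Jason Lowe's comment can be modeled with a small, self-contained sketch. This is not the Hadoop implementation: only the names readLine, needAdditionalRecordAfterSplit, needAdditionalRecord and finished come from the comment. The class name, the array of line end offsets, and the rule "a read that reaches the split end counts as crossing into the next block" are assumptions made purely for illustration.

```java
// Hypothetical model of the flags described in the comment; not Hadoop code.
final class SplitReaderSketch {
    private final long splitEnd;      // first byte offset past the split
    private final long[] lineEnds;    // offset just past each line (assumed input)
    private int nextLine = 0;
    private long pos = 0;             // current read position
    private boolean needAdditionalRecord = false;
    private boolean finished = false;

    SplitReaderSketch(long splitEnd, long[] lineEnds) {
        this.splitEnd = splitEnd;
        this.lineEnds = lineEnds;
    }

    // Reads one line; returns false when no lines remain.
    boolean readLine() {
        if (pos >= splitEnd) {
            // Step 4 of the scenario: the read starts past the split end.
            finished = true;
        }
        if (nextLine >= lineEnds.length) {
            return false;
        }
        long newPos = lineEnds[nextLine++];
        if (!finished && newPos >= splitEnd) {
            // Step 2 (simplified assumption): this read reached beyond the split.
            needAdditionalRecord = true;
        }
        pos = newPos;
        return true;
    }

    // Step 5: once finished is set, this always returns false.
    boolean needAdditionalRecordAfterSplit() {
        return !finished && needAdditionalRecord;
    }

    // Caller loop in the spirit of LineRecordReader (assumed shape).
    static int countRecords(long splitEnd, long[] lineEnds) {
        SplitReaderSketch in = new SplitReaderSketch(splitEnd, lineEnds);
        int records = 0;
        while (in.pos < splitEnd || in.needAdditionalRecordAfterSplit()) {
            if (!in.readLine()) {
                break;
            }
            records++;
        }
        return records;
    }
}
```

With a split ending at offset 20 and lines ending at 10, 20, 30 and 40, the loop reads the two records inside the split plus exactly one additional record, then stops because finished has been set.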
[jira] [Updated] (MAPREDUCE-5656) bzip2 codec can drop records when reading data in splits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-5656:
--
Attachment: MAPREDUCE-5656-2.patch

Slightly updated patch to fix the spacing issue in SplitLineReader.
[jira] [Commented] (MAPREDUCE-5656) bzip2 codec can drop records when reading data in splits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836644#comment-13836644 ]

Hadoop QA commented on MAPREDUCE-5656:
--

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12616562/MAPREDUCE-5656-2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core:
org.apache.hadoop.mapred.TestLineRecordReader
org.apache.hadoop.mapreduce.lib.input.TestLineRecordReader
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4235//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4235//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-5661) ShuffleHandler using yarn.nodemanager.local-dirs instead of mapreduce.cluster.local.dir
[ https://issues.apache.org/jira/browse/MAPREDUCE-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13836652#comment-13836652 ] Jason Lowe commented on MAPREDUCE-5661: --- Note that YarnChild.configureLocalDirs sets this property based on an environment variable, which is itself derived from yarn.nodemanager.local-dirs, and therefore most of the other references are really coming from what was specified in yarn.nodemanager.local-dirs and not what was configured by an admin. The notable exception would be jobs run in local mode. Also note that the shuffle handler is a bit special in that it is the one piece of MapReduce code that runs as part of the YARN nodemanager process and not as part of a job or a client. It is more likely yarn.nodemanager.local-dirs is configured on a particular YARN node than mapreduce.cluster.local.dir, so I think it's appropriate that variable is used in the shuffle handler case. I don't think mapreduce.cluster.local.dir is even set on some of our clusters, as the MapReduce framework configures this variable for tasks when running under YARN. I wouldn't expect it to have to be configured by admins at all unless supporting jobs in local mode and for some reason the default isn't sufficient. 
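The override described in the comment above can be pictured with a rough sketch. It uses plain maps rather than Hadoop's Configuration class, and it is not the code of YarnChild.configureLocalDirs. The environment variable name LOCAL_DIRS and the helper names are assumptions for illustration; only the property name mapreduce.cluster.local.dir and the general idea (an environment variable derived from yarn.nodemanager.local-dirs replaces whatever the admin configured) come from the thread.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for the task-startup step described in the comment.
final class LocalDirsSketch {
    static final String MR_LOCAL_DIR = "mapreduce.cluster.local.dir";
    // Assumed name of the environment variable populated by the node manager.
    static final String LOCAL_DIRS_ENV = "LOCAL_DIRS";

    // Overwrites the MapReduce local-dir property when the environment
    // provides directories; otherwise leaves the configured value alone
    // (the local-mode case mentioned in the comment).
    static void configureLocalDirs(Map<String, String> conf, Map<String, String> env) {
        String raw = env.get(LOCAL_DIRS_ENV);
        if (raw == null || raw.trim().isEmpty()) {
            return;
        }
        String dirs = Arrays.stream(raw.split(","))
                .map(String::trim)
                .filter(d -> !d.isEmpty())
                .collect(Collectors.joining(","));
        conf.put(MR_LOCAL_DIR, dirs);
    }
}
```

Under these assumptions, a value set in mapred-site.xml survives only when the environment variable is absent, which matches the observation that an explicit setting matters mainly for jobs run in local mode.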
[jira] [Commented] (MAPREDUCE-5661) ShuffleHandler using yarn.nodemanager.local-dirs instead of mapreduce.cluster.local.dir
[ https://issues.apache.org/jira/browse/MAPREDUCE-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836758#comment-13836758 ]

Eric Sirianni commented on MAPREDUCE-5661:
--

The description for {{mapreduce.cluster.local.dir}} implies that that directory will receive significant load:

{code:xml}
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value>${hadoop.tmp.dir}/mapred/local</value>
  <description>The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk i/o. Directories that do not exist are ignored.
  </description>
</property>
{code}

Since you are suggesting that the default (typically in /tmp) is sufficient, perhaps that description should be altered?

I'm observing that the shuffle is creating the majority of the disk I/O in my MapReduce jobs, which is using the {{yarn.nodemanager.local-dirs}}.
[jira] [Resolved] (MAPREDUCE-5655) Remote job submit from windows to a linux hadoop cluster fails due to wrong classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth resolved MAPREDUCE-5655.
--
Resolution: Duplicate

Hi, [~padisah]. Thanks for the bug report. I do think this is a duplicate of MAPREDUCE-4052. The 0.23.x code line is similar to the 2.2.x code line. It's often the case that a bug in 2.2.x is also a bug in 0.23.x. I've just updated MAPREDUCE-4052 to make the title clearer and indicate that it also affects version 2.2.0.

I recommend that you participate on MAPREDUCE-4052. There is a patch attached to that issue, but it's a few months old, so it's likely to be out-of-date at this point. Seeing your latest patch would be valuable. You can upload your patch by clicking the More button at the top and then going through the Attach Files dialog. The Submit Patch button is used to submit your patch to Jenkins for a test run against current trunk.

Thanks again!

Remote job submit from windows to a linux hadoop cluster fails due to wrong classpath
-
Key: MAPREDUCE-5655
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5655
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: client, job submission
Affects Versions: 2.2.0
Environment: Client machine is a Windows 7 box, with Eclipse
Remote: there is a multi node hadoop cluster, installed on Ubuntu boxes (any linux)
Reporter: Attila Pados

I was trying to run a Java class on my client, a Windows 7 developer environment, which submits a job to the remote Hadoop cluster, initiates a MapReduce job there, and then downloads the results back to the local machine.

The general use case is to use Hadoop services from a web application installed on a non-cluster computer, or as part of a developer environment.

The problem was that the ApplicationMaster's startup shell script (launch_container.sh) was generated with a wrong CLASSPATH entry. Together with the java process call at the bottom of the file, these entries were generated in Windows style, using % as the shell variable marker and ; as the CLASSPATH delimiter.

I tracked down the root cause and found that the MrApps.java and YarnRunner.java classes create these entries, which are then passed on to the ApplicationMaster, on the assumption that the OS that runs these classes will match the one running the ApplicationMaster. That is not the case: they run in two different JVMs, the OS can also differ, and the strings are generated based on the OS of the client/submitter side.

I made some workaround changes to these 2 files so I could launch my job; however, there may be more problems ahead.
[jira] [Commented] (MAPREDUCE-4052) Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop cluster.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836766#comment-13836766 ]

Hadoop QA commented on MAPREDUCE-4052:
--

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12523800/MAPREDUCE-4052-0.patch
against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4236//console

This message is automatically generated.

Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop cluster.
---
Key: MAPREDUCE-4052
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4052
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: job submission
Affects Versions: 0.23.1, 2.2.0
Environment: client on Windows, the cluster on SUSE
Reporter: xieguiming
Assignee: xieguiming
Attachments: MAPREDUCE-4052-0.patch, MAPREDUCE-4052.patch

When I use Eclipse on Windows to submit the job, the ApplicationMaster throws this exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/v2/app/MRAppMaster
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.v2.app.MRAppMaster
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
Could not find the main class: org.apache.hadoop.mapreduce.v2.app.MRAppMaster. Program will exit.

The reason is that the addToEnvironment function in class Apps uses
private static final String SYSTEM_PATH_SEPARATOR = System.getProperty("path.separator");
which results in the MRApplicationMaster classpath using the ; separator.

I suggest that the nodemanager do the replacement.
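The root cause reported here, a classpath joined with the submitting JVM's path.separator, can be shown in a few lines. The sketch below is illustrative only and is not the patch attached to this issue or the fix the project adopted: the class and method names are hypothetical, and the idea of passing the target platform explicitly is just one possible approach. The only real API used is java.io.File.pathSeparator, which holds the same value as the path.separator system property.

```java
import java.io.File;
import java.util.List;

// Hypothetical helper contrasting client-side and target-side joining.
final class ClasspathSeparatorSketch {

    // Joins entries with the separator of the JVM running this code.
    // On a Windows client this yields ';', which a Linux container
    // launch script will not interpret as a classpath delimiter.
    static String joinForClient(List<String> entries) {
        return String.join(File.pathSeparator, entries);
    }

    // Joins entries with the separator of the platform that will
    // consume the classpath, independent of where this code runs.
    static String joinForTarget(List<String> entries, boolean targetIsWindows) {
        return String.join(targetIsWindows ? ";" : ":", entries);
    }
}
```

Run on Windows, joinForClient produces a ;-separated string even when the cluster is Linux, which is the mismatch described in the report; joinForTarget gives the same result on any client because the separator is chosen from the target platform.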
[jira] [Commented] (MAPREDUCE-5661) ShuffleHandler using yarn.nodemanager.local-dirs instead of mapreduce.cluster.local.dir
[ https://issues.apache.org/jira/browse/MAPREDUCE-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13836779#comment-13836779 ] Jason Lowe commented on MAPREDUCE-5661: --- Yes, in a YARN cluster the majority of the I/O on a node for MapReduce should be in the yarn.nodemanager.local-dirs directories. Those settings are replicated to mapreduce.cluster.local.dir during task startup under YARN, so admins don't normally have to configure mapreduce.cluster.local.dir. I would expect an explicit setting of mapreduce.cluster.local.dir to only take effect when running a job in local mode, which usually isn't a very big job and therefore the default of somewhere under /tmp is probably fine for most of those cases. So to sum up, tasks are using mapreduce.cluster.local.dir but the directories listed there are derived from yarn.nodemanager.local-dirs in a YARN cluster. Setting mapreduce.cluster.local.dir in mapred-site.xml would have no effect for most MapReduce jobs in a YARN cluster. 
[jira] [Commented] (MAPREDUCE-5661) ShuffleHandler using yarn.nodemanager.local-dirs instead of mapreduce.cluster.local.dir
[ https://issues.apache.org/jira/browse/MAPREDUCE-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836792#comment-13836792 ]

Eric Sirianni commented on MAPREDUCE-5661:
--

{quote}
Note that YarnChild.configureLocalDirs sets this property based on an environment variable
{quote}

Ah. I see now. I didn't realize that "this property" meant {{mapreduce.cluster.local.dir}}. Got it, thanks.
[jira] [Commented] (MAPREDUCE-5640) Rename TestLineRecordReader in jobclient module
[ https://issues.apache.org/jira/browse/MAPREDUCE-5640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13836800#comment-13836800 ] Jonathan Eagles commented on MAPREDUCE-5640: +1. Changes look good, Jason. Checking this in shortly. Rename TestLineRecordReader in jobclient module --- Key: MAPREDUCE-5640 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5640 Project: Hadoop Map/Reduce Issue Type: Improvement Components: test Affects Versions: 0.23.9, 2.2.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Trivial Attachments: MAPREDUCE-5640.patch HADOOP-9622 proposes to add new unit tests for LineRecordReader in the mapreduce-client-core module alongside the code. The existing LineRecordReader tests in the mapreduce-client-jobclient module should be renamed to something like TestLineRecordReaderJobs to avoid a name conflict and to better indicate these are integration tests using full jobs rather than unit tests. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5640) Rename TestLineRecordReader in jobclient module
[ https://issues.apache.org/jira/browse/MAPREDUCE-5640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836810#comment-13836810 ]

Hudson commented on MAPREDUCE-5640:
---

SUCCESS: Integrated in Hadoop-trunk-Commit #4814 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4814/])
MAPREDUCE-5640. Rename TestLineRecordReader in jobclient module (Jason Lowe via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1547149)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestLineRecordReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestLineRecordReaderJobs.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestLineRecordReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestLineRecordReaderJobs.java
[jira] [Updated] (MAPREDUCE-5640) Rename TestLineRecordReader in jobclient module
[ https://issues.apache.org/jira/browse/MAPREDUCE-5640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated MAPREDUCE-5640:
---
Resolution: Fixed
Fix Version/s: 0.23.10, 2.4.0, 3.0.0
Status: Resolved (was: Patch Available)
[jira] [Created] (MAPREDUCE-5662) implement the support for using the shared cache for the job jar and libjars
Sangjin Lee created MAPREDUCE-5662: -- Summary: implement the support for using the shared cache for the job jar and libjars Key: MAPREDUCE-5662 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5662 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Sangjin Lee Assignee: Sangjin Lee -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (MAPREDUCE-5663) Add an interface to Input/Ouput Formats to obtain delegation tokens
Siddharth Seth created MAPREDUCE-5663: - Summary: Add an interface to Input/Ouput Formats to obtain delegation tokens Key: MAPREDUCE-5663 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siddharth Seth Assignee: Siddharth Seth Currently, delegation tokens are obtained as part of the getSplits / checkOutputSpecs calls to the InputFormat / OutputFormat respectively. This works as long as the splits are generated on a node with kerberos credentials. For split generation elsewhere (AM for example), an explicit interface is required. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5654) blacklist is not propagated from AM to RM
[ https://issues.apache.org/jira/browse/MAPREDUCE-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837146#comment-13837146 ] Robert Grandl commented on MAPREDUCE-5654: -- Do you have any thoughts on this, guys? blacklist is not propagated from AM to RM - Key: MAPREDUCE-5654 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5654 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Reporter: Robert Grandl I was trying to blacklist some nodes. I added a set of hosts, as strings, to the blacklistAdditions list and passed it to the RM through RMContainerRequestor#makeRemoteRequest. However, the blacklist arrives empty at the RM. I logged the path of the blacklist through the AM and found that the list is lost in ApplicationMasterProtocolPBClientImpl#allocate. When I print request.getResourceBlacklistRequest().getBlacklistAdditions().toString() at the beginning of ApplicationMasterProtocolPBClientImpl#allocate, the blacklist additions are there. After the AllocateRequestProto requestProto is created from this request, I print requestProto.getBlacklistRequest().getBlacklistAdditionsList().toString() and the blacklist additions are now empty. I tried to trace further, but at some point in yarn-api I lost my logging because that code was regenerated every time I recompiled yarn-api. Thanks, robert
[jira] [Updated] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated MAPREDUCE-5603: - Target Version/s: 3.0.0, 0.23.11, 2.3.0 (was: 3.0.0, 2.4.0, 0.23.11) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory --- Key: MAPREDUCE-5603 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client, mrv2 Affects Versions: 0.23.10, 2.2.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Minor Attachments: MAPREDUCE-5603.patch, MAPREDUCE-5603.patch It would be nice if users had the option to disable the listLocatedStatus optimization in FileInputFormat to save client memory.
[jira] [Commented] (MAPREDUCE-4711) Append time elapsed since job-start-time for finished tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837325#comment-13837325 ] Hadoop QA commented on MAPREDUCE-4711: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12548038/MAPREDUCE-4711.branch-0.23.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4237//console This message is automatically generated. Append time elapsed since job-start-time for finished tasks --- Key: MAPREDUCE-4711 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4711 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 0.23.3 Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: MAPREDUCE-4711.branch-0.23.patch In 0.20.x/1.x, the analyze job link gave this information: bq. The last Map task task_sometask finished at (relative to the Job launch time): 5/10 20:23:10 (1hrs, 27mins, 54sec) In 0.23, the time it took for the last task to finish has to be calculated mentally. I believe we should print it next to the finish time.
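For illustration only, the elapsed time requested in MAPREDUCE-4711 could be computed roughly as follows. This is a minimal sketch, not the attached patch: the class name ElapsedSinceJobStart and the method formatElapsed are hypothetical, and the output style simply imitates the "(1hrs, 27mins, 54sec)" text quoted from 0.20.x/1.x.

```java
import java.time.Duration;

public class ElapsedSinceJobStart {
    /**
     * Formats the gap between job start and task finish (both in epoch
     * milliseconds) in the "(1hrs, 27mins, 54sec)" style quoted above.
     * Hypothetical helper; assumes taskFinishMillis >= jobStartMillis.
     */
    static String formatElapsed(long jobStartMillis, long taskFinishMillis) {
        Duration d = Duration.ofMillis(taskFinishMillis - jobStartMillis);
        long hours = d.toHours();
        long minutes = d.toMinutes() % 60;   // minutes past the last full hour
        long seconds = d.getSeconds() % 60;  // seconds past the last full minute
        return String.format("(%dhrs, %dmins, %dsec)", hours, minutes, seconds);
    }

    public static void main(String[] args) {
        long jobStart = 0L;
        // 1 hour, 27 minutes, 54 seconds after the job start
        long taskFinish = ((1 * 60 + 27) * 60 + 54) * 1000L;
        System.out.println(formatElapsed(jobStart, taskFinish));
    }
}
```

The string would be appended next to the absolute finish time, so the reader no longer has to subtract the job start time by hand.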
[jira] [Commented] (MAPREDUCE-4941) Use of org.apache.hadoop.mapred.lib.CombineFileRecordReader requires casting
[ https://issues.apache.org/jira/browse/MAPREDUCE-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837326#comment-13837326 ] Hadoop QA commented on MAPREDUCE-4941: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578762/MAPREDUCE-4941.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4239//console This message is automatically generated. Use of org.apache.hadoop.mapred.lib.CombineFileRecordReader requires casting Key: MAPREDUCE-4941 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4941 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.2-alpha, 0.23.5 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Minor Attachments: MAPREDUCE-4941.patch, MAPREDUCE-4941.patch Unlike its counterpart in org.apache.hadoop.mapreduce.lib.input, the CombineFileRecordReader in mapred requires a user to cast to a RecordReader since the constructor specification says it must have the RecordReader<K,V> class as a parameter. It should use {{Class<? extends RecordReader<K,V>>}} like its mapreduce counterpart to make it easier to use.
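The difference described in MAPREDUCE-4941 can be reproduced without Hadoop on the classpath. The sketch below uses stand-in types: the nested RecordReader interface, MyReader, and the methods strict and relaxed are all hypothetical and only mirror the two parameter styles mentioned in the issue, not the real constructor signatures.

```java
public class WildcardClassParam {
    // Stand-in for a record reader interface; not the Hadoop type.
    interface RecordReader<K, V> { }

    // A concrete reader a user might want to pass in.
    static class MyReader implements RecordReader<String, String> { }

    // Mirrors the stricter style: accepts only Class<RecordReader<K,V>>,
    // so a Class<MyReader> cannot be passed without a cast.
    static <K, V> String strict(Class<RecordReader<K, V>> cls) {
        return cls.getSimpleName();
    }

    // Mirrors the wildcard style: any subtype's Class object is accepted.
    static <K, V> String relaxed(Class<? extends RecordReader<K, V>> cls) {
        return cls.getSimpleName();
    }

    public static void main(String[] args) {
        // strict(MyReader.class);  // would not compile: Class<MyReader> is not Class<RecordReader<K,V>>

        // The workaround callers are forced into: an unchecked cast through the raw type.
        @SuppressWarnings({"unchecked", "rawtypes"})
        Class<RecordReader<String, String>> cast = (Class) MyReader.class;
        System.out.println(strict(cast));

        // With the bounded wildcard, no cast is needed.
        System.out.println(relaxed(MyReader.class));
    }
}
```

Because generic class types are invariant in Java, only the bounded wildcard lets callers pass the class literal of their own reader directly.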
[jira] [Commented] (MAPREDUCE-4901) JobHistoryEventHandler errors should be fatal
[ https://issues.apache.org/jira/browse/MAPREDUCE-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837324#comment-13837324 ] Hadoop QA commented on MAPREDUCE-4901: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12562403/MR-4901-trunk.txt against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4238//console This message is automatically generated. JobHistoryEventHandler errors should be fatal - Key: MAPREDUCE-4901 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4901 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0, 2.0.0-alpha Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: MR-4901-trunk.txt To be able to truly fix issues like MAPREDUCE-4819 and MAPREDUCE-4832, we need a two-phase commit in which a subsequent AM can be sure, at a specific point in time, whether any tasks/jobs are committing. The job history log is already used for similar functionality, so we would like to reuse it, but we need to be sure that errors while writing to the job history log are now fatal.
[jira] [Commented] (MAPREDUCE-5267) History server should be more robust when cleaning old jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837331#comment-13837331 ] Hadoop QA commented on MAPREDUCE-5267: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12586262/MAPREDUCE-5267.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4240//console This message is automatically generated. History server should be more robust when cleaning old jobs --- Key: MAPREDUCE-5267 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5267 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver Affects Versions: 0.23.7, 2.0.4-alpha Reporter: Jason Lowe Assignee: Maysam Yabandeh Attachments: MAPREDUCE-5267.patch, MAPREDUCE-5267.patch Ran across a situation where an admin user had accidentally created a directory in one of the date directories under /mapred/history/done/ that was not readable by the historyserver user. That effectively prevented the history server from cleaning any jobs from that date forward, as it hit an IOException trying to scan the directory and that aborted the entire clean process. The history server should localize IOException handling to the directory/file being processed and move on to the next entry in the list rather than aborting the entire cleaning process.
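The behaviour asked for in MAPREDUCE-5267 can be sketched with plain java.nio.file, independent of the history server code. Everything here is an assumption for illustration: the class ResilientCleaner, its methods, and the use of the local filesystem instead of Hadoop's FileSystem API are hypothetical, and a missing directory stands in for the unreadable one from the report.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResilientCleaner {
    /**
     * Deletes regular files in each directory. A failure on one directory
     * or file is recorded and the loop moves on to the next entry instead
     * of aborting the whole pass. Returns the paths that failed.
     */
    static List<Path> cleanAll(List<Path> dateDirs, List<Path> deleted) {
        List<Path> failed = new ArrayList<>();
        for (Path dir : dateDirs) {
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                for (Path entry : entries) {
                    try {
                        if (Files.isRegularFile(entry)) {
                            Files.delete(entry);
                            deleted.add(entry);
                        }
                    } catch (IOException e) {
                        failed.add(entry);  // skip this file only
                    }
                }
            } catch (IOException | DirectoryIteratorException e) {
                failed.add(dir);            // skip this directory only
            }
        }
        return failed;
    }

    /** Small demo: returns {number of failures, number of deleted files}. */
    static int[] demo() {
        try {
            Path root = Files.createTempDirectory("history-done");
            Path good = Files.createDirectory(root.resolve("good"));
            Files.createFile(good.resolve("job_1.jhist"));
            // A directory that cannot be scanned (here: it does not exist).
            Path bad = root.resolve("unreadable");
            List<Path> deleted = new ArrayList<>();
            List<Path> failed = cleanAll(Arrays.asList(bad, good), deleted);
            return new int[] { failed.size(), deleted.size() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("failed=" + r[0] + " deleted=" + r[1]);
    }
}
```

In the demo the failing directory comes first in the list, yet the file in the later directory is still cleaned, which is the property the issue asks for.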