[jira] [Updated] (MAPREDUCE-434) local map-reduce job limited to single reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated MAPREDUCE-434:
---------------------------------
Issue Type: Improvement (was: Bug)

local map-reduce job limited to single reducer
----------------------------------------------
Key: MAPREDUCE-434
URL: https://issues.apache.org/jira/browse/MAPREDUCE-434
Project: Hadoop Map/Reduce
Issue Type: Improvement
Environment: local job tracker
Reporter: Yoram Arnon
Assignee: Aaron Kimball
Priority: Minor
Attachments: MAPREDUCE-434.2.patch, MAPREDUCE-434.3.patch, MAPREDUCE-434.4.patch, MAPREDUCE-434.5.patch, MAPREDUCE-434.6.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch

when mapred.job.tracker is set to 'local', my setNumReduceTasks call is ignored, and the number of reduce tasks is set at 1. This prevents me from locally debugging my partition function, which tries to partition based on the number of reduce tasks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
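The reporter's use case is easiest to see with the default hash-partitioning scheme: the chosen partition depends directly on the number of reduce tasks, so with the count forced to 1 every key collapses into partition 0 and the partitioner logic cannot be exercised locally. A minimal self-contained sketch (no Hadoop dependencies; the class name and test keys are illustrative, not from the issue):

```java
// Sketch of hash-style partitioning as used by Hadoop's default
// HashPartitioner. With numPartitions fixed at 1 (the old
// LocalJobRunner behaviour) every key maps to partition 0, so a
// custom partitioner's distribution is unobservable in local mode.
public class PartitionSketch {
    static int getPartition(String key, int numPartitions) {
        // Mask the sign bit so negative hashCodes still yield a valid index.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // Single reducer: everything lands in partition 0.
        assert getPartition("alpha", 1) == 0;
        assert getPartition("beta", 1) == 0;
        // Several reducers: the distribution becomes observable,
        // which is what local debugging of a partitioner needs.
        int p = getPartition("alpha", 4);
        assert p >= 0 && p < 4;
        System.out.println("ok");
    }
}
```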
[jira] [Updated] (MAPREDUCE-434) LocalJobRunner limited to single reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated MAPREDUCE-434:
---------------------------------
Summary: LocalJobRunner limited to single reducer (was: local map-reduce job limited to single reducer)

LocalJobRunner limited to single reducer
----------------------------------------
Key: MAPREDUCE-434
URL: https://issues.apache.org/jira/browse/MAPREDUCE-434
Project: Hadoop Map/Reduce
Issue Type: Improvement
Environment: local job tracker
Reporter: Yoram Arnon
Assignee: Aaron Kimball
Priority: Minor
Attachments: MAPREDUCE-434.2.patch, MAPREDUCE-434.3.patch, MAPREDUCE-434.4.patch, MAPREDUCE-434.5.patch, MAPREDUCE-434.6.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch

when mapred.job.tracker is set to 'local', my setNumReduceTasks call is ignored, and the number of reduce tasks is set at 1. This prevents me from locally debugging my partition function, which tries to partition based on the number of reduce tasks.
[jira] [Updated] (MAPREDUCE-434) LocalJobRunner limited to single reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated MAPREDUCE-434:
---------------------------------
Resolution: Fixed
Fix Version/s: 2.3.0
Status: Resolved (was: Patch Available)

Thanks Aaron and Tom for the review. Committed this to trunk and branch-2.

LocalJobRunner limited to single reducer
----------------------------------------
Key: MAPREDUCE-434
URL: https://issues.apache.org/jira/browse/MAPREDUCE-434
Project: Hadoop Map/Reduce
Issue Type: Improvement
Environment: local job tracker
Reporter: Yoram Arnon
Assignee: Aaron Kimball
Priority: Minor
Fix For: 2.3.0
Attachments: MAPREDUCE-434.2.patch, MAPREDUCE-434.3.patch, MAPREDUCE-434.4.patch, MAPREDUCE-434.5.patch, MAPREDUCE-434.6.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch

when mapred.job.tracker is set to 'local', my setNumReduceTasks call is ignored, and the number of reduce tasks is set at 1. This prevents me from locally debugging my partition function, which tries to partition based on the number of reduce tasks.
[jira] [Commented] (MAPREDUCE-434) LocalJobRunner limited to single reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730456#comment-13730456 ]

Hudson commented on MAPREDUCE-434:
----------------------------------
SUCCESS: Integrated in Hadoop-trunk-Commit #4219 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4219/])
MAPREDUCE-434. LocalJobRunner limited to single reducer (Sandy Ryza and Aaron Kimball via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510866)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalJobRunner.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Merger.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/ReduceTask.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/ShuffleConsumerPlugin.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/LocalFetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/TestShufflePlugin.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestJobCounters.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/lib/TestKeyFieldBasedComparator.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestLocalRunner.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/partition/TestMRKeyFieldBasedComparator.java

LocalJobRunner limited to single reducer
----------------------------------------
Key: MAPREDUCE-434
URL: https://issues.apache.org/jira/browse/MAPREDUCE-434
Project: Hadoop Map/Reduce
Issue Type: Improvement
Environment: local job tracker
Reporter: Yoram Arnon
Assignee: Aaron Kimball
Priority: Minor
Fix For: 2.3.0
Attachments: MAPREDUCE-434.2.patch, MAPREDUCE-434.3.patch, MAPREDUCE-434.4.patch, MAPREDUCE-434.5.patch, MAPREDUCE-434.6.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch

when mapred.job.tracker is set to 'local', my setNumReduceTasks call is ignored, and the number of reduce tasks is set at 1. This prevents me from locally debugging my partition function, which tries to partition based on the number of reduce tasks.
[jira] [Commented] (MAPREDUCE-5446) TestJobHistoryEvents and TestJobHistoryParsing have race conditions
[ https://issues.apache.org/jira/browse/MAPREDUCE-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730631#comment-13730631 ]

Hudson commented on MAPREDUCE-5446:
-----------------------------------
SUCCESS: Integrated in Hadoop-Yarn-trunk #293 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/293/])
MAPREDUCE-5446. TestJobHistoryEvents and TestJobHistoryParsing have race conditions. Contributed by Jason Lowe. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510581)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java

TestJobHistoryEvents and TestJobHistoryParsing have race conditions
-------------------------------------------------------------------
Key: MAPREDUCE-5446
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5446
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2, test
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe
Assignee: Jason Lowe
Fix For: 3.0.0, 2.1.1-beta
Attachments: MAPREDUCE-5446.patch

TestJobHistoryEvents and TestJobHistoryParsing are not properly waiting for MRApp to finish. Currently they are polling the service state looking for Service.STATE.STOPPED, but the service can appear to be in that state *before* it is fully stopped. This causes tests to finish with MRApp threads still in-flight, and those threads can conflict with subsequent tests when they collide in the filesystem.
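The race described above, a service reporting a stopped state before its worker threads have actually exited, can be sketched without any Hadoop dependencies. This is an illustration of the failure mode, not the patched MRApp code; the class and field names are invented for the example. Waiting on the thread itself (join) rather than polling a state flag removes the race:

```java
// Sketch of the race: stop() flips the state flag immediately, but the
// worker thread may still be running. Polling the flag therefore
// "succeeds" too early; joining the thread is reliable.
public class StopRaceSketch {
    static class Service {
        volatile boolean stopped = false;
        final Thread worker = new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException e) { }
        });
        void start() { worker.start(); }
        void stop()  { stopped = true; }  // state says STOPPED here...
                                          // ...but worker may still be alive.
        void awaitFullyStopped() throws InterruptedException {
            worker.join();  // returns only after the thread has exited
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Service s = new Service();
        s.start();
        s.stop();
        // A test polling `s.stopped` would proceed now, while the worker
        // thread is possibly still touching the filesystem.
        s.awaitFullyStopped();
        assert !s.worker.isAlive();
        System.out.println("ok");
    }
}
```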
[jira] [Commented] (MAPREDUCE-434) LocalJobRunner limited to single reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730625#comment-13730625 ]

Hudson commented on MAPREDUCE-434:
----------------------------------
SUCCESS: Integrated in Hadoop-Yarn-trunk #293 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/293/])
MAPREDUCE-434. LocalJobRunner limited to single reducer (Sandy Ryza and Aaron Kimball via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510866)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalJobRunner.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Merger.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/ReduceTask.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/ShuffleConsumerPlugin.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/LocalFetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/TestShufflePlugin.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestJobCounters.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/lib/TestKeyFieldBasedComparator.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestLocalRunner.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/partition/TestMRKeyFieldBasedComparator.java

LocalJobRunner limited to single reducer
----------------------------------------
Key: MAPREDUCE-434
URL: https://issues.apache.org/jira/browse/MAPREDUCE-434
Project: Hadoop Map/Reduce
Issue Type: Improvement
Environment: local job tracker
Reporter: Yoram Arnon
Assignee: Aaron Kimball
Priority: Minor
Fix For: 2.3.0
Attachments: MAPREDUCE-434.2.patch, MAPREDUCE-434.3.patch, MAPREDUCE-434.4.patch, MAPREDUCE-434.5.patch, MAPREDUCE-434.6.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch

when mapred.job.tracker is set to 'local', my setNumReduceTasks call is ignored, and the number of reduce tasks is set at 1. This prevents me from locally debugging my partition function, which tries to partition based on the number of reduce tasks.
[jira] [Commented] (MAPREDUCE-5367) Local jobs all use same local working directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730637#comment-13730637 ]

Hudson commented on MAPREDUCE-5367:
-----------------------------------
SUCCESS: Integrated in Hadoop-Yarn-trunk #293 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/293/])
MAPREDUCE-5367. Local jobs all use same local working directory (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510610)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalJobRunner.java

Local jobs all use same local working directory
-----------------------------------------------
Key: MAPREDUCE-5367
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5367
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Fix For: 1.3.0, 2.1.1-beta
Attachments: MAPREDUCE-5367-b1-1.patch, MAPREDUCE-5367-b1.patch, MAPREDUCE-5367.patch

This means that local jobs, even in different JVMs, can't run concurrently because they might delete each other's files during work directory setup.
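The shape of the fix for a shared-working-directory collision is to derive the directory from something unique per job rather than a fixed path. A minimal sketch, with an invented path scheme and id format purely for illustration (not the actual LocalJobRunner layout):

```java
import java.util.UUID;

// Sketch: a per-job working directory keyed by a unique job id, so two
// concurrent local jobs (even in different JVMs) never share a path
// and cannot delete each other's files during setup.
public class WorkDirSketch {
    static String workDirFor(String jobId) {
        // Hypothetical base path; the real layout is Hadoop-specific.
        return "/tmp/hadoop-local/" + jobId + "/work";
    }

    public static void main(String[] args) {
        String a = workDirFor("job_local_" + UUID.randomUUID());
        String b = workDirFor("job_local_" + UUID.randomUUID());
        // Distinct jobs get distinct directories.
        assert !a.equals(b);
        System.out.println("ok");
    }
}
```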
[jira] [Commented] (MAPREDUCE-5399) Unnecessary Configuration instantiation in IFileInputStream slows down merge
[ https://issues.apache.org/jira/browse/MAPREDUCE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730640#comment-13730640 ]

Hudson commented on MAPREDUCE-5399:
-----------------------------------
SUCCESS: Integrated in Hadoop-Yarn-trunk #293 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/293/])
MAPREDUCE-5399. Unnecessary Configuration instantiation in IFileInputStream slows down merge. (Stanislav Barton via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510811)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/BackupStore.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java

Unnecessary Configuration instantiation in IFileInputStream slows down merge
----------------------------------------------------------------------------
Key: MAPREDUCE-5399
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5399
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv1, mrv2
Affects Versions: 1.1.0, 2.0.2-alpha
Reporter: Stanislav Barton
Assignee: Stanislav Barton
Priority: Blocker
Fix For: 2.1.0-beta
Attachments: MAPREDUCE-5399.patch

We are using hadoop-2.0.0+1357-1.cdh4.3.0.p0.21 with MRv1. After the upgrade from 4.1.2 to 4.3.0, I have noticed some performance deterioration in our MR job in the Reduce phase. The MR job usually has 10 000 map tasks (10 000 files on input, each about 100MB) and 6 000 reducers (one reducer per table region).

I was trying to figure out at which phase the slowdown appears (at first I suspected that the slow gathering of the 1 map output files was the culprit) and found out that the problem is not reading the map output (the shuffle) but the sort/merge phase that follows - the last and actual reduce phase is fast. I tried to raise io.sort.factor because I thought lots of small files were being merged on disk, but upping that to 1000 didn't make any difference. I then printed the stack trace and found out that the problem is initialization of org.apache.hadoop.mapred.IFileInputStream, namely the creation of the Configuration object, which is not propagated along from the earlier context; see the stack trace:

Thread 13332: (state = IN_NATIVE)
 - java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0 (Compiled frame; information may be imprecise)
 - java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2, line=228 (Compiled frame)
 - java.io.File.exists() @bci=20, line=733 (Compiled frame)
 - sun.misc.URLClassPath$FileLoader.getResource(java.lang.String, boolean) @bci=136, line=999 (Compiled frame)
 - sun.misc.URLClassPath$FileLoader.findResource(java.lang.String, boolean) @bci=3, line=966 (Compiled frame)
 - sun.misc.URLClassPath.findResource(java.lang.String, boolean) @bci=17, line=146 (Compiled frame)
 - java.net.URLClassLoader$2.run() @bci=12, line=385 (Compiled frame)
 - java.security.AccessController.doPrivileged(java.security.PrivilegedAction, java.security.AccessControlContext) @bci=0 (Compiled frame)
 - java.net.URLClassLoader.findResource(java.lang.String) @bci=13, line=382 (Compiled frame)
 - java.lang.ClassLoader.getResource(java.lang.String) @bci=30, line=1002 (Compiled frame)
 - java.lang.ClassLoader.getResourceAsStream(java.lang.String) @bci=2, line=1192 (Compiled frame)
 - javax.xml.parsers.SecuritySupport$4.run() @bci=26, line=96 (Compiled frame)
 - java.security.AccessController.doPrivileged(java.security.PrivilegedAction) @bci=0 (Compiled frame)
 - javax.xml.parsers.SecuritySupport.getResourceAsStream(java.lang.ClassLoader, java.lang.String) @bci=10, line=89 (Compiled frame)
 - javax.xml.parsers.FactoryFinder.findJarServiceProvider(java.lang.String) @bci=38, line=250 (Interpreted frame)
 - javax.xml.parsers.FactoryFinder.find(java.lang.String, java.lang.String) @bci=273, line=223 (Interpreted frame)
 - javax.xml.parsers.DocumentBuilderFactory.newInstance() @bci=4, line=123 (Compiled frame)
 - org.apache.hadoop.conf.Configuration.loadResource(java.util.Properties, org.apache.hadoop.conf.Configuration$Resource, boolean) @bci=16, line=1890 (Compiled frame)
 - org.apache.hadoop.conf.Configuration.loadResources(java.util.Properties, java.util.ArrayList, boolean) @bci=49, line=1867 (Compiled frame)
 - org.apache.hadoop.conf.Configuration.getProps()
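The stack trace above shows each stream construction falling all the way into XML resource loading because a fresh Configuration is built per stream. The fix direction (construct the expensive object once and propagate it) can be sketched without Hadoop; the Settings class below stands in for Hadoop's Configuration, and all names are illustrative:

```java
// Sketch: build the expensive settings object once and pass it to each
// stream, instead of letting every stream construct its own. A
// construction counter makes the difference observable.
public class ConfigReuseSketch {
    static int constructions = 0;

    static class Settings {
        final boolean readahead;
        Settings() {
            constructions++;      // imagine XML parsing happening here
            readahead = true;
        }
    }

    static class StreamLike {
        final Settings conf;
        // Accepting the caller's Settings avoids re-parsing per stream.
        StreamLike(Settings conf) { this.conf = conf; }
    }

    public static void main(String[] args) {
        Settings shared = new Settings();
        for (int i = 0; i < 1000; i++) {
            new StreamLike(shared);
        }
        // One construction for 1000 streams, instead of 1000 constructions.
        assert constructions == 1;
        System.out.println("ok");
    }
}
```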
[jira] [Updated] (MAPREDUCE-5450) Unnecessary Configuration instantiation in IFileInputStream slows down merge - Port to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stanislav Barton updated MAPREDUCE-5450:
----------------------------------------
Attachment: MAPREDUCE-5450-1.1.0.txt

Unnecessary Configuration instantiation in IFileInputStream slows down merge - Port to branch-1
-----------------------------------------------------------------------------------------------
Key: MAPREDUCE-5450
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5450
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv1
Affects Versions: 1.1.0
Reporter: Stanislav Barton
Assignee: Stanislav Barton
Priority: Blocker
Fix For: 1.2.1
Attachments: MAPREDUCE-5450-1.1.0.txt

We are using hadoop-2.0.0+1357-1.cdh4.3.0.p0.21 with MRv1. After the upgrade from 4.1.2 to 4.3.0, I have noticed some performance deterioration in our MR job in the Reduce phase. The MR job usually has 10 000 map tasks (10 000 files on input, each about 100MB) and 6 000 reducers (one reducer per table region). I was trying to figure out at which phase the slowdown appears (at first I suspected that the slow gathering of the 1 map output files was the culprit) and found out that the problem is not reading the map output (the shuffle) but the sort/merge phase that follows - the last and actual reduce phase is fast. I tried to raise io.sort.factor because I thought lots of small files were being merged on disk, but upping that to 1000 didn't make any difference.

I then printed the stack trace and found out that the problem is initialization of org.apache.hadoop.mapred.IFileInputStream, namely the creation of the Configuration object, which is not propagated along from the earlier context; see the stack trace:

Thread 13332: (state = IN_NATIVE)
 - java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0 (Compiled frame; information may be imprecise)
 - java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2, line=228 (Compiled frame)
 - java.io.File.exists() @bci=20, line=733 (Compiled frame)
 - sun.misc.URLClassPath$FileLoader.getResource(java.lang.String, boolean) @bci=136, line=999 (Compiled frame)
 - sun.misc.URLClassPath$FileLoader.findResource(java.lang.String, boolean) @bci=3, line=966 (Compiled frame)
 - sun.misc.URLClassPath.findResource(java.lang.String, boolean) @bci=17, line=146 (Compiled frame)
 - java.net.URLClassLoader$2.run() @bci=12, line=385 (Compiled frame)
 - java.security.AccessController.doPrivileged(java.security.PrivilegedAction, java.security.AccessControlContext) @bci=0 (Compiled frame)
 - java.net.URLClassLoader.findResource(java.lang.String) @bci=13, line=382 (Compiled frame)
 - java.lang.ClassLoader.getResource(java.lang.String) @bci=30, line=1002 (Compiled frame)
 - java.lang.ClassLoader.getResourceAsStream(java.lang.String) @bci=2, line=1192 (Compiled frame)
 - javax.xml.parsers.SecuritySupport$4.run() @bci=26, line=96 (Compiled frame)
 - java.security.AccessController.doPrivileged(java.security.PrivilegedAction) @bci=0 (Compiled frame)
 - javax.xml.parsers.SecuritySupport.getResourceAsStream(java.lang.ClassLoader, java.lang.String) @bci=10, line=89 (Compiled frame)
 - javax.xml.parsers.FactoryFinder.findJarServiceProvider(java.lang.String) @bci=38, line=250 (Interpreted frame)
 - javax.xml.parsers.FactoryFinder.find(java.lang.String, java.lang.String) @bci=273, line=223 (Interpreted frame)
 - javax.xml.parsers.DocumentBuilderFactory.newInstance() @bci=4, line=123 (Compiled frame)
 - org.apache.hadoop.conf.Configuration.loadResource(java.util.Properties, org.apache.hadoop.conf.Configuration$Resource, boolean) @bci=16, line=1890 (Compiled frame)
 - org.apache.hadoop.conf.Configuration.loadResources(java.util.Properties, java.util.ArrayList, boolean) @bci=49, line=1867 (Compiled frame)
 - org.apache.hadoop.conf.Configuration.getProps() @bci=43, line=1785 (Compiled frame)
 - org.apache.hadoop.conf.Configuration.get(java.lang.String) @bci=35, line=712 (Compiled frame)
 - org.apache.hadoop.conf.Configuration.getTrimmed(java.lang.String) @bci=2, line=731 (Compiled frame)
 - org.apache.hadoop.conf.Configuration.getBoolean(java.lang.String, boolean) @bci=2, line=1047 (Interpreted frame)
 - org.apache.hadoop.mapred.IFileInputStream.init(java.io.InputStream, long, org.apache.hadoop.conf.Configuration) @bci=111, line=93 (Interpreted frame)
 - org.apache.hadoop.mapred.IFile$Reader.init(org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.FSDataInputStream, long, org.apache.hadoop.io.compress.CompressionCodec, org.apache.hadoop.mapred.Counters$Counter) @bci=60, line=303 (Interpreted frame) -
[jira] [Updated] (MAPREDUCE-5432) JobHistoryParser does not fetch clockSplits, cpuUsages, vMemKbytes, physMemKbytes from history file
[ https://issues.apache.org/jira/browse/MAPREDUCE-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated MAPREDUCE-5432:
--------------------------------------
Attachment: MAPREDUCE-5432.1.patch

Updated JobHistoryServer to fetch these attributes from events.

JobHistoryParser does not fetch clockSplits, cpuUsages, vMemKbytes, physMemKbytes from history file
---------------------------------------------------------------------------------------------------
Key: MAPREDUCE-5432
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5432
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobhistoryserver, mrv2
Affects Versions: 2.0.4-alpha
Reporter: Vrushali C
Attachments: MAPREDUCE-5432.1.patch

JobHistoryParser's handleMapAttemptFinishedEvent() function does not look at MapAttemptFinishedEvent's int[] clockSplits; int[] cpuUsages; int[] vMemKbytes; int[] physMemKbytes. JobHistoryParser's inner class TaskAttemptInfo also needs to be enhanced to have these as members so that handleMapAttemptFinishedEvent() can get them and store them.
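The change the issue asks for has a simple shape: the finished-event carries four int[] progress-split arrays, the attempt-info holder needs matching fields, and the handler copies them across. A minimal sketch (the class and field names follow the issue text, but the surrounding API is a simplified stand-in, not the real JobHistoryParser):

```java
import java.util.Arrays;

// Sketch: copy the progress-split arrays from the event into the
// attempt-info holder, which is what handleMapAttemptFinishedEvent()
// was failing to do. Sample values are arbitrary.
public class HistoryParseSketch {
    static class MapAttemptFinishedEvent {
        int[] clockSplits = {1, 2}, cpuUsages = {3, 4},
              vMemKbytes = {5, 6}, physMemKbytes = {7, 8};
    }

    static class TaskAttemptInfo {
        // The new members the issue says TaskAttemptInfo needs.
        int[] clockSplits, cpuUsages, vMemKbytes, physMemKbytes;
    }

    static void handleMapAttemptFinishedEvent(MapAttemptFinishedEvent e,
                                              TaskAttemptInfo info) {
        info.clockSplits = e.clockSplits;
        info.cpuUsages = e.cpuUsages;
        info.vMemKbytes = e.vMemKbytes;
        info.physMemKbytes = e.physMemKbytes;
    }

    public static void main(String[] args) {
        TaskAttemptInfo info = new TaskAttemptInfo();
        handleMapAttemptFinishedEvent(new MapAttemptFinishedEvent(), info);
        assert Arrays.equals(info.cpuUsages, new int[] {3, 4});
        assert Arrays.equals(info.physMemKbytes, new int[] {7, 8});
        System.out.println("ok");
    }
}
```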
[jira] [Updated] (MAPREDUCE-5432) JobHistoryParser does not fetch clockSplits, cpuUsages, vMemKbytes, physMemKbytes from history file
[ https://issues.apache.org/jira/browse/MAPREDUCE-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated MAPREDUCE-5432:
--------------------------------------
Assignee: Tsuyoshi OZAWA
Status: Patch Available (was: Open)

JobHistoryParser does not fetch clockSplits, cpuUsages, vMemKbytes, physMemKbytes from history file
---------------------------------------------------------------------------------------------------
Key: MAPREDUCE-5432
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5432
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobhistoryserver, mrv2
Affects Versions: 2.0.4-alpha
Reporter: Vrushali C
Assignee: Tsuyoshi OZAWA
Attachments: MAPREDUCE-5432.1.patch

JobHistoryParser's handleMapAttemptFinishedEvent() function does not look at MapAttemptFinishedEvent's int[] clockSplits; int[] cpuUsages; int[] vMemKbytes; int[] physMemKbytes. JobHistoryParser's inner class TaskAttemptInfo also needs to be enhanced to have these as members so that handleMapAttemptFinishedEvent() can get them and store them.
[jira] [Commented] (MAPREDUCE-5367) Local jobs all use same local working directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730740#comment-13730740 ]

Hudson commented on MAPREDUCE-5367:
-----------------------------------
FAILURE: Integrated in Hadoop-Hdfs-trunk #1483 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1483/])
MAPREDUCE-5367. Local jobs all use same local working directory (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510610)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalJobRunner.java

Local jobs all use same local working directory
-----------------------------------------------
Key: MAPREDUCE-5367
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5367
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Fix For: 1.3.0, 2.1.1-beta
Attachments: MAPREDUCE-5367-b1-1.patch, MAPREDUCE-5367-b1.patch, MAPREDUCE-5367.patch

This means that local jobs, even in different JVMs, can't run concurrently because they might delete each other's files during work directory setup.
[jira] [Commented] (MAPREDUCE-434) LocalJobRunner limited to single reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730728#comment-13730728 ]

Hudson commented on MAPREDUCE-434:
----------------------------------
FAILURE: Integrated in Hadoop-Hdfs-trunk #1483 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1483/])
MAPREDUCE-434. LocalJobRunner limited to single reducer (Sandy Ryza and Aaron Kimball via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510866)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalJobRunner.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Merger.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/ReduceTask.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/ShuffleConsumerPlugin.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/LocalFetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/TestShufflePlugin.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestJobCounters.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/lib/TestKeyFieldBasedComparator.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestLocalRunner.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/partition/TestMRKeyFieldBasedComparator.java

LocalJobRunner limited to single reducer
----------------------------------------
Key: MAPREDUCE-434
URL: https://issues.apache.org/jira/browse/MAPREDUCE-434
Project: Hadoop Map/Reduce
Issue Type: Improvement
Environment: local job tracker
Reporter: Yoram Arnon
Assignee: Aaron Kimball
Priority: Minor
Fix For: 2.3.0
Attachments: MAPREDUCE-434.2.patch, MAPREDUCE-434.3.patch, MAPREDUCE-434.4.patch, MAPREDUCE-434.5.patch, MAPREDUCE-434.6.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch

when mapred.job.tracker is set to 'local', my setNumReduceTasks call is ignored, and the number of reduce tasks is set at 1. This prevents me from locally debugging my partition function, which tries to partition based on the number of reduce tasks.
[jira] [Commented] (MAPREDUCE-5446) TestJobHistoryEvents and TestJobHistoryParsing have race conditions
[ https://issues.apache.org/jira/browse/MAPREDUCE-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730734#comment-13730734 ] Hudson commented on MAPREDUCE-5446: --- FAILURE: Integrated in Hadoop-Hdfs-trunk #1483 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1483/]) MAPREDUCE-5446. TestJobHistoryEvents and TestJobHistoryParsing have race conditions. Contributed by Jason Lowe. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1510581) * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java TestJobHistoryEvents and TestJobHistoryParsing have race conditions --- Key: MAPREDUCE-5446 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5446 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, test Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 3.0.0, 2.1.1-beta Attachments: MAPREDUCE-5446.patch TestJobHistoryEvents and TestJobHistoryParsing are not properly waiting for MRApp to finish. Currently they are polling the service state looking for Service.STATE.STOPPED, but the service can appear to be in that state *before* it is fully stopped. This causes tests to finish with MRApp threads still in-flight, and those threads can conflict with subsequent tests when they collide in the filesystem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
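The race described above follows a general pattern: a state flag can be observed as "stopped" before the service's worker threads have actually exited, so tests should wait on a completion signal set after cleanup rather than poll a state enum. A minimal sketch of the safer pattern (hypothetical names, not MRApp's actual API):

```java
import java.util.concurrent.CountDownLatch;

// Sketch: signal full shutdown with a latch counted down *after* cleanup
// finishes, instead of exposing a state flag that can flip early.
public class ShutdownSketch {
    private final CountDownLatch fullyStopped = new CountDownLatch(1);
    volatile boolean cleanupDone = false;

    public void stop() throws InterruptedException {
        Thread worker = new Thread(() -> {
            cleanupDone = true;       // e.g. flush job history files
            fullyStopped.countDown(); // only now is shutdown truly complete
        });
        worker.start();
        // A test polling a state flag here could race; awaiting the latch cannot.
        fullyStopped.await();
    }

    public static void main(String[] args) throws InterruptedException {
        ShutdownSketch s = new ShutdownSketch();
        s.stop();
        System.out.println(s.cleanupDone); // true: await() ordered after cleanup
    }
}
```

The countDown()/await() pair gives a happens-before edge, so once stop() returns the caller is guaranteed to see all of the worker's writes, unlike a polled state check.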
[jira] [Commented] (MAPREDUCE-5399) Unnecessary Configuration instantiation in IFileInputStream slows down merge
[ https://issues.apache.org/jira/browse/MAPREDUCE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730743#comment-13730743 ] Hudson commented on MAPREDUCE-5399: --- FAILURE: Integrated in Hadoop-Hdfs-trunk #1483 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1483/]) MAPREDUCE-5399. Unnecessary Configuration instantiation in IFileInputStream slows down merge. (Stanislav Barton via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1510811) * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/BackupStore.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java Unnecessary Configuration instantiation in IFileInputStream slows down merge Key: MAPREDUCE-5399 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5399 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2 Affects Versions: 1.1.0, 2.0.2-alpha Reporter: Stanislav Barton Assignee: Stanislav Barton Priority: Blocker Fix For: 2.1.0-beta Attachments: MAPREDUCE-5399.patch We are using hadoop-2.0.0+1357-1.cdh4.3.0.p0.21 with MRv1. After upgrade from 4.1.2 to 4.3.0, I have noticed some performance deterioration in our MR job in the Reduce phase. The MR job has usually 10 000 map tasks (10 000 files on input each about 100MB) and 6 000 reducers (one reducer per table region). 
I was trying to figure out at which phase the slowdown appears (at first I suspected that slow gathering of the map output files was the culprit) and found out that the problem is not reading the map output (the shuffle) but the sort/merge phase that follows; the last and actual reduce phase is fast. I tried to up io.sort.factor because I thought that lots of small files were being merged on disk, but upping that to 1000 didn't make any difference. I then printed the stack trace and found out that the problem is the initialization of org.apache.hadoop.mapred.IFileInputStream, namely the creation of the Configuration object, which is not propagated along from the earlier context; see the stack trace:
Thread 13332: (state = IN_NATIVE)
- java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0 (Compiled frame; information may be imprecise)
- java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2, line=228 (Compiled frame)
- java.io.File.exists() @bci=20, line=733 (Compiled frame)
- sun.misc.URLClassPath$FileLoader.getResource(java.lang.String, boolean) @bci=136, line=999 (Compiled frame)
- sun.misc.URLClassPath$FileLoader.findResource(java.lang.String, boolean) @bci=3, line=966 (Compiled frame)
- sun.misc.URLClassPath.findResource(java.lang.String, boolean) @bci=17, line=146 (Compiled frame)
- java.net.URLClassLoader$2.run() @bci=12, line=385 (Compiled frame)
- java.security.AccessController.doPrivileged(java.security.PrivilegedAction, java.security.AccessControlContext) @bci=0 (Compiled frame)
- java.net.URLClassLoader.findResource(java.lang.String) @bci=13, line=382 (Compiled frame)
- java.lang.ClassLoader.getResource(java.lang.String) @bci=30, line=1002 (Compiled frame)
- java.lang.ClassLoader.getResourceAsStream(java.lang.String) @bci=2, line=1192 (Compiled frame)
- javax.xml.parsers.SecuritySupport$4.run() @bci=26, line=96 (Compiled frame)
- java.security.AccessController.doPrivileged(java.security.PrivilegedAction) @bci=0 (Compiled frame)
- javax.xml.parsers.SecuritySupport.getResourceAsStream(java.lang.ClassLoader, java.lang.String) @bci=10, line=89 (Compiled frame)
- javax.xml.parsers.FactoryFinder.findJarServiceProvider(java.lang.String) @bci=38, line=250 (Interpreted frame)
- javax.xml.parsers.FactoryFinder.find(java.lang.String, java.lang.String) @bci=273, line=223 (Interpreted frame)
- javax.xml.parsers.DocumentBuilderFactory.newInstance() @bci=4, line=123 (Compiled frame)
- org.apache.hadoop.conf.Configuration.loadResource(java.util.Properties, org.apache.hadoop.conf.Configuration$Resource, boolean) @bci=16, line=1890 (Compiled frame)
- org.apache.hadoop.conf.Configuration.loadResources(java.util.Properties, java.util.ArrayList, boolean) @bci=49, line=1867 (Compiled frame)
- org.apache.hadoop.conf.Configuration.getProps()
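The fix pattern the reporter points at is to build the expensive Configuration once and hand the same instance to each stream, instead of letting every stream construct its own (each construction re-parses XML resources, as the stack trace shows). A schematic illustration with stand-in classes — the names and the construction counter are purely for demonstration, none of this is Hadoop's code:

```java
// Stand-in for an expensive-to-build Configuration: in the real stack trace,
// DocumentBuilderFactory-based XML resource loading dominated the merge.
class FakeConfiguration {
    static int constructions = 0;
    FakeConfiguration() { constructions++; /* imagine XML parsing here */ }
}

// Stream that *receives* a configuration instead of creating one per instance.
class FakeIFileInputStream {
    private final FakeConfiguration conf;
    FakeIFileInputStream(FakeConfiguration conf) { this.conf = conf; }
}

public class ConfigReuseSketch {
    public static void main(String[] args) {
        FakeConfiguration shared = new FakeConfiguration();
        // Thousands of merge streams, one Configuration total.
        for (int i = 0; i < 6000; i++) {
            new FakeIFileInputStream(shared);
        }
        System.out.println(FakeConfiguration.constructions); // 1
    }
}
```

Propagating the existing instance turns thousands of XML-parsing passes into one, which matches the patch's stated effect on merge speed.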
[jira] [Commented] (MAPREDUCE-5432) JobHistoryParser does not fetch clockSplits, cpuUsages, vMemKbytes, physMemKbytes from history file
[ https://issues.apache.org/jira/browse/MAPREDUCE-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730750#comment-13730750 ] Hadoop QA commented on MAPREDUCE-5432: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12596342/MAPREDUCE-5432.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3933//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3933//console This message is automatically generated. 
JobHistoryParser does not fetch clockSplits, cpuUsages, vMemKbytes, physMemKbytes from history file --- Key: MAPREDUCE-5432 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5432 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, mrv2 Affects Versions: 2.0.4-alpha Reporter: Vrushali C Assignee: Tsuyoshi OZAWA Attachments: MAPREDUCE-5432.1.patch JobHistoryParser's handleMapAttemptFinishedEvent() function does not look at MapAttemptFinishedEvent's int[] clockSplits; int[] cpuUsages; int[] vMemKbytes; int[] physMemKbytes; JobHistoryParser's inner class TaskAttemptInfo also needs to be enhanced to have these as members so that handleMapAttemptFinishedEvent() can get them and store them. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-434) LocalJobRunner limited to single reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730797#comment-13730797 ] Hudson commented on MAPREDUCE-434: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1510 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1510/]) MAPREDUCE-434. LocalJobRunner limited to single reducer (Sandy Ryza and Aaron Kimball via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1510866) * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalJobRunner.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Merger.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/ReduceTask.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/ShuffleConsumerPlugin.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/LocalFetcher.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java * 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/TestShufflePlugin.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestJobCounters.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/lib/TestKeyFieldBasedComparator.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestLocalRunner.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/partition/TestMRKeyFieldBasedComparator.java LocalJobRunner limited to single reducer Key: MAPREDUCE-434 URL: https://issues.apache.org/jira/browse/MAPREDUCE-434 Project: Hadoop Map/Reduce Issue Type: Improvement Environment: local job tracker Reporter: Yoram Arnon Assignee: Aaron Kimball Priority: Minor Fix For: 2.3.0 Attachments: MAPREDUCE-434.2.patch, MAPREDUCE-434.3.patch, MAPREDUCE-434.4.patch, MAPREDUCE-434.5.patch, MAPREDUCE-434.6.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch when mapred.job.tracker is set to 'local', my setNumReduceTasks call is ignored, and the number of reduce tasks is set at 1. This prevents me from locally debugging my partition function, which tries to partition based on the number of reduce tasks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5446) TestJobHistoryEvents and TestJobHistoryParsing have race conditions
[ https://issues.apache.org/jira/browse/MAPREDUCE-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730803#comment-13730803 ] Hudson commented on MAPREDUCE-5446: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1510 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1510/]) MAPREDUCE-5446. TestJobHistoryEvents and TestJobHistoryParsing have race conditions. Contributed by Jason Lowe. (kihwal: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1510581) * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java TestJobHistoryEvents and TestJobHistoryParsing have race conditions --- Key: MAPREDUCE-5446 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5446 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, test Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 3.0.0, 2.1.1-beta Attachments: MAPREDUCE-5446.patch TestJobHistoryEvents and TestJobHistoryParsing are not properly waiting for MRApp to finish. Currently they are polling the service state looking for Service.STATE.STOPPED, but the service can appear to be in that state *before* it is fully stopped. This causes tests to finish with MRApp threads still in-flight, and those threads can conflict with subsequent tests when they collide in the filesystem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5367) Local jobs all use same local working directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730809#comment-13730809 ] Hudson commented on MAPREDUCE-5367: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1510 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1510/]) MAPREDUCE-5367. Local jobs all use same local working directory (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1510610) * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalJobRunner.java Local jobs all use same local working directory --- Key: MAPREDUCE-5367 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5367 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 1.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.3.0, 2.1.1-beta Attachments: MAPREDUCE-5367-b1-1.patch, MAPREDUCE-5367-b1.patch, MAPREDUCE-5367.patch This means that local jobs, even in different JVMs, can't run concurrently because they might delete each other's files during work directory setup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
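The collision described above disappears once each local job derives its working directory from a unique job id rather than a shared fixed path. A hedged sketch of that idea (illustrative names only, not the committed LocalJobRunner change):

```java
import java.io.File;

public class WorkDirSketch {
    // Give each local job its own working directory keyed by its job id,
    // so concurrent local jobs (even in different JVMs) cannot delete
    // each other's files during work directory setup.
    public static File localWorkDir(File base, String jobId) {
        File dir = new File(base, "localRunner-" + jobId);
        dir.mkdirs();
        return dir;
    }

    public static void main(String[] args) {
        File base = new File(System.getProperty("java.io.tmpdir"), "mr-local-sketch");
        File a = localWorkDir(base, "job_0001");
        File b = localWorkDir(base, "job_0002");
        System.out.println(a.equals(b)); // false: no collision between jobs
    }
}
```

Because the job id is unique per submission, two concurrent jobs resolve to disjoint directories and setup/teardown of one cannot clobber the other.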
[jira] [Commented] (MAPREDUCE-5399) Unnecessary Configuration instantiation in IFileInputStream slows down merge
[ https://issues.apache.org/jira/browse/MAPREDUCE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730812#comment-13730812 ] Hudson commented on MAPREDUCE-5399: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1510 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1510/]) MAPREDUCE-5399. Unnecessary Configuration instantiation in IFileInputStream slows down merge. (Stanislav Barton via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1510811) * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/BackupStore.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java Unnecessary Configuration instantiation in IFileInputStream slows down merge Key: MAPREDUCE-5399 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5399 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2 Affects Versions: 1.1.0, 2.0.2-alpha Reporter: Stanislav Barton Assignee: Stanislav Barton Priority: Blocker Fix For: 2.1.0-beta Attachments: MAPREDUCE-5399.patch We are using hadoop-2.0.0+1357-1.cdh4.3.0.p0.21 with MRv1. After upgrade from 4.1.2 to 4.3.0, I have noticed some performance deterioration in our MR job in the Reduce phase. The MR job has usually 10 000 map tasks (10 000 files on input each about 100MB) and 6 000 reducers (one reducer per table region). 
I was trying to figure out what at which phase the slow down appears (firstly I suspected that the slow gathering of the 1 map output files is the culprit) and found out that the problem is not reading the map output (the shuffle) but the sort/merge phase that follows - the last and actual reduce phase is fast. I have tried to up the io.sort.factor because I thought the lots of small files are being merged on disk, but again upping that to 1000 didnt do any difference. I have then printed the stack trace and found out that the problem is initialization of the org.apache.hadoop.mapred.IFileInputStream namely the creation of the Configuration object which is not propagated along from earlier context, see the stack trace: Thread 13332: (state = IN_NATIVE) - java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0 (Compiled frame; information may be imprecise) - java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2, line=228 (Compiled frame) - java.io.File.exists() @bci=20, line=733 (Compiled frame) - sun.misc.URLClassPath$FileLoader.getResource(java.lang.String, boolean) @bci=136, line=999 (Compiled frame) - sun.misc.URLClassPath$FileLoader.findResource(java.lang.String, boolean) @bci=3, line=966 (Compiled frame) - sun.misc.URLClassPath.findResource(java.lang.String, boolean) @bci=17, line=146 (Compiled frame) - java.net.URLClassLoader$2.run() @bci=12, line=385 (Compiled frame) - java.security.AccessController.doPrivileged(java.security.PrivilegedAction, java.security.AccessControlContext) @bci=0 (Compiled frame) - java.net.URLClassLoader.findResource(java.lang.String) @bci=13, line=382 (Compiled frame) - java.lang.ClassLoader.getResource(java.lang.String) @bci=30, line=1002 (Compiled frame) - java.lang.ClassLoader.getResourceAsStream(java.lang.String) @bci=2, line=1192 (Compiled frame) - javax.xml.parsers.SecuritySupport$4.run() @bci=26, line=96 (Compiled frame) - java.security.AccessController.doPrivileged(java.security.PrivilegedAction) @bci=0 
(Compiled frame) - javax.xml.parsers.SecuritySupport.getResourceAsStream(java.lang.ClassLoader, java.lang.String) @bci=10, line=89 (Compiled frame) - javax.xml.parsers.FactoryFinder.findJarServiceProvider(java.lang.String) @bci=38, line=250 (Interpreted frame) - javax.xml.parsers.FactoryFinder.find(java.lang.String, java.lang.String) @bci=273, line=223 (Interpreted frame) - javax.xml.parsers.DocumentBuilderFactory.newInstance() @bci=4, line=123 (Compiled frame) - org.apache.hadoop.conf.Configuration.loadResource(java.util.Properties, org.apache.hadoop.conf.Configuration$Resource, boolean) @bci=16, line=1890 (Compiled frame) - org.apache.hadoop.conf.Configuration.loadResources(java.util.Properties, java.util.ArrayList, boolean) @bci=49, line=1867 (Compiled frame) -
[jira] [Commented] (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731036#comment-13731036 ] Mariappan Asokan commented on MAPREDUCE-1176: - I was looking for an implementation of this record format as well. I agree with the following comment by Todd: {quote} As a general note, I'm not sure I agree with the design here. Rather than forcing the split to lie on record boundaries, I think it would be simpler to simply let FileInputFormat compute its own splits, and then when you first open the record reader, skip forward to the next record boundary and begin reading from there. Then for the last record of the file, over read your split into the beginning of the next one. This is the strategy that other input formats take, and should be compatible with the splittable compression codecs (see TextInputFormat for example). {quote} I think we should support fixed length records spanning across HDFS blocks. BitsOfInfo, do you mind if I pick up your patch, enhance it to take care of the above case, and post a patch for the trunk? I would appreciate if a committer can come forward to review the patch and commit it to the trunk. Thanks. -- Asokan Contribution: FixedLengthInputFormat and FixedLengthRecordReader Key: MAPREDUCE-1176 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 0.20.1, 0.20.2 Environment: Any Reporter: BitsOfInfo Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, MAPREDUCE-1176-v3.patch, MAPREDUCE-1176-v4.patch Hello, I would like to contribute the following two classes for incorporation into the mapreduce.lib.input package. These two classes can be used when you need to read data from files containing fixed length (fixed width) records. Such files have no CR/LF (or any combination thereof), no delimiters etc, but each record is a fixed length, and extra data is padded with spaces. 
The data is one gigantic line within a file. Two classes are provided: FixedLengthInputFormat and its corresponding FixedLengthRecordReader. When creating a job that specifies this input format, the job must have the mapreduce.input.fixedlengthinputformat.record.length property set, as follows: myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length", myFixedRecordLength); OR myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, myFixedRecordLength); This input format overrides computeSplitSize() in order to ensure that InputSplits do not contain any partial records, since with fixed records there is no way to determine where a record begins if that were to occur. Each InputSplit passed to the FixedLengthRecordReader will start at the beginning of a record, and the last byte in the InputSplit will be the last byte of a record. The override of computeSplitSize() delegates to FileInputFormat's compute method and then adjusts the returned split size as follows: (Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) * fixedRecordLength). This suite of fixed length input format classes does not support compressed files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
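The computeSplitSize() adjustment quoted above simply rounds FileInputFormat's computed size down to a whole number of records. A small sketch of that arithmetic (hypothetical method name, same formula as the description):

```java
public class SplitSizeSketch {
    // Round the framework-computed split size down to a record-length
    // multiple, so no InputSplit ends in the middle of a record.
    public static long adjustSplitSize(long computedSplitSize, int recordLength) {
        return (computedSplitSize / recordLength) * recordLength; // integer floor
    }

    public static void main(String[] args) {
        // An uneven computed size is trimmed back to the last full record.
        System.out.println(adjustSplitSize(12345, 100)); // 12300
    }
}
```

Integer division performs the Math.floor from the description, so the result is always an exact multiple of the record length.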
[jira] [Updated] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5311: -- Summary: Replace SLOTS_MILLIS counters with MEM_MILLIS (was: Remove slot millis computation logic and deprecate counter constants) Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.0.4-alpha Reporter: Alejandro Abdelnur Assignee: Sandy Ryza Attachments: MAPREDUCE-5311-1.patch, MAPREDUCE-5311.patch, MAPREDUCE-5311.patch Per discussion in MAPREDUCE-5310 and comments in the code we should remove all the related logic and just leave the counter constant for backwards compatibility and deprecate the counter constants. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731067#comment-13731067 ] Alejandro Abdelnur commented on MAPREDUCE-5311: --- [~acmurthy], unless I got things wrong, we agreed to keep slot-millis around until we have memory-millis. And the latest patch here is doing that. Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.0.4-alpha Reporter: Alejandro Abdelnur Assignee: Sandy Ryza Attachments: MAPREDUCE-5311-1.patch, MAPREDUCE-5311.patch, MAPREDUCE-5311.patch Per discussion in MAPREDUCE-5310 and comments in the code we should remove all the related logic and just leave the counter constant for backwards compatibility and deprecate the counter constants. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5432) JobHistoryParser does not fetch clockSplits, cpuUsages, vMemKbytes, physMemKbytes from history file
[ https://issues.apache.org/jira/browse/MAPREDUCE-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731078#comment-13731078 ] Vrushali C commented on MAPREDUCE-5432: --- A few minor comments:
+ printAttributes(CLOCK_SPLIT, clockSplits);
+ printAttributes(CPU_USAGE, cpuUsages);
Maybe print these as CLOCK_SPLITS and CPU_USAGES so that the tags are consistent with the naming of the attributes? For the following function:
+ void printAttributes(String tag, int[] attributes) {
+   for (int i = 0; i < attributes.length; i++) {
+     System.out.println(tag + "[" + i + "]: " + attributes[i]);
+   }
+ }
I believe it would print it as: CLOCK_SPLIT[0]: 8691 CLOCK_SPLIT[1]: 129 CLOCK_SPLIT[2]: 128 CLOCK_SPLIT[3]: 129 CLOCK_SPLIT[4]: 128 ... etc. Would it be slightly better to print it as: clockSplits:[8691,129,128,129,128,129,128,129,128,129,128,129] cpuUsages:[28,28,29,28,28,29,28,28,29,28,28,29] JobHistoryParser does not fetch clockSplits, cpuUsages, vMemKbytes, physMemKbytes from history file --- Key: MAPREDUCE-5432 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5432 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, mrv2 Affects Versions: 2.0.4-alpha Reporter: Vrushali C Assignee: Tsuyoshi OZAWA Attachments: MAPREDUCE-5432.1.patch JobHistoryParser's handleMapAttemptFinishedEvent() function does not look at MapAttemptFinishedEvent's int[] clockSplits; int[] cpuUsages; int[] vMemKbytes; int[] physMemKbytes; JobHistoryParser's inner class TaskAttemptInfo also needs to be enhanced to have these as members so that handleMapAttemptFinishedEvent() can get them and store them. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
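The compact one-line format suggested in the comment above is essentially what java.util.Arrays.toString already produces. A sketch of the alternative printer (a hypothetical helper, not the patch's code; note Arrays.toString separates elements with ", " rather than bare commas):

```java
import java.util.Arrays;

public class AttributePrinter {
    // Formats e.g. "clockSplits:[8691, 129, 128]" on a single line
    // instead of one indexed line per element.
    static String format(String tag, int[] attributes) {
        return tag + ":" + Arrays.toString(attributes);
    }

    public static void main(String[] args) {
        System.out.println(format("clockSplits", new int[]{8691, 129, 128}));
        // prints: clockSplits:[8691, 129, 128]
    }
}
```

One line per array keeps the parser's diagnostic output readable when each attribute has a dozen progress-split entries.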
[jira] [Commented] (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731076#comment-13731076 ] Debashis Saha commented on MAPREDUCE-1176: -- The reason other input formats take that approach is that they don't have any other way to figure out the exact boundary. With a fixed format you know the boundary exactly, and in my opinion you should take advantage of it. -- - Deba --~O~-- Contribution: FixedLengthInputFormat and FixedLengthRecordReader Key: MAPREDUCE-1176 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 0.20.1, 0.20.2 Environment: Any Reporter: BitsOfInfo Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, MAPREDUCE-1176-v3.patch, MAPREDUCE-1176-v4.patch Hello, I would like to contribute the following two classes for incorporation into the mapreduce.lib.input package. These two classes can be used when you need to read data from files containing fixed length (fixed width) records. Such files have no CR/LF (or any combination thereof), no delimiters etc, but each record is a fixed length, and extra data is padded with spaces. The data is one gigantic line within a file. Two classes are provided: FixedLengthInputFormat and its corresponding FixedLengthRecordReader. When creating a job that specifies this input format, the job must have the mapreduce.input.fixedlengthinputformat.record.length property set, as follows: myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length", myFixedRecordLength); OR myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, myFixedRecordLength); This input format overrides computeSplitSize() in order to ensure that InputSplits do not contain any partial records, since with fixed records there is no way to determine where a record begins if that were to occur. 
Each InputSplit passed to the FixedLengthRecordReader will start at the beginning of a record, and the last byte in the InputSplit will be the last byte of a record. The override of computeSplitSize() delegates to FileInputFormat's compute method, and then adjusts the returned split size by doing the following: (Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) * fixedRecordLength) This suite of fixed length input format classes, does not support compressed files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731089#comment-13731089 ] Mariappan Asokan commented on MAPREDUCE-1176: - Hi Debashis, You are correct. It is easy to identify records spanning across HDFS blocks. -- Asokan Contribution: FixedLengthInputFormat and FixedLengthRecordReader Key: MAPREDUCE-1176 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 0.20.1, 0.20.2 Environment: Any Reporter: BitsOfInfo Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, MAPREDUCE-1176-v3.patch, MAPREDUCE-1176-v4.patch Hello, I would like to contribute the following two classes for incorporation into the mapreduce.lib.input package. These two classes can be used when you need to read data from files containing fixed length (fixed width) records. Such files have no CR/LF (or any combination thereof), no delimiters etc, but each record is a fixed length, and extra data is padded with spaces. The data is one gigantic line within a file. Provided are two classes first is the FixedLengthInputFormat and its corresponding FixedLengthRecordReader. When creating a job that specifies this input format, the job must have the mapreduce.input.fixedlengthinputformat.record.length property set as follows myJobConf.setInt(mapreduce.input.fixedlengthinputformat.record.length,[myFixedRecordLength]); OR myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, [myFixedRecordLength]); This input format overrides computeSplitSize() in order to ensure that InputSplits do not contain any partial records since with fixed records there is no way to determine where a record begins if that were to occur. Each InputSplit passed to the FixedLengthRecordReader will start at the beginning of a record, and the last byte in the InputSplit will be the last byte of a record. 
The override of computeSplitSize() delegates to FileInputFormat's compute method and then adjusts the returned split size as follows: (Math.floor(fileInputFormatComputedSplitSize / fixedRecordLength) * fixedRecordLength). This suite of fixed-length input format classes does not support compressed files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
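The split-size adjustment described above can be sketched directly. This is a minimal reconstruction of the arithmetic only, not code from the attached patches; the class and parameter names are illustrative:

```java
// Round the split size computed by FileInputFormat down to a multiple of the
// fixed record length, so that no InputSplit ends in the middle of a record.
// Names are illustrative; this is not taken from the MAPREDUCE-1176 patches.
public class FixedLengthSplitMath {
    public static long adjustSplitSize(long computedSplitSize, long fixedRecordLength) {
        if (fixedRecordLength <= 0) {
            throw new IllegalArgumentException("record length must be positive");
        }
        // Integer division is equivalent to the Math.floor() expression in the
        // description: floor(computedSplitSize / fixedRecordLength) * fixedRecordLength
        return (computedSplitSize / fixedRecordLength) * fixedRecordLength;
    }
}
```

Note that a split smaller than one record length adjusts to 0, which is why the record reader must also handle a record that straddles the last partial split.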
[jira] [Commented] (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731149#comment-13731149 ] BitsOfInfo commented on MAPREDUCE-1176: --- Asokan: Sure, go ahead and make whatever changes are necessary; I have no time to work on this anymore, yet I would like to see this put into the project, as I had a use for it when I created it and I'm sure others do as well. BTW: I never had my original question about the design answered from a few years ago; maybe I was missing something. bq. Hmm, ok, do you have a suggestion on how I detect where one record begins and one record ends when records are not identifiable by any sort of consistent start-character or end-character boundary but just flow together? I could see the RecordReader detecting that it only read RECORD LENGTH bytes before hitting the end of the split and discarding it. But I am not sure how it would detect the start of a record with a split that has partial data at the start of it, especially if there is no consistent boundary/char marker that identifies the start of a record.
[jira] [Updated] (MAPREDUCE-5439) mapred-default.xml has missing properties
[ https://issues.apache.org/jira/browse/MAPREDUCE-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jaimin D Jetly updated MAPREDUCE-5439: -- Summary: mapred-default.xml has missing properties (was: mpared-default.xml has missing properties) mapred-default.xml has missing properties - Key: MAPREDUCE-5439 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5439 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.1.0-beta Reporter: Siddharth Wagle Fix For: 2.1.0-beta Properties that need to be added: mapreduce.map.memory.mb mapreduce.map.java.opts mapreduce.reduce.memory.mb mapreduce.reduce.java.opts Properties that need to be fixed: mapred.child.java.opts should not be in mapred-default. yarn.app.mapreduce.am.command-opts description needs fixing -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
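The four missing entries listed above would take the usual mapred-default.xml shape. The values below are placeholders for illustration only; the actual defaults are whatever the eventual patch chooses:

```xml
<!-- Sketch only: property names are from the issue, values are placeholders. -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx820m</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx820m</value>
</property>
```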
[jira] [Commented] (MAPREDUCE-5449) Jobtracker WebUI Add acl to control access to views of submitted jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731219#comment-13731219 ] Chris Nauroth commented on MAPREDUCE-5449: -- Hi, Christopher. Is this a duplicate of MAPREDUCE-5109? Jobtracker WebUI Add acl to control access to views of submitted jobs - Key: MAPREDUCE-5449 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5449 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Reporter: Christopher LaPlante Jobtracker WebUI currently displays the job name for every job submitted. Would like the ability to apply ACLs so that users/groups can only see the jobs they have submitted and not all the jobs submitted to the cluster. Hive queries put the query as the job name in the JobTracker. This query information can contain sensitive information; currently the only way to limit access is to limit access to the JobTracker UI, which reduces the job owner's ability to troubleshoot issues.
[jira] [Commented] (MAPREDUCE-5450) Unnecessary Configuration instantiation in IFileInputStream slows down merge - Port to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731235#comment-13731235 ] Sandy Ryza commented on MAPREDUCE-5450: --- The attached patch is failing to compile with the following error: {code} [javac] /home/sandy/svn-apache/branch-1/src/mapred/org/apache/hadoop/mapred/ReduceTask.java:2445: non-static variable conf cannot be referenced from a static context [javac] mo.data, 0, mo.data.length, conf); {code} Unnecessary Configuration instantiation in IFileInputStream slows down merge - Port to branch-1 --- Key: MAPREDUCE-5450 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5450 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.1.0 Reporter: Stanislav Barton Assignee: Stanislav Barton Priority: Blocker Fix For: 1.2.1 Attachments: MAPREDUCE-5450-1.1.0.txt We are using hadoop-2.0.0+1357-1.cdh4.3.0.p0.21 with MRv1. After upgrade from 4.1.2 to 4.3.0, I have noticed some performance deterioration in our MR job in the Reduce phase. The MR job usually has 10 000 map tasks (10 000 files on input, each about 100MB) and 6 000 reducers (one reducer per table region). I was trying to figure out at which phase the slowdown appears (at first I suspected that the slow gathering of the map output files was the culprit) and found out that the problem is not reading the map output (the shuffle) but the sort/merge phase that follows - the last and actual reduce phase is fast. I have tried to raise io.sort.factor because I thought lots of small files were being merged on disk, but upping that to 1000 didn't make any difference. 
I have then printed the stack trace and found out that the problem is initialization of the org.apache.hadoop.mapred.IFileInputStream namely the creation of the Configuration object which is not propagated along from earlier context, see the stack trace: Thread 13332: (state = IN_NATIVE) - java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0 (Compiled frame; information may be imprecise) - java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2, line=228 (Compiled frame) - java.io.File.exists() @bci=20, line=733 (Compiled frame) - sun.misc.URLClassPath$FileLoader.getResource(java.lang.String, boolean) @bci=136, line=999 (Compiled frame) - sun.misc.URLClassPath$FileLoader.findResource(java.lang.String, boolean) @bci=3, line=966 (Compiled frame) - sun.misc.URLClassPath.findResource(java.lang.String, boolean) @bci=17, line=146 (Compiled frame) - java.net.URLClassLoader$2.run() @bci=12, line=385 (Compiled frame) - java.security.AccessController.doPrivileged(java.security.PrivilegedAction, java.security.AccessControlContext) @bci=0 (Compiled frame) - java.net.URLClassLoader.findResource(java.lang.String) @bci=13, line=382 (Compiled frame) - java.lang.ClassLoader.getResource(java.lang.String) @bci=30, line=1002 (Compiled frame) - java.lang.ClassLoader.getResourceAsStream(java.lang.String) @bci=2, line=1192 (Compiled frame) - javax.xml.parsers.SecuritySupport$4.run() @bci=26, line=96 (Compiled frame) - java.security.AccessController.doPrivileged(java.security.PrivilegedAction) @bci=0 (Compiled frame) - javax.xml.parsers.SecuritySupport.getResourceAsStream(java.lang.ClassLoader, java.lang.String) @bci=10, line=89 (Compiled frame) - javax.xml.parsers.FactoryFinder.findJarServiceProvider(java.lang.String) @bci=38, line=250 (Interpreted frame) - javax.xml.parsers.FactoryFinder.find(java.lang.String, java.lang.String) @bci=273, line=223 (Interpreted frame) - javax.xml.parsers.DocumentBuilderFactory.newInstance() @bci=4, line=123 (Compiled frame) - 
org.apache.hadoop.conf.Configuration.loadResource(java.util.Properties, org.apache.hadoop.conf.Configuration$Resource, boolean) @bci=16, line=1890 (Compiled frame) - org.apache.hadoop.conf.Configuration.loadResources(java.util.Properties, java.util.ArrayList, boolean) @bci=49, line=1867 (Compiled frame) - org.apache.hadoop.conf.Configuration.getProps() @bci=43, line=1785 (Compiled frame) - org.apache.hadoop.conf.Configuration.get(java.lang.String) @bci=35, line=712 (Compiled frame) - org.apache.hadoop.conf.Configuration.getTrimmed(java.lang.String) @bci=2, line=731 (Compiled frame) - org.apache.hadoop.conf.Configuration.getBoolean(java.lang.String, boolean) @bci=2, line=1047 (Interpreted frame) - org.apache.hadoop.mapred.IFileInputStream.init(java.io.InputStream, long, org.apache.hadoop.conf.Configuration) @bci=111, line=93 (Interpreted frame) -
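The stack trace above shows why the per-stream `new Configuration()` is so costly: each construction re-loads and re-parses the XML config resources through `DocumentBuilderFactory`. The general shape of the fix is to thread the already-built Configuration through instead of instantiating a new one per stream. The sketch below is self-contained, so `ExpensiveConfig` stands in for `org.apache.hadoop.conf.Configuration`; all names are illustrative, not the actual IFileInputStream code:

```java
// Pattern behind the fix: avoid constructing a config object per stream.
// ExpensiveConfig is a stand-in for org.apache.hadoop.conf.Configuration,
// whose constructor triggers XML resource parsing (the hot path in the trace).
public class ConfReuse {
    static int constructions = 0;

    static class ExpensiveConfig {
        ExpensiveConfig() { constructions++; } // imagine XML parsing here
        boolean getBoolean(String key, boolean dflt) { return dflt; }
    }

    // Bad: one construction per opened stream -> O(streams) XML parses,
    // which dominates the merge when there are thousands of map outputs.
    static void openStreamSlow(byte[] data) {
        ExpensiveConfig conf = new ExpensiveConfig();
        conf.getBoolean("ifile.readahead", true);
    }

    // Good: the caller builds the config once and passes it along.
    static void openStream(byte[] data, ExpensiveConfig conf) {
        conf.getBoolean("ifile.readahead", true);
    }
}
```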
[jira] [Commented] (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731239#comment-13731239 ] Mariappan Asokan commented on MAPREDUCE-1176: - BitsOfInfo, For each split, you need to compute how many bytes to skip (to account for a partial record that spans the previous and current splits). Let us say we are processing split N (where N is a 0-based number) in the record reader, Z is the cumulative total of split sizes for splits 0 through N-1, L is the record length, and S is the number of bytes to skip at the beginning of split N. When N = 0, S = 0; for all other N, S = L - (Z mod L). The record reader should account for the last record in a split by reading additional bytes from the next split if necessary. Hope I clarified the logic. -- Asokan
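Asokan's offset arithmetic can be written out directly. One caveat: as stated, S = L - (Z mod L) yields a full-record skip when Z happens to be an exact multiple of L, so the sketch below adds a final `% L` as my own guard for that record-aligned case; the class and parameter names are illustrative:

```java
// Bytes to skip at the start of split N, per the comment above:
//   N = 0 -> S = 0
//   N > 0 -> S = L - (Z mod L), where Z is the total size of splits 0..N-1
//            and L is the fixed record length.
// The trailing % L is an added guard (not in the original formula) so that a
// split starting exactly on a record boundary skips 0 bytes rather than L.
public class FixedLengthSkip {
    public static long bytesToSkip(int splitIndex, long precedingBytes, long recordLength) {
        if (splitIndex == 0) {
            return 0;
        }
        return (recordLength - (precedingBytes % recordLength)) % recordLength;
    }
}
```

For example, with L = 100 and split 1 starting at byte 250, the record spanning bytes 200-299 has 50 bytes left, so the reader skips 50 bytes to the next record boundary.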
[jira] [Updated] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5311: -- Priority: Blocker (was: Major) Fix Version/s: 2.1.0-beta We need to take care of this for 2.1.0, making it a blocker. Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.0.4-alpha Reporter: Alejandro Abdelnur Assignee: Sandy Ryza Priority: Blocker Fix For: 2.1.0-beta Attachments: MAPREDUCE-5311-1.patch, MAPREDUCE-5311.patch, MAPREDUCE-5311.patch Per discussion in MAPREDUCE-5310 and comments in the code we should remove all the related logic and just leave the counter constant for backwards compatibility and deprecate the counter constants. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5384) Races in DelegationTokenRenewal
[ https://issues.apache.org/jira/browse/MAPREDUCE-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated MAPREDUCE-5384: Attachment: mr-5384-2.patch [~sseth], very good point. In my testing, I noticed the renewal RPC takes about 5 ms and could take longer. Uploading a patch that addresses this issue # Call token.renew() outside of a synchronized block # To address the potential race with cancel(), cancel() now returns a boolean - true if it successfully cancels all renewals and false if there is a renewal currently in progress. Renewing once after the cancel() is called is benign, but the user can be intimated about this renewal in progress. # TestDelegationTokenRenewal uses this intimation to allow for an extra renewal. Testing: Ran the new TestDelegationTokenRenewal in a loop 10 times and it passed. Races in DelegationTokenRenewal --- Key: MAPREDUCE-5384 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5384 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.0, 1.1.2, 1.2.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: mr-5384-0.patch, mr-5384-1.patch, mr-5384-2.patch There are a couple of races in DelegationTokenRenewal. One of them was addressed by MAPREDUCE-4860, which introduced a deadlock while fixing this race. Opening a new JIRA per discussion in MAPREDUCE-5364, since MAPREDUCE-4860 is already shipped in a release. Races to fix: # TimerTask#cancel() disallows future invocations of run(), but doesn't abort an already scheduled/started run(). # In the context of DelegationTokenRenewal, RenewalTimerTask#cancel() only cancels that TimerTask instance. However, it has no effect on any other TimerTasks created for that token. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
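The cancel()/renew() interplay described in the patch notes above can be sketched as follows. This is a hedged illustration of the synchronization pattern only, with hypothetical names, not the actual DelegationTokenRenewal classes: the renewal RPC runs outside any lock, and cancel() returns false when a renewal is in flight (that one extra renewal being benign):

```java
// Sketch of the pattern: renew outside the lock; cancel() reports whether a
// renewal was in progress. Names are hypothetical, not Hadoop's.
public class RenewalSketch {
    private boolean renewing = false;
    private boolean cancelled = false;

    public void renewOnce(Runnable renewRpc) {
        synchronized (this) {
            if (cancelled) return;   // no renewals after a completed cancel
            renewing = true;
        }
        try {
            renewRpc.run();          // the ~5 ms (or longer) RPC, lock-free
        } finally {
            synchronized (this) { renewing = false; }
        }
    }

    // true  -> all future renewals stopped and none in progress
    // false -> a renewal is currently in flight; one extra renewal is benign
    public synchronized boolean cancel() {
        cancelled = true;
        return !renewing;
    }
}
```

A test along the lines of TestDelegationTokenRenewal can use the boolean return to tolerate the one in-flight renewal.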
[jira] [Updated] (MAPREDUCE-5384) Races in DelegationTokenRenewal
[ https://issues.apache.org/jira/browse/MAPREDUCE-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated MAPREDUCE-5384: Status: Patch Available (was: Open) Races in DelegationTokenRenewal --- Key: MAPREDUCE-5384 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5384 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.1, 1.1.2, 1.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: mr-5384-0.patch, mr-5384-1.patch, mr-5384-2.patch There are a couple of races in DelegationTokenRenewal. One of them was addressed by MAPREDUCE-4860, which introduced a deadlock while fixing this race. Opening a new JIRA per discussion in MAPREDUCE-5364, since MAPREDUCE-4860 is already shipped in a release. Races to fix: # TimerTask#cancel() disallows future invocations of run(), but doesn't abort an already scheduled/started run(). # In the context of DelegationTokenRenewal, RenewalTimerTask#cancel() only cancels that TimerTask instance. However, it has no effect on any other TimerTasks created for that token. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5384) Races in DelegationTokenRenewal
[ https://issues.apache.org/jira/browse/MAPREDUCE-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731268#comment-13731268 ] Hadoop QA commented on MAPREDUCE-5384: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12596421/mr-5384-2.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3934//console This message is automatically generated. Races in DelegationTokenRenewal --- Key: MAPREDUCE-5384 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5384 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.0, 1.1.2, 1.2.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: mr-5384-0.patch, mr-5384-1.patch, mr-5384-2.patch There are a couple of races in DelegationTokenRenewal. One of them was addressed by MAPREDUCE-4860, which introduced a deadlock while fixing this race. Opening a new JIRA per discussion in MAPREDUCE-5364, since MAPREDUCE-4860 is already shipped in a release. Races to fix: # TimerTask#cancel() disallows future invocations of run(), but doesn't abort an already scheduled/started run(). # In the context of DelegationTokenRenewal, RenewalTimerTask#cancel() only cancels that TimerTask instance. However, it has no effect on any other TimerTasks created for that token. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731444#comment-13731444 ] Arun C Murthy commented on MAPREDUCE-5311: -- This will break every single MR app. I'm -1 on this. Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.0.4-alpha Reporter: Alejandro Abdelnur Assignee: Sandy Ryza Priority: Blocker Fix For: 2.1.0-beta Attachments: MAPREDUCE-5311-1.patch, MAPREDUCE-5311.patch, MAPREDUCE-5311.patch Per discussion in MAPREDUCE-5310 and comments in the code we should remove all the related logic and just leave the counter constant for backwards compatibility and deprecate the counter constants. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731445#comment-13731445 ] Arun C Murthy commented on MAPREDUCE-5311: -- To Sandy's point - several folks do use mapred.cluster.(map,reduce).memory.mb, so it will be a big BC break for them. Also, what does this mean for BC after we spent all the time fixing MAPREDUCE-5108. Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.0.4-alpha Reporter: Alejandro Abdelnur Assignee: Sandy Ryza Priority: Blocker Fix For: 2.1.0-beta Attachments: MAPREDUCE-5311-1.patch, MAPREDUCE-5311.patch, MAPREDUCE-5311.patch Per discussion in MAPREDUCE-5310 and comments in the code we should remove all the related logic and just leave the counter constant for backwards compatibility and deprecate the counter constants. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731466#comment-13731466 ] Sandy Ryza commented on MAPREDUCE-5311: --- I am missing how the change would break every single MR app - we would leave the API intact. However, does this sound like a reasonable compromise: We keep SLOTS_MILLIS_MAPS, but instead of trying to map resources to slots (which will make even less sense when we introduce other resources), we compute it as the sum of the execution times for all map tasks. Essentially SLOTS_MILLIS_MAPS functions as CONTAINERS_MILLIS_MAPS would and is computed independently from container capabilities. This will break semantic compatibility in some cases for people using the mapred.cluster.(map,reduce).memory.mb properties, but it will restore semantic compatibility for people not using those properties (which I believe is a much larger number). It will also make the counter function in a more understandable way that is more in line with its description. Replace SLOTS_MILLIS counters with MEM_MILLIS - Key: MAPREDUCE-5311 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.0.4-alpha Reporter: Alejandro Abdelnur Assignee: Sandy Ryza Priority: Blocker Fix For: 2.1.0-beta Attachments: MAPREDUCE-5311-1.patch, MAPREDUCE-5311.patch, MAPREDUCE-5311.patch Per discussion in MAPREDUCE-5310 and comments in the code we should remove all the related logic and just leave the counter constant for backwards compatibility and deprecate the counter constants. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4661) Add HTTPS for WebUIs on Branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Weng updated MAPREDUCE-4661: Attachment: branch-1.2-patch.txt7 Found an error in JobHistory.java that breaks TestJobHistory and TestLostTracker. New patch is attached. Changes compared with previous patch: --- -Keys.TRACKER_NAME, Keys.HTTP_PORT, +Keys.TRACKER_NAME, Keys.HTTP_PORT, Keys.SHUFFLE_PORT, - * @return the taskLogsUrl. null if shuffle-port or tracker-name or + * @return the taskLogsUrl. null if http-port or tracker-name or --- TestFileCreation and TestNNThroughputBenchmark pass on individual test-case runs after cleaning up. The following two testcases fail with and without the changes in the patch. [junit] Test org.apache.hadoop.io.compress.TestCodec FAILED [junit] Test org.apache.hadoop.fs.TestFsShellReturnCode FAILED Add HTTPS for WebUIs on Branch-1 Key: MAPREDUCE-4661 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4661 Project: Hadoop Map/Reduce Issue Type: Improvement Components: security, webapps Affects Versions: 1.0.3 Reporter: Plamen Jeliazkov Assignee: Michael Weng Attachments: branch-1.2-patch.txt, branch-1.2-patch.txt2, branch-1.2-patch.txt3, branch-1.2-patch.txt4, branch-1.2-patch.txt5, branch-1.2-patch.txt6, branch-1.2-patch.txt7, MAPREDUCE-4461.patch, MAPREDUCE-4661.patch, MAPREDUCE-4661.patch, MAPREDUCE-4661.patch After investigating the methodology used to add HTTPS support in branch-2, I feel that this same approach should be back-ported to branch-1. I have taken many of the patches used for branch-2 and merged them in. I was working on top of HDP 1 at the time - I will provide a patch for trunk soon once I can confirm I am adding only the necessities for supporting HTTPS on the webUIs. As an added benefit -- this patch actually provides HTTPS webUI to HBase by extension. 
This works if you take a hadoop-core jar compiled with this patch, put it into the hbase/lib directory, and apply the necessary configs to hbase/conf. = OLD IDEA(s) BEHIND ADDING HTTPS (look @ Sept 17th patch) == In order to provide full security around the cluster, the webUI should also be secure if desired to prevent cookie theft and user masquerading. Here is my proposed work. Currently I can only add HTTPS support. I do not know how to switch reliance of the HttpServer from HTTP to HTTPS fully. In order to facilitate this change I propose the following configuration additions: CONFIG PROPERTY - DEFAULT VALUE mapred.https.enable - false mapred.https.need.client.auth - false mapred.https.server.keystore.resource - ssl-server.xml mapred.job.tracker.https.port - 50035 mapred.job.tracker.https.address - IP_ADDR:50035 mapred.task.tracker.https.port - 50065 mapred.task.tracker.https.address - IP_ADDR:50065 I tested this on my local box after using keytool to generate an SSL certificate. You will need to change ssl-server.xml to point to the .keystore file afterward. Truststore may not be necessary; you can just point it to the keystore.
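For reference, the proposed properties and defaults listed above, expressed as mapred-site.xml entries. The names, defaults, and the IP_ADDR placeholder are taken verbatim from the proposal; this is only a formatting of that table, not a tested configuration:

```xml
<!-- Proposed HTTPS properties from the description; IP_ADDR is a placeholder. -->
<property>
  <name>mapred.https.enable</name>
  <value>false</value>
</property>
<property>
  <name>mapred.https.need.client.auth</name>
  <value>false</value>
</property>
<property>
  <name>mapred.https.server.keystore.resource</name>
  <value>ssl-server.xml</value>
</property>
<property>
  <name>mapred.job.tracker.https.port</name>
  <value>50035</value>
</property>
<property>
  <name>mapred.job.tracker.https.address</name>
  <value>IP_ADDR:50035</value>
</property>
<property>
  <name>mapred.task.tracker.https.port</name>
  <value>50065</value>
</property>
<property>
  <name>mapred.task.tracker.https.address</name>
  <value>IP_ADDR:50065</value>
</property>
```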
[jira] [Commented] (MAPREDUCE-5433) use mapreduce to parse hfiles and output keyvalue
[ https://issues.apache.org/jira/browse/MAPREDUCE-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731584#comment-13731584 ] rulinma commented on MAPREDUCE-5433:
{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.io.hfile.CacheConfig;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileScanner;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class NHFileInputFormat extends FileInputFormat<ImmutableBytesWritable, Result> {

  private class HFileRecordReader extends RecordReader<ImmutableBytesWritable, Result> {
    private HFile.Reader reader;
    private final HFileScanner scanner;
    private int entryNumber = 0;
    private ImmutableBytesWritable key = null;
    private Result result = null;
    List<KeyValue> tmpList = new ArrayList<KeyValue>();

    public HFileRecordReader(FileSplit split, Configuration conf) throws IOException {
      final Path path = split.getPath();
      reader = HFile.createReader(FileSystem.get(conf), path, new CacheConfig(conf));
      scanner = reader.getScanner(false, false, false);
      scanner.seekTo();
    }

    @Override
    public void close() throws IOException {
      if (reader != null) {
        reader.close();
      }
    }

    @Override
    public ImmutableBytesWritable getCurrentKey() throws IOException, InterruptedException {
      return key;
    }

    @Override
    public Result getCurrentValue() throws IOException, InterruptedException {
      result = new Result(tmpList);
      return result;
    }

    @Override
    public boolean nextKeyValue() throws IOException, InterruptedException {
      // clear
      tmpList.clear();
      // first
      if (entryNumber == 0) {
        entryNumber++;
      }
      if (entryNumber > reader.getEntries()) {
        return false;
      }
      key = new ImmutableBytesWritable(scanner.getKeyValue().getRow());
      tmpList.add(scanner.getKeyValue());
      while (scanner.next()) {
        entryNumber++;
        // step next, replace compare
        if (compareBytes(key.get(), scanner.getKeyValue().getRow())) {
          // same row
          tmpList.add(scanner.getKeyValue());
        } else {
          return true;
        }
      }
      // last track
      if (reader.getEntries() > 0 && (entryNumber == reader.getEntries())) {
        entryNumber++;
        return true;
      }
      return false;
    }

    @Override
    public float getProgress() throws IOException, InterruptedException {
      if (reader != null) {
        return (entryNumber / reader.getEntries());
      }
      return 1;
    }

    @Override
    public void initialize(InputSplit arg0, TaskAttemptContext arg1) throws IOException, InterruptedException {
      // System.out.println("init");
    }
  }

  @Override
  protected boolean isSplitable(JobContext context, Path filename) {
    return false;
  }

  @Override
  public
{code}
[jira] [Commented] (MAPREDUCE-5433) use mapreduce to parse hfiles and output keyvalue
[ https://issues.apache.org/jira/browse/MAPREDUCE-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731586#comment-13731586 ] rulinma commented on MAPREDUCE-5433: parse file to table:
{code}
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class HFileMapperTable {

  public static class MyMap extends Mapper<ImmutableBytesWritable, Result, ImmutableBytesWritable, Put> {
    public static Counter ct = null;

    public void map(ImmutableBytesWritable key, Result value, Context context)
        throws IOException, InterruptedException {
      Put put = new Put(key.copyBytes());
      List<KeyValue> kvList = value.list();
      for (KeyValue kv : kvList) {
        put.add(kv);
      }
      context.write(key, put);
      ct = context.getCounter("rowCount", "totalRow");
      ct.increment(1);
    }

    public void setup(Context context) {
    }
  }

  public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "HFileMapperTable2");
    job.setJarByClass(HFileMapperTableTwo.class);
    job.setMapperClass(MyMap.class);
    job.setInputFormatClass(HFileInputFormatTwo.class);
    FileSystem fs = FileSystem.get(URI.create(args[0]), conf);
    List<FileStatus> result = new ArrayList<FileStatus>();
    addInputPathRecursively(result, fs, new Path(args[0]));
    String inputPath = "";
    for (FileStatus f : result) {
      inputPath = f.getPath() + "," + inputPath;
    }
    if (inputPath.length() > 0) {
      inputPath = inputPath.substring(0, inputPath.length() - 1);
    }
    HFileInputFormatTwo.addInputPaths(job, inputPath);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    job.setNumReduceTasks(0);
    TableMapReduceUtil.initTableReducerJob(args[1], null, job);
    job.waitForCompletion(true);
    System.out.println("hfile parsed.");
  }

  public static void addInputPathRecursively(List<FileStatus> result, FileSystem fs, Path path)
      throws IOException {
    for (FileStatus stat : fs.listStatus(path)) {
      if (stat.isDirectory()) {
        addInputPathRecursively(result, fs, stat.getPath());
      } else {
        result.add(stat);
      }
    }
  }
}
{code}
use mapreduce to parse hfiles and output keyvalue - Key: MAPREDUCE-5433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5433 Project: Hadoop Map/Reduce Issue Type: Improvement Components: examples Reporter: rulinma Assignee: rulinma Priority: Minor