[jira] [Commented] (MAPREDUCE-6223) TestJobConf#testNegativeValueForTaskVmem failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308760#comment-14308760 ]

Masatake Iwasaki commented on MAPREDUCE-6223:
---------------------------------------------

s/not local value but//

TestJobConf#testNegativeValueForTaskVmem failures
-------------------------------------------------

Key: MAPREDUCE-6223
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6223
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: test
Affects Versions: 3.0.0
Reporter: Gera Shegalov
Assignee: Varun Saxena
Attachments: MAPREDUCE-6223.001.patch, MAPREDUCE-6223.002.patch

{code}
Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.328 sec <<< FAILURE! - in org.apache.hadoop.conf.TestJobConf
testNegativeValueForTaskVmem(org.apache.hadoop.conf.TestJobConf)  Time elapsed: 0.089 sec  <<< FAILURE!
java.lang.AssertionError: expected:<1024> but was:<-1>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.junit.Assert.assertEquals(Assert.java:555)
    at org.junit.Assert.assertEquals(Assert.java:542)
    at org.apache.hadoop.conf.TestJobConf.testNegativeValueForTaskVmem(TestJobConf.java:111)
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6165) [JDK8] TestCombineFileInputFormat failed on JDK8
[ https://issues.apache.org/jira/browse/MAPREDUCE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308545#comment-14308545 ]

Hadoop QA commented on MAPREDUCE-6165:
--------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12685053/MAPREDUCE-6165-001.patch
against trunk revision 6583ad1.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test file.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

    org.apache.hadoop.conf.TestJobConf

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5169//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5169//console

This message is automatically generated.

[JDK8] TestCombineFileInputFormat failed on JDK8
------------------------------------------------

Key: MAPREDUCE-6165
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6165
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Wei Yan
Assignee: Akira AJISAKA
Priority: Minor
Attachments: MAPREDUCE-6165-001.patch, MAPREDUCE-6165-reproduce.patch

The error msg:
{noformat}
testSplitPlacementForCompressedFiles(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat)  Time elapsed: 2.487 sec  <<< FAILURE!
junit.framework.AssertionFailedError: expected:<2> but was:<1>
    at junit.framework.Assert.fail(Assert.java:57)
    at junit.framework.Assert.failNotEquals(Assert.java:329)
    at junit.framework.Assert.assertEquals(Assert.java:78)
    at junit.framework.Assert.assertEquals(Assert.java:234)
    at junit.framework.Assert.assertEquals(Assert.java:241)
    at junit.framework.TestCase.assertEquals(TestCase.java:409)
    at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacementForCompressedFiles(TestCombineFileInputFormat.java:911)

testSplitPlacement(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat)  Time elapsed: 0.985 sec  <<< FAILURE!
junit.framework.AssertionFailedError: expected:<2> but was:<1>
    at junit.framework.Assert.fail(Assert.java:57)
    at junit.framework.Assert.failNotEquals(Assert.java:329)
    at junit.framework.Assert.assertEquals(Assert.java:78)
    at junit.framework.Assert.assertEquals(Assert.java:234)
    at junit.framework.Assert.assertEquals(Assert.java:241)
    at junit.framework.TestCase.assertEquals(TestCase.java:409)
    at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacement(TestCombineFileInputFormat.java:368)
{noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6227) DFSIO for truncate
[ https://issues.apache.org/jira/browse/MAPREDUCE-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308606#comment-14308606 ]

Hadoop QA commented on MAPREDUCE-6227:
--------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12696938/DFSIO-truncate-00.patch
against trunk revision 6583ad1.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test file.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

    org.apache.hadoop.conf.TestJobConf

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5170//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5170//console

This message is automatically generated.

DFSIO for truncate
------------------

Key: MAPREDUCE-6227
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6227
Project: Hadoop Map/Reduce
Issue Type: New Feature
Components: benchmarks, test
Affects Versions: 2.7.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
Attachments: DFSIO-truncate-00.patch

Create a benchmark and a test for truncate within the framework of TestDFSIO.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6234) MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Masatake Iwasaki updated MAPREDUCE-6234:
----------------------------------------
Attachment: MAPREDUCE-6234.002.patch

.002 addresses my comment in MAPREDUCE-6223. Tests needing the default value in conf can use {{MRJobConfig.DEFAULT_MAP_MEMORY_MB}}, and tests needing the value processed by JobConf#getMemoryRequired can use {{JobConf.DEFAULT_MAP_MEMORY_REQUIRED}}.

MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml
----------------------------------------------------------------------------

Key: MAPREDUCE-6234
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6234
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/gridmix, mrv2
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
Attachments: MAPREDUCE-6234.001.patch, MAPREDUCE-6234.002.patch

TestHighRamJob fails because of this.

{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.mapred.gridmix.TestHighRamJob
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.162 sec <<< FAILURE! - in org.apache.hadoop.mapred.gridmix.TestHighRamJob
testHighRamFeatureEmulation(org.apache.hadoop.mapred.gridmix.TestHighRamJob)  Time elapsed: 1.102 sec  <<< FAILURE!
java.lang.AssertionError: expected:<1024> but was:<-1>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.junit.Assert.assertEquals(Assert.java:555)
    at org.junit.Assert.assertEquals(Assert.java:542)
    at org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamConfig(TestHighRamJob.java:98)
    at org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamFeatureEmulation(TestHighRamJob.java:117)
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6234) MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308750#comment-14308750 ]

Tsuyoshi OZAWA commented on MAPREDUCE-6234:
-------------------------------------------

Makes sense. [~jira.shegalov], do you know the reason that DEFAULT_MAP_MEMORY_MB was not updated in MAPREDUCE-5785? If there is no reason, I think we can apply this patch to trunk.

MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml
----------------------------------------------------------------------------

Key: MAPREDUCE-6234
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6234
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/gridmix, mrv2
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
Attachments: MAPREDUCE-6234.001.patch

TestHighRamJob fails because of this.

{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.mapred.gridmix.TestHighRamJob
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.162 sec <<< FAILURE! - in org.apache.hadoop.mapred.gridmix.TestHighRamJob
testHighRamFeatureEmulation(org.apache.hadoop.mapred.gridmix.TestHighRamJob)  Time elapsed: 1.102 sec  <<< FAILURE!
java.lang.AssertionError: expected:<1024> but was:<-1>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.junit.Assert.assertEquals(Assert.java:555)
    at org.junit.Assert.assertEquals(Assert.java:542)
    at org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamConfig(TestHighRamJob.java:98)
    at org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamFeatureEmulation(TestHighRamJob.java:117)
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6223) TestJobConf#testNegativeValueForTaskVmem failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308748#comment-14308748 ]

Masatake Iwasaki commented on MAPREDUCE-6223:
---------------------------------------------

I think JobConf#getMemoryRequired should get 1024 from a constant in MRJobConfig other than DEFAULT_*_MEMORY_MB, because 1024 is never set in Configuration. [~ajisakaa] / [~ozawa], please commit the patch of this issue first. I will update the patch of MAPREDUCE-6234 later.

TestJobConf#testNegativeValueForTaskVmem failures
-------------------------------------------------

Key: MAPREDUCE-6223
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6223
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: test
Affects Versions: 3.0.0
Reporter: Gera Shegalov
Assignee: Varun Saxena
Attachments: MAPREDUCE-6223.001.patch, MAPREDUCE-6223.002.patch

{code}
Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.328 sec <<< FAILURE! - in org.apache.hadoop.conf.TestJobConf
testNegativeValueForTaskVmem(org.apache.hadoop.conf.TestJobConf)  Time elapsed: 0.089 sec  <<< FAILURE!
java.lang.AssertionError: expected:<1024> but was:<-1>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.junit.Assert.assertEquals(Assert.java:555)
    at org.junit.Assert.assertEquals(Assert.java:542)
    at org.apache.hadoop.conf.TestJobConf.testNegativeValueForTaskVmem(TestJobConf.java:111)
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
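The separation Masatake describes, a positive fallback constant kept apart from the -1 "unset" default that mirrors mapred-default.xml, can be sketched in isolation. This is an illustrative standalone sketch, not the actual Hadoop code; the class name and the constant `MAP_MEMORY_REQUIRED_MB` are hypothetical stand-ins.

```java
// Illustrative sketch (not actual Hadoop code): keep the "unset" sentinel that
// mirrors mapred-default.xml separate from the hard-coded 1024 MB fallback,
// so the fallback value is never stored in or read from Configuration.
public class MemoryDefaults {
    public static final long DEFAULT_MAP_MEMORY_MB = -1L;    // matches mapred-default.xml
    public static final long MAP_MEMORY_REQUIRED_MB = 1024L; // fallback, never set in conf

    // Return the configured value if positive, otherwise the fallback constant.
    public static long getMemoryRequired(long configuredMb) {
        return configuredMb > 0 ? configuredMb : MAP_MEMORY_REQUIRED_MB;
    }
}
```

With two constants, a test asserting the conf default can check -1 while a test asserting the processed value can check 1024, which is exactly the split proposed for MAPREDUCE-6234.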
[jira] [Updated] (MAPREDUCE-6227) DFSIO for truncate
[ https://issues.apache.org/jira/browse/MAPREDUCE-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated MAPREDUCE-6227:
-------------------------------------------
Attachment: DFSIO-truncate-01.patch

Moved TestDFSIO_results.log under {{target/test-dir}} for tests.

DFSIO for truncate
------------------

Key: MAPREDUCE-6227
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6227
Project: Hadoop Map/Reduce
Issue Type: New Feature
Components: benchmarks, test
Affects Versions: 2.7.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
Attachments: DFSIO-truncate-00.patch, DFSIO-truncate-01.patch

Create a benchmark and a test for truncate within the framework of TestDFSIO.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
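As a rough illustration of the operation such a benchmark exercises, the sketch below writes a file and truncates it to a shorter length, using a local java.nio FileChannel as a stand-in for HDFS truncate. All names here are illustrative; this is not the TestDFSIO code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TruncateSketch {
    // Write `total` bytes, truncate the file down to `keep` bytes,
    // and return the resulting file length.
    public static long writeThenTruncate(Path file, int total, long keep) throws IOException {
        Files.write(file, new byte[total]);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ch.truncate(keep);
        }
        return Files.size(file);
    }

    // Convenience wrapper on a temp file so the sketch is easy to run.
    public static long demo() {
        try {
            Path tmp = Files.createTempFile("truncate-sketch", ".dat");
            long len = writeThenTruncate(tmp, 4096, 1024L);
            Files.delete(tmp);
            return len;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A DFSIO-style truncate benchmark would time many such write-then-truncate rounds against HDFS instead of the local filesystem.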
[jira] [Commented] (MAPREDUCE-6234) MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308764#comment-14308764 ]

Gera Shegalov commented on MAPREDUCE-6234:
------------------------------------------

I apologize, I am too tied up right now to do a thorough review. Looking into resolving this is on my list. I was thinking that direct references to DEFAULT_*_MEMORY_MB should be wrapped in a single method. Maybe [~kasha] can chime in in the meantime.

MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml
----------------------------------------------------------------------------

Key: MAPREDUCE-6234
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6234
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/gridmix, mrv2
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
Attachments: MAPREDUCE-6234.001.patch

TestHighRamJob fails because of this.

{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.mapred.gridmix.TestHighRamJob
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.162 sec <<< FAILURE! - in org.apache.hadoop.mapred.gridmix.TestHighRamJob
testHighRamFeatureEmulation(org.apache.hadoop.mapred.gridmix.TestHighRamJob)  Time elapsed: 1.102 sec  <<< FAILURE!
java.lang.AssertionError: expected:<1024> but was:<-1>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.junit.Assert.assertEquals(Assert.java:555)
    at org.junit.Assert.assertEquals(Assert.java:542)
    at org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamConfig(TestHighRamJob.java:98)
    at org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamFeatureEmulation(TestHighRamJob.java:117)
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6223) TestJobConf#testNegativeValueForTaskVmem failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308671#comment-14308671 ]

Varun Saxena commented on MAPREDUCE-6223:
-----------------------------------------

[~ajisakaa] / [~ozawa], as you wish. Test failures will keep appearing until MAPREDUCE-6223 is committed. Ideally {{MRJobConfig.DEFAULT_MAP_MEMORY_MB}} should not be changed; I feel we should not be taking the default value from a local variable. MAPREDUCE-6234 would hence be a redundant fix, as we would have to revert its changes again. It is somewhat confusing that the default value in mapred-default.xml is -1 while in code we take it as 1024, but if somebody reads the config description, which should be done, it is quite clear what the behavior of this config is.

{code}
<description>The amount of memory to request from the scheduler for each map task.
If this is not specified or is non-positive, it is inferred from
mapreduce.map.java.opts and mapreduce.job.heap.memory-mb.ratio.
If java-opts are also not specified, we set it to 1024.</description>
{code}

You can take a call on whether to commit that or not. Alternatively, you can review and commit this as well.

TestJobConf#testNegativeValueForTaskVmem failures
-------------------------------------------------

Key: MAPREDUCE-6223
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6223
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: test
Affects Versions: 3.0.0
Reporter: Gera Shegalov
Assignee: Varun Saxena
Attachments: MAPREDUCE-6223.001.patch, MAPREDUCE-6223.002.patch

{code}
Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.328 sec <<< FAILURE! - in org.apache.hadoop.conf.TestJobConf
testNegativeValueForTaskVmem(org.apache.hadoop.conf.TestJobConf)  Time elapsed: 0.089 sec  <<< FAILURE!
java.lang.AssertionError: expected:<1024> but was:<-1>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.junit.Assert.assertEquals(Assert.java:555)
    at org.junit.Assert.assertEquals(Assert.java:542)
    at org.apache.hadoop.conf.TestJobConf.testNegativeValueForTaskVmem(TestJobConf.java:111)
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
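The inference order quoted from the config description above can be sketched as standalone Java. This is a simplified illustration under stated assumptions, not the real JobConf logic: it only recognizes a plain `-Xmx<N>m` option, and all class and method names are hypothetical.

```java
// Simplified sketch of the inference order in mapreduce.map.memory.mb's
// description: a positive setting wins; otherwise derive heap/ratio from
// java.opts; otherwise fall back to 1024. Not the actual JobConf code.
public class MapMemoryInference {
    static final long FALLBACK_MB = 1024L;

    // Extract an -Xmx value in MB from a java.opts string; -1 if absent.
    static long parseXmxMb(String javaOpts) {
        if (javaOpts == null) return -1L;
        for (String opt : javaOpts.split("\\s+")) {
            if (opt.startsWith("-Xmx") && opt.endsWith("m") && opt.length() > 5) {
                return Long.parseLong(opt.substring(4, opt.length() - 1));
            }
        }
        return -1L;
    }

    // memoryMb > 0 -> use it; else infer heap / ratio; else 1024.
    public static long inferMemoryMb(long memoryMb, String javaOpts, double heapRatio) {
        if (memoryMb > 0) return memoryMb;
        long xmxMb = parseXmxMb(javaOpts);
        return xmxMb > 0 ? Math.round(xmxMb / heapRatio) : FALLBACK_MB;
    }
}
```

This also makes the source of the test confusion concrete: with nothing configured, the conf still holds -1, and 1024 only appears after the inference step runs.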
[jira] [Commented] (MAPREDUCE-6235) Bundle and compress files passed with -libjars prior to uploading and distributing
[ https://issues.apache.org/jira/browse/MAPREDUCE-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307179#comment-14307179 ]

Dustin Cote commented on MAPREDUCE-6235:
----------------------------------------

Thanks folks, I believe I was seeing a time difference because of the time to compress. I'll go ahead and close this out since no code change should be made here.

Bundle and compress files passed with -libjars prior to uploading and distributing
----------------------------------------------------------------------------------

Key: MAPREDUCE-6235
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6235
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: distributed-cache, mrv2
Affects Versions: 2.6.0
Reporter: Dustin Cote
Assignee: Dustin Cote
Priority: Minor

To improve performance, we should upload jars flagged by -libjars as a single bundle and expand on arrival instead of uploading the jars one by one. This would also reduce the network overhead of using the -libjars option.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6235) Bundle and compress files passed with -libjars prior to uploading and distributing
[ https://issues.apache.org/jira/browse/MAPREDUCE-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307181#comment-14307181 ]

Dustin Cote commented on MAPREDUCE-6235:
----------------------------------------

Time to *zip*, not compress... ok, now closing it.

Bundle and compress files passed with -libjars prior to uploading and distributing
----------------------------------------------------------------------------------

Key: MAPREDUCE-6235
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6235
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: distributed-cache, mrv2
Affects Versions: 2.6.0
Reporter: Dustin Cote
Assignee: Dustin Cote
Priority: Minor

To improve performance, we should upload jars flagged by -libjars as a single bundle and expand on arrival instead of uploading the jars one by one. This would also reduce the network overhead of using the -libjars option.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (MAPREDUCE-6235) Bundle and compress files passed with -libjars prior to uploading and distributing
[ https://issues.apache.org/jira/browse/MAPREDUCE-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dustin Cote resolved MAPREDUCE-6235.
------------------------------------
Resolution: Invalid

Bundle and compress files passed with -libjars prior to uploading and distributing
----------------------------------------------------------------------------------

Key: MAPREDUCE-6235
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6235
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: distributed-cache, mrv2
Affects Versions: 2.6.0
Reporter: Dustin Cote
Assignee: Dustin Cote
Priority: Minor

To improve performance, we should upload jars flagged by -libjars as a single bundle and expand on arrival instead of uploading the jars one by one. This would also reduce the network overhead of using the -libjars option.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6245) Fixed split shuffling.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307331#comment-14307331 ]

Eric Payne commented on MAPREDUCE-6245:
---------------------------------------

[~lbkzman], can you please describe the problem that this Jira is trying to resolve?

Fixed split shuffling.
----------------------

Key: MAPREDUCE-6245
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6245
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: lbkzman
Assignee: lbkzman

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6059) Speed up history server startup time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307434#comment-14307434 ]

Allen Wittenauer commented on MAPREDUCE-6059:
---------------------------------------------

It wasn't committed to branch-2 because I generally don't.

Speed up history server startup time
------------------------------------

Key: MAPREDUCE-6059
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6059
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Siqi Li
Assignee: Siqi Li
Fix For: 3.0.0
Attachments: YARN-2366.v1.patch

When the history server starts up, it scans every history directory and puts all history files into a cache, even though the cache only stores the 20K most recent history files. It therefore wastes a large portion of the startup time loading old history files into the cache, and the startup time will keep increasing if we don't trim the number of history files. For example, when the history server started up with 2.5M history files in HDFS, it took ~5 minutes.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6059) Speed up history server startup time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307452#comment-14307452 ]

Jason Lowe commented on MAPREDUCE-6059:
---------------------------------------

If you have no objections, I'd like to commit this to branch-2 as well. I'd like to keep the trunk and branch-2 lines as reasonably close as we can to minimize the pain of maintaining the two lines.

Speed up history server startup time
------------------------------------

Key: MAPREDUCE-6059
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6059
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Siqi Li
Assignee: Siqi Li
Fix For: 3.0.0
Attachments: YARN-2366.v1.patch

When the history server starts up, it scans every history directory and puts all history files into a cache, even though the cache only stores the 20K most recent history files. It therefore wastes a large portion of the startup time loading old history files into the cache, and the startup time will keep increasing if we don't trim the number of history files. For example, when the history server started up with 2.5M history files in HDFS, it took ~5 minutes.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6059) Speed up history server startup time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307471#comment-14307471 ]

Allen Wittenauer commented on MAPREDUCE-6059:
---------------------------------------------

No objection from me if you want to be Sisyphus. :)

Speed up history server startup time
------------------------------------

Key: MAPREDUCE-6059
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6059
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Siqi Li
Assignee: Siqi Li
Fix For: 3.0.0
Attachments: YARN-2366.v1.patch

When the history server starts up, it scans every history directory and puts all history files into a cache, even though the cache only stores the 20K most recent history files. It therefore wastes a large portion of the startup time loading old history files into the cache, and the startup time will keep increasing if we don't trim the number of history files. For example, when the history server started up with 2.5M history files in HDFS, it took ~5 minutes.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
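The trimming idea in the issue description, keeping only the newest files rather than loading every old one into the cache, can be sketched in a few lines. This is a standalone illustration, not the JobHistory code; the method name and array-of-timestamps representation are assumptions for the example.

```java
import java.util.Arrays;

public class HistoryScanSketch {
    // Given the modification times of all history files found in a scan,
    // return only the `cacheSize` newest ones (newest first) -- the rest
    // would never fit in the cache, so there is no point loading them.
    public static long[] newestOnly(long[] modTimes, int cacheSize) {
        long[] sorted = modTimes.clone();
        Arrays.sort(sorted); // ascending
        int keep = Math.min(cacheSize, sorted.length);
        long[] newest = new long[keep];
        for (int i = 0; i < keep; i++) {
            newest[i] = sorted[sorted.length - 1 - i]; // take from the top
        }
        return newest;
    }
}
```

At the scale in the report (2.5M files, 20K cache slots), skipping the other ~2.48M loads is where the startup-time savings would come from.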
[jira] [Updated] (MAPREDUCE-5847) Remove redundant code for fileOutputByteCounter in MapTask and ReduceTask
[ https://issues.apache.org/jira/browse/MAPREDUCE-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-5847:
----------------------------------------
Status: Open  (was: Patch Available)

Cancelling patch as it no longer applies.

Remove redundant code for fileOutputByteCounter in MapTask and ReduceTask
-------------------------------------------------------------------------

Key: MAPREDUCE-5847
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5847
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv1, mrv2, task
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Attachments: MAPREDUCE-5847.v01.patch, MAPREDUCE-5847.v02.patch

Both MapTask and ReduceTask carry redundant code to update the BYTES_WRITTEN counter. However, {{Task.updateCounters}} uses file system stats for this.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-207) Computing Input Splits on the MR Cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-207:
---------------------------------------
Status: Open  (was: Patch Available)

Computing Input Splits on the MR Cluster
----------------------------------------

Key: MAPREDUCE-207
URL: https://issues.apache.org/jira/browse/MAPREDUCE-207
Project: Hadoop Map/Reduce
Issue Type: New Feature
Components: applicationmaster, mrv2
Reporter: Philip Zeyliger
Assignee: Gera Shegalov
Attachments: MAPREDUCE-207.patch, MAPREDUCE-207.v02.patch, MAPREDUCE-207.v03.patch, MAPREDUCE-207.v05.patch, MAPREDUCE-207.v06.patch, MAPREDUCE-207.v07.patch

Instead of computing the input splits as part of job submission, Hadoop could have a separate job task type that computes the input splits, therefore allowing that computation to happen on the cluster.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-207) Computing Input Splits on the MR Cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-207:
---------------------------------------
Status: Patch Available  (was: Open)

Computing Input Splits on the MR Cluster
----------------------------------------

Key: MAPREDUCE-207
URL: https://issues.apache.org/jira/browse/MAPREDUCE-207
Project: Hadoop Map/Reduce
Issue Type: New Feature
Components: applicationmaster, mrv2
Reporter: Philip Zeyliger
Assignee: Gera Shegalov
Attachments: MAPREDUCE-207.patch, MAPREDUCE-207.v02.patch, MAPREDUCE-207.v03.patch, MAPREDUCE-207.v05.patch, MAPREDUCE-207.v06.patch, MAPREDUCE-207.v07.patch

Instead of computing the input splits as part of job submission, Hadoop could have a separate job task type that computes the input splits, therefore allowing that computation to happen on the cluster.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-207) Computing Input Splits on the MR Cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308003#comment-14308003 ]

Hadoop QA commented on MAPREDUCE-207:
-------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12655331/MAPREDUCE-207.v07.patch
against trunk revision e1990ab.

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5167//console

This message is automatically generated.

Computing Input Splits on the MR Cluster
----------------------------------------

Key: MAPREDUCE-207
URL: https://issues.apache.org/jira/browse/MAPREDUCE-207
Project: Hadoop Map/Reduce
Issue Type: New Feature
Components: applicationmaster, mrv2
Reporter: Philip Zeyliger
Assignee: Gera Shegalov
Attachments: MAPREDUCE-207.patch, MAPREDUCE-207.v02.patch, MAPREDUCE-207.v03.patch, MAPREDUCE-207.v05.patch, MAPREDUCE-207.v06.patch, MAPREDUCE-207.v07.patch

Instead of computing the input splits as part of job submission, Hadoop could have a separate job task type that computes the input splits, therefore allowing that computation to happen on the cluster.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them
[ https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-5044:
----------------------------------------
Status: Open  (was: Patch Available)

Have AM trigger jstack on task attempts that timeout before killing them
------------------------------------------------------------------------

Key: MAPREDUCE-5044
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mr-am
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe
Assignee: Gera Shegalov
Attachments: MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, MAPREDUCE-5044.v06.patch, Screen Shot 2013-11-12 at 1.05.32 PM.png, Screen Shot 2013-11-12 at 1.06.04 PM.png

When an AM expires a task attempt, it would be nice if it triggered a jstack output via SIGQUIT before killing the task attempt. This would be invaluable for helping users debug their hung tasks, especially if they do not have shell access to the nodes.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them
[ https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-5044:
----------------------------------------
Status: Patch Available  (was: Open)

Have AM trigger jstack on task attempts that timeout before killing them
------------------------------------------------------------------------

Key: MAPREDUCE-5044
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mr-am
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe
Assignee: Gera Shegalov
Attachments: MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, MAPREDUCE-5044.v06.patch, Screen Shot 2013-11-12 at 1.05.32 PM.png, Screen Shot 2013-11-12 at 1.06.04 PM.png

When an AM expires a task attempt, it would be nice if it triggered a jstack output via SIGQUIT before killing the task attempt. This would be invaluable for helping users debug their hung tasks, especially if they do not have shell access to the nodes.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them
[ https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307791#comment-14307791 ]

Hadoop QA commented on MAPREDUCE-5044:
--------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12645521/MAPREDUCE-5044.v06.patch
against trunk revision c4980a2.

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5164//console

This message is automatically generated.

Have AM trigger jstack on task attempts that timeout before killing them
------------------------------------------------------------------------

Key: MAPREDUCE-5044
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mr-am
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe
Assignee: Gera Shegalov
Attachments: MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, MAPREDUCE-5044.v06.patch, Screen Shot 2013-11-12 at 1.05.32 PM.png, Screen Shot 2013-11-12 at 1.06.04 PM.png

When an AM expires a task attempt, it would be nice if it triggered a jstack output via SIGQUIT before killing the task attempt. This would be invaluable for helping users debug their hung tasks, especially if they do not have shell access to the nodes.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them
[ https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-5044:
----------------------------------------
Status: Open  (was: Patch Available)

Cancelling patch as it no longer applies.

Have AM trigger jstack on task attempts that timeout before killing them
------------------------------------------------------------------------

Key: MAPREDUCE-5044
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mr-am
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe
Assignee: Gera Shegalov
Attachments: MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, MAPREDUCE-5044.v06.patch, Screen Shot 2013-11-12 at 1.05.32 PM.png, Screen Shot 2013-11-12 at 1.06.04 PM.png

When an AM expires a task attempt, it would be nice if it triggered a jstack output via SIGQUIT before killing the task attempt. This would be invaluable for helping users debug their hung tasks, especially if they do not have shell access to the nodes.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6237) DBRecordReader is not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307813#comment-14307813 ] Hadoop QA commented on MAPREDUCE-6237: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12696811/mapreduce-6237.patch against trunk revision d27439f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1154 javac compiler warnings (more than the trunk's current 1149 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 13 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5162//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5162//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5162//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5162//console This message is automatically generated. 
DBRecordReader is not thread safe - Key: MAPREDUCE-6237 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.5.0 Reporter: Kannan Rajah Assignee: Kannan Rajah Attachments: mapreduce-6237.patch, mapreduce-6237.patch, mapreduce-6237.patch DBInputFormat.createDBRecorder is reusing JDBC connections across instances of DBRecordReader. This is not a good idea. We should be creating a separate connection. If performance is a concern, then we should be using connection pooling instead. I looked at DBOutputFormat.getRecordReader. It actually creates a new Connection object for each DBRecordReader. So can we just change DBInputFormat to create a new Connection every time? The connection reuse code was added as part of a connection leak fix in MAPREDUCE-1443. Any reason for caching the connection? We observed this issue in a customer setup where they were reading data from MySQL using Pig. According to the customer, the query returns two records, which causes Pig to create two instances of DBRecordReader. These two instances share the database connection instance. The first DBRecordReader runs and extracts the first record from MySQL just fine, but then closes the shared connection instance. When the second DBRecordReader runs, it tries to execute a query to retrieve the second record on the closed shared connection instance, which fails. If we set mapred.map.tasks to 1, the query succeeds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
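The failure mode described above can be shown with a toy sketch in plain Java (these are illustrative stand-in classes, not the Hadoop `DBRecordReader`/`DBInputFormat` code): when two readers share one connection, the first reader's cleanup closes the connection out from under the second.

```java
class SharedConnectionDemo {
    // Stand-in for a JDBC connection; real code would use java.sql.Connection.
    static class Conn {
        private boolean closed = false;
        void close() { closed = true; }
        String query() {
            if (closed) throw new IllegalStateException("connection already closed");
            return "row";
        }
    }

    // A reader that closes its connection when done, as a record reader does.
    static class Reader {
        private final Conn conn;
        Reader(Conn conn) { this.conn = conn; }
        String readAndClose() {
            String row = conn.query();
            conn.close();  // harmless only if the connection is ours alone
            return row;
        }
    }

    public static void main(String[] args) {
        // Shared connection: the second reader fails, as in the reported bug.
        Conn shared = new Conn();
        new Reader(shared).readAndClose();
        try {
            new Reader(shared).readAndClose();
        } catch (IllegalStateException e) {
            System.out.println("shared: " + e.getMessage());
        }
        // One connection per reader: both reads succeed.
        System.out.println(new Reader(new Conn()).readAndClose());
        System.out.println(new Reader(new Conn()).readAndClose());
    }
}
```

This is why creating a fresh connection per reader (or borrowing from a pool, so each reader returns rather than destroys its connection) avoids the bug.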
[jira] [Commented] (MAPREDUCE-6243) Fix findbugs warnings in hadoop-rumen
[ https://issues.apache.org/jira/browse/MAPREDUCE-6243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307267#comment-14307267 ] Hudson commented on MAPREDUCE-6243: --- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2027 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2027/]) MAPREDUCE-6243. Fix findbugs warnings in hadoop-rumen. Contributed by Masatake Iwasaki. (aajisaka: rev 34fe11c987730932f99dec6eb458a22624eb075b) * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/RandomSeedGenerator.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/MapAttempt20LineHistoryEventEmitter.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/ParsedConfigFile.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/Hadoop20JHParser.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/ReduceAttempt20LineHistoryEventEmitter.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/HadoopLogsAnalyzer.java Fix findbugs warnings in hadoop-rumen - Key: MAPREDUCE-6243 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6243 Project: Hadoop Map/Reduce Issue Type: Bug Components: tools/rumen Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Masatake Iwasaki Priority: Minor Labels: newbie Fix For: 2.7.0 Attachments: MAPREDUCE-6243.001.patch, MAPREDUCE-6243.002.patch, findbugs.xml There are 7 findbugs warnings in hadoop-rumen modules. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6245) Fixed split shuffling.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lbkzman updated MAPREDUCE-6245: --- Status: Patch Available (was: Open)
index 72b47f2..8b89782 100644
--- src/java/org/apache/hadoop/mapreduce/lib/partition/InputSampler.java
+++ src/java/org/apache/hadoop/mapreduce/lib/partition/InputSampler.java
@@ -203,12 +203,8 @@ public class InputSampler<K,V> extends Configured implements Tool {
       r.setSeed(seed);
       LOG.debug("seed: " + seed);
       // shuffle splits
-      for (int i = 0; i < splits.size(); ++i) {
-        InputSplit tmp = splits.get(i);
-        int j = r.nextInt(splits.size());
-        splits.set(i, splits.get(j));
-        splits.set(j, tmp);
-      }
+      Collections.shuffle(splits);
+
       // our target rate is in terms of the maximum number of sample splits,
       // but we accept the possibility of sampling additional splits to hit
       // the target sample keyset
Fixed split shuffling. -- Key: MAPREDUCE-6245 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6245 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.6.0 Reporter: lbkzman Assignee: lbkzman -- This message was sent by Atlassian JIRA (v6.3.4#6332)
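One behavioral detail of this patch worth noting: the removed loop used the seeded `Random r` from the surrounding code, while the no-argument `Collections.shuffle(list)` uses its own internal `Random`. The two-argument overload `Collections.shuffle(list, r)` would preserve the seeded, reproducible shuffling. A standalone sketch (the extracted method form is illustrative, not the patch itself):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class ShuffleSketch {
    // The loop removed by the patch, extracted into a standalone method.
    static <T> void manualShuffle(List<T> list, Random r) {
        for (int i = 0; i < list.size(); ++i) {
            T tmp = list.get(i);
            int j = r.nextInt(list.size());
            list.set(i, list.get(j));
            list.set(j, tmp);
        }
    }

    public static void main(String[] args) {
        List<Integer> a = new ArrayList<>(List.of(1, 2, 3, 4, 5));
        List<Integer> b = new ArrayList<>(a);
        // Same seed => same permutation: the two-arg overload keeps the
        // reproducibility the original seeded loop had.
        Collections.shuffle(a, new Random(42));
        Collections.shuffle(b, new Random(42));
        System.out.println(a.equals(b));  // true
    }
}
```

Both forms produce a permutation of the same elements; the difference is only whether the shuffle order is controlled by the sampler's seed.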
[jira] [Updated] (MAPREDUCE-6245) Fixed split shuffling.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lbkzman updated MAPREDUCE-6245: --- Status: Open (was: Patch Available) Fixed split shuffling. -- Key: MAPREDUCE-6245 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6245 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.6.0 Reporter: lbkzman Assignee: lbkzman -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5988) Fix dead links to the javadocs in mapreduce project
[ https://issues.apache.org/jira/browse/MAPREDUCE-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307270#comment-14307270 ] Hudson commented on MAPREDUCE-5988: --- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2027 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2027/]) MAPREDUCE-5988. Fix dead links to the javadocs in mapreduce project. (aajisaka) (aajisaka: rev cc6bbfceae1cddfae6a3892cb7e7104531a689be) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/package-info.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/counters/package-info.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/api/protocolrecords/package-info.java * hadoop-mapreduce-project/CHANGES.txt Fix dead links to the javadocs in mapreduce project --- Key: MAPREDUCE-5988 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5988 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 2.4.1 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Fix For: 2.7.0 Attachments: MAPREDUCE-5988.2.patch, MAPREDUCE-5988.patch In http://hadoop.apache.org/docs/r2.4.1/api/allclasses-frame.html, some classes are listed, but not documented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6059) Speed up history server startup time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307278#comment-14307278 ] Hudson commented on MAPREDUCE-6059: --- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2027 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2027/]) MAPREDUCE-6059. Speed up history server startup time (Siqi Li via aw) (aw: rev fd57ab2002f97dcc83d455a5e0c770c8efde77a4) * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java Speed up history server startup time Key: MAPREDUCE-6059 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6059 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Fix For: 3.0.0 Attachments: YARN-2366.v1.patch When the history server starts up, it scans every history directory and puts all history files into a cache, but this cache only stores the 20K most recent history files. Therefore, a large portion of time is wasted loading old history files into the cache, and the startup time will keep increasing if we don't trim the number of history files. For example, when the history server started up with 2.5M history files in HDFS, it took ~5 minutes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
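The fix described amounts to not parsing files that can never fit in the bounded cache: since only the 20K most recent files are kept, older ones need not be loaded at startup at all. A minimal sketch of "keep only the N newest by modification time" (the names and signature are illustrative, not HistoryFileManager's actual API):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class HistoryTrimSketch {
    // A (modTime, path) pair standing in for a scanned history file.
    record HistFile(long modTime, String path) {}

    // Return only the n most recently modified files, so older ones are
    // never parsed or cached during startup.
    static List<HistFile> newestN(List<HistFile> files, int n) {
        List<HistFile> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong(HistFile::modTime).reversed());
        return sorted.subList(0, Math.min(n, sorted.size()));
    }

    public static void main(String[] args) {
        List<HistFile> all = List.of(
            new HistFile(100, "job_1"), new HistFile(300, "job_3"),
            new HistFile(200, "job_2"));
        // Keeps job_3 and job_2, the two newest by modification time.
        System.out.println(newestN(all, 2));
    }
}
```

With 2.5M files in HDFS, sorting the directory listing is cheap compared with opening and parsing millions of history files that would be evicted anyway.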
[jira] [Created] (MAPREDUCE-6245) Fixed split shuffling.
lbkzman created MAPREDUCE-6245: -- Summary: Fixed split shuffling. Key: MAPREDUCE-6245 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6245 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: lbkzman -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6059) Speed up history server startup time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307309#comment-14307309 ] Hudson commented on MAPREDUCE-6059: --- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #92 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/92/]) MAPREDUCE-6059. Speed up history server startup time (Siqi Li via aw) (aw: rev fd57ab2002f97dcc83d455a5e0c770c8efde77a4) * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java Speed up history server startup time Key: MAPREDUCE-6059 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6059 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Fix For: 3.0.0 Attachments: YARN-2366.v1.patch When history server starts up, It scans every history directories and put all history files into a cache, whereas this cache only stores 20K recent history files. Therefore, it is wasting a large portion of time loading old history files into the cache, and the startup time will keep increasing if we don't trim the number of history files. For example, when history server starts up with 2.5M history files in HDFS, it took ~5 minutes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5988) Fix dead links to the javadocs in mapreduce project
[ https://issues.apache.org/jira/browse/MAPREDUCE-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307301#comment-14307301 ] Hudson commented on MAPREDUCE-5988: --- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #92 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/92/]) MAPREDUCE-5988. Fix dead links to the javadocs in mapreduce project. (aajisaka) (aajisaka: rev cc6bbfceae1cddfae6a3892cb7e7104531a689be) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/api/protocolrecords/package-info.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/package-info.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/counters/package-info.java Fix dead links to the javadocs in mapreduce project --- Key: MAPREDUCE-5988 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5988 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 2.4.1 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Fix For: 2.7.0 Attachments: MAPREDUCE-5988.2.patch, MAPREDUCE-5988.patch In http://hadoop.apache.org/docs/r2.4.1/api/allclasses-frame.html, some classes are listed, but not documented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6243) Fix findbugs warnings in hadoop-rumen
[ https://issues.apache.org/jira/browse/MAPREDUCE-6243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307298#comment-14307298 ] Hudson commented on MAPREDUCE-6243: --- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #92 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/92/]) MAPREDUCE-6243. Fix findbugs warnings in hadoop-rumen. Contributed by Masatake Iwasaki. (aajisaka: rev 34fe11c987730932f99dec6eb458a22624eb075b) * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/Hadoop20JHParser.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/RandomSeedGenerator.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/ReduceAttempt20LineHistoryEventEmitter.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/HadoopLogsAnalyzer.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/ParsedConfigFile.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/MapAttempt20LineHistoryEventEmitter.java * hadoop-mapreduce-project/CHANGES.txt Fix findbugs warnings in hadoop-rumen - Key: MAPREDUCE-6243 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6243 Project: Hadoop Map/Reduce Issue Type: Bug Components: tools/rumen Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Masatake Iwasaki Priority: Minor Labels: newbie Fix For: 2.7.0 Attachments: MAPREDUCE-6243.001.patch, MAPREDUCE-6243.002.patch, findbugs.xml There are 7 findbugs warnings in hadoop-rumen modules. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6245) Fixed split shuffling.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lbkzman updated MAPREDUCE-6245: --- Affects Version/s: 2.6.0 Release Note:
index 72b47f2..8b89782 100644
--- src/java/org/apache/hadoop/mapreduce/lib/partition/InputSampler.java
+++ src/java/org/apache/hadoop/mapreduce/lib/partition/InputSampler.java
@@ -203,12 +203,8 @@ public class InputSampler<K,V> extends Configured implements Tool {
       r.setSeed(seed);
       LOG.debug("seed: " + seed);
       // shuffle splits
-      for (int i = 0; i < splits.size(); ++i) {
-        InputSplit tmp = splits.get(i);
-        int j = r.nextInt(splits.size());
-        splits.set(i, splits.get(j));
-        splits.set(j, tmp);
-      }
+      Collections.shuffle(splits);
+
       // our target rate is in terms of the maximum number of sample splits,
       // but we accept the possibility of sampling additional splits to hit
       // the target sample keyset
Status: Patch Available (was: Open) Fixed split shuffling. -- Key: MAPREDUCE-6245 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6245 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.6.0 Reporter: lbkzman -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6245) Fixed split shuffling.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lbkzman updated MAPREDUCE-6245: --- Assignee: lbkzman Target Version/s: 2.6.0 Status: Open (was: Patch Available) Fixed split shuffling. -- Key: MAPREDUCE-6245 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6245 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.6.0 Reporter: lbkzman Assignee: lbkzman -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6245) Fixed split shuffling.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lbkzman updated MAPREDUCE-6245: --- Release Note: (was:
index 72b47f2..8b89782 100644
--- src/java/org/apache/hadoop/mapreduce/lib/partition/InputSampler.java
+++ src/java/org/apache/hadoop/mapreduce/lib/partition/InputSampler.java
@@ -203,12 +203,8 @@ public class InputSampler<K,V> extends Configured implements Tool {
       r.setSeed(seed);
       LOG.debug("seed: " + seed);
       // shuffle splits
-      for (int i = 0; i < splits.size(); ++i) {
-        InputSplit tmp = splits.get(i);
-        int j = r.nextInt(splits.size());
-        splits.set(i, splits.get(j));
-        splits.set(j, tmp);
-      }
+      Collections.shuffle(splits);
+
       // our target rate is in terms of the maximum number of sample splits,
       // but we accept the possibility of sampling additional splits to hit
       // the target sample keyset
) Fixed split shuffling. -- Key: MAPREDUCE-6245 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6245 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.6.0 Reporter: lbkzman Assignee: lbkzman -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6245) Fixed split shuffling.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lbkzman updated MAPREDUCE-6245: --- Status: Patch Available (was: Open) Fixed split shuffling. -- Key: MAPREDUCE-6245 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6245 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.6.0 Reporter: lbkzman Assignee: lbkzman -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6243) Fix findbugs warnings in hadoop-rumen
[ https://issues.apache.org/jira/browse/MAPREDUCE-6243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307344#comment-14307344 ] Hudson commented on MAPREDUCE-6243: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #96 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/96/]) MAPREDUCE-6243. Fix findbugs warnings in hadoop-rumen. Contributed by Masatake Iwasaki. (aajisaka: rev 34fe11c987730932f99dec6eb458a22624eb075b) * hadoop-mapreduce-project/CHANGES.txt * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/ParsedConfigFile.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/MapAttempt20LineHistoryEventEmitter.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/Hadoop20JHParser.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/ReduceAttempt20LineHistoryEventEmitter.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/HadoopLogsAnalyzer.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/RandomSeedGenerator.java Fix findbugs warnings in hadoop-rumen - Key: MAPREDUCE-6243 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6243 Project: Hadoop Map/Reduce Issue Type: Bug Components: tools/rumen Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Masatake Iwasaki Priority: Minor Labels: newbie Fix For: 2.7.0 Attachments: MAPREDUCE-6243.001.patch, MAPREDUCE-6243.002.patch, findbugs.xml There are 7 findbugs warnings in hadoop-rumen modules. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5988) Fix dead links to the javadocs in mapreduce project
[ https://issues.apache.org/jira/browse/MAPREDUCE-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307347#comment-14307347 ] Hudson commented on MAPREDUCE-5988: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #96 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/96/]) MAPREDUCE-5988. Fix dead links to the javadocs in mapreduce project. (aajisaka) (aajisaka: rev cc6bbfceae1cddfae6a3892cb7e7104531a689be) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/api/protocolrecords/package-info.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/package-info.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/counters/package-info.java Fix dead links to the javadocs in mapreduce project --- Key: MAPREDUCE-5988 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5988 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 2.4.1 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Fix For: 2.7.0 Attachments: MAPREDUCE-5988.2.patch, MAPREDUCE-5988.patch In http://hadoop.apache.org/docs/r2.4.1/api/allclasses-frame.html, some classes are listed, but not documented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6059) Speed up history server startup time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307356#comment-14307356 ] Hudson commented on MAPREDUCE-6059: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #96 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/96/]) MAPREDUCE-6059. Speed up history server startup time (Siqi Li via aw) (aw: rev fd57ab2002f97dcc83d455a5e0c770c8efde77a4) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java * hadoop-mapreduce-project/CHANGES.txt Speed up history server startup time Key: MAPREDUCE-6059 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6059 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Fix For: 3.0.0 Attachments: YARN-2366.v1.patch When history server starts up, It scans every history directories and put all history files into a cache, whereas this cache only stores 20K recent history files. Therefore, it is wasting a large portion of time loading old history files into the cache, and the startup time will keep increasing if we don't trim the number of history files. For example, when history server starts up with 2.5M history files in HDFS, it took ~5 minutes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5988) Fix dead links to the javadocs in mapreduce project
[ https://issues.apache.org/jira/browse/MAPREDUCE-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307388#comment-14307388 ] Hudson commented on MAPREDUCE-5988: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2046 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2046/]) MAPREDUCE-5988. Fix dead links to the javadocs in mapreduce project. (aajisaka) (aajisaka: rev cc6bbfceae1cddfae6a3892cb7e7104531a689be) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/counters/package-info.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/package-info.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/api/protocolrecords/package-info.java Fix dead links to the javadocs in mapreduce project --- Key: MAPREDUCE-5988 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5988 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 2.4.1 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Fix For: 2.7.0 Attachments: MAPREDUCE-5988.2.patch, MAPREDUCE-5988.patch In http://hadoop.apache.org/docs/r2.4.1/api/allclasses-frame.html, some classes are listed, but not documented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6059) Speed up history server startup time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307397#comment-14307397 ] Hudson commented on MAPREDUCE-6059: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2046 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2046/]) MAPREDUCE-6059. Speed up history server startup time (Siqi Li via aw) (aw: rev fd57ab2002f97dcc83d455a5e0c770c8efde77a4) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java * hadoop-mapreduce-project/CHANGES.txt Speed up history server startup time Key: MAPREDUCE-6059 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6059 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Fix For: 3.0.0 Attachments: YARN-2366.v1.patch When history server starts up, It scans every history directories and put all history files into a cache, whereas this cache only stores 20K recent history files. Therefore, it is wasting a large portion of time loading old history files into the cache, and the startup time will keep increasing if we don't trim the number of history files. For example, when history server starts up with 2.5M history files in HDFS, it took ~5 minutes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6243) Fix findbugs warnings in hadoop-rumen
[ https://issues.apache.org/jira/browse/MAPREDUCE-6243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307385#comment-14307385 ] Hudson commented on MAPREDUCE-6243: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2046 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2046/]) MAPREDUCE-6243. Fix findbugs warnings in hadoop-rumen. Contributed by Masatake Iwasaki. (aajisaka: rev 34fe11c987730932f99dec6eb458a22624eb075b) * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/ReduceAttempt20LineHistoryEventEmitter.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/RandomSeedGenerator.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/HadoopLogsAnalyzer.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/Hadoop20JHParser.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/MapAttempt20LineHistoryEventEmitter.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/ParsedConfigFile.java * hadoop-mapreduce-project/CHANGES.txt Fix findbugs warnings in hadoop-rumen - Key: MAPREDUCE-6243 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6243 Project: Hadoop Map/Reduce Issue Type: Bug Components: tools/rumen Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Masatake Iwasaki Priority: Minor Labels: newbie Fix For: 2.7.0 Attachments: MAPREDUCE-6243.001.patch, MAPREDUCE-6243.002.patch, findbugs.xml There are 7 findbugs warnings in hadoop-rumen modules. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6059) Speed up history server startup time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307293#comment-14307293 ] Jason Lowe commented on MAPREDUCE-6059: --- Any reason this should not be committed to branch-2? Most patches are committed there, so I'm curious about the criteria wrt. this patch. Speed up history server startup time Key: MAPREDUCE-6059 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6059 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Fix For: 3.0.0 Attachments: YARN-2366.v1.patch When history server starts up, It scans every history directories and put all history files into a cache, whereas this cache only stores 20K recent history files. Therefore, it is wasting a large portion of time loading old history files into the cache, and the startup time will keep increasing if we don't trim the number of history files. For example, when history server starts up with 2.5M history files in HDFS, it took ~5 minutes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6242) Progress report log is incredibly excessive in application master
[ https://issues.apache.org/jira/browse/MAPREDUCE-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308060#comment-14308060 ] Jian Fang commented on MAPREDUCE-6242: -- Thanks for your quick fix. Progress report log is incredibly excessive in application master - Key: MAPREDUCE-6242 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6242 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.4.0 Reporter: Jian Fang Assignee: Varun Saxena Attachments: MAPREDUCE-6242.001.patch We saw incredibly excessive logs in application master for a long running one with many task attempts. The log write rate is around 1MB/sec in some cases. Most of the log entries were from the progress report such as the following ones. 2015-02-03 17:46:14,321 INFO [IPC Server handler 56 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_00_0 is : 0.15605757 2015-02-03 17:46:17,581 INFO [IPC Server handler 2 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_00_0 is : 0.4108217 2015-02-03 17:46:20,426 INFO [IPC Server handler 0 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_02_0 is : 0.06634143 2015-02-03 17:46:20,807 INFO [IPC Server handler 4 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_00_0 is : 0.6506 2015-02-03 17:46:21,013 INFO [IPC Server handler 6 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_01_0 is : 0.21723115 Looks like the report interval is controlled by a hard-coded variable PROGRESS_INTERVAL as 3 seconds in class org.apache.hadoop.mapred.Task. We should allow users to set the appropriate progress interval for their applications. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
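The change the reporter asks for is straightforward: read the report interval from configuration, with the current hard-coded 3 seconds as the default. A sketch under stated assumptions (the property name is hypothetical, and a plain `Properties` stands in here for Hadoop's `Configuration`):

```java
import java.util.Properties;

class ProgressIntervalSketch {
    // Hypothetical key; an eventual patch may choose a different name.
    static final String PROGRESS_INTERVAL_KEY =
        "mapreduce.task.progress-report.interval";
    // The value hard-coded today as PROGRESS_INTERVAL in o.a.h.mapred.Task.
    static final long DEFAULT_PROGRESS_INTERVAL_MS = 3000L;

    // Return the configured interval, falling back to the current default.
    static long progressIntervalMs(Properties conf) {
        String v = conf.getProperty(PROGRESS_INTERVAL_KEY);
        return v == null ? DEFAULT_PROGRESS_INTERVAL_MS : Long.parseLong(v);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(progressIntervalMs(conf));   // 3000
        conf.setProperty(PROGRESS_INTERVAL_KEY, "30000");
        System.out.println(progressIntervalMs(conf));   // 30000
    }
}
```

Raising the interval to, say, 30 seconds for long-running jobs would cut the TaskAttemptListenerImpl log volume roughly tenfold without losing progress visibility.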
[jira] [Commented] (MAPREDUCE-6233) org.apache.hadoop.mapreduce.TestLargeSort.testLargeSort failed in trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308094#comment-14308094 ] Robert Kanter commented on MAPREDUCE-6233: -- +1 org.apache.hadoop.mapreduce.TestLargeSort.testLargeSort failed in trunk --- Key: MAPREDUCE-6233 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6233 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Yongjun Zhang Assignee: zhihai xu Attachments: MAPREDUCE-6233.000.patch https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2039/ {code} Stack Trace: java.lang.AssertionError: Large sort failed for 128 expected:0 but was:1 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.mapreduce.TestLargeSort.testLargeSort(TestLargeSort.java:61) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5847) Remove redundant code for fileOutputByteCounter in MapTask and ReduceTask
[ https://issues.apache.org/jira/browse/MAPREDUCE-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308116#comment-14308116 ] Allen Wittenauer commented on MAPREDUCE-5847: - Incompatible changes can go into trunk. Remove redundant code for fileOutputByteCounter in MapTask and ReduceTask -- Key: MAPREDUCE-5847 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5847 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2, task Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: MAPREDUCE-5847.v01.patch, MAPREDUCE-5847.v02.patch Both MapTask and ReduceTask carry redundant code to update BYTES_WRITTEN counter. However, {{Task.updateCounters}} uses file system stats for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5847) Remove redundant code for fileOutputByteCounter in MapTask and ReduceTask
[ https://issues.apache.org/jira/browse/MAPREDUCE-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308129#comment-14308129 ] Jason Lowe commented on MAPREDUCE-5847: --- bq. Incompatible changes can go into trunk. Understood, but I'm arguing we shouldn't break incompatibility without sufficient merit. Each incompatibility instance is a hurdle someone needs to jump to move from Hadoop 2.x to Hadoop 3.x. Hence I'm wondering if others feel this is worth adding another hurdle or not. Remove redundant code for fileOutputByteCounter in MapTask and ReduceTask -- Key: MAPREDUCE-5847 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5847 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2, task Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: MAPREDUCE-5847.v01.patch, MAPREDUCE-5847.v02.patch Both MapTask and ReduceTask carry redundant code to update BYTES_WRITTEN counter. However, {{Task.updateCounters}} uses file system stats for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5847) Remove redundant code for fileOutputByteCounter in MapTask and ReduceTask
[ https://issues.apache.org/jira/browse/MAPREDUCE-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308140#comment-14308140 ] Allen Wittenauer commented on MAPREDUCE-5847: - This seems like such a low risk and, as it is today, aren't we actually reporting wrong information? That's significantly worse! (I know of one vendor that is actually mentions that they report correct values for some metrics since we blow it so badly in lots of places...) While I understand the concerns about moving from 2.x to 3.x, users should expect some degree of pain when moving major versions. Remove redundant code for fileOutputByteCounter in MapTask and ReduceTask -- Key: MAPREDUCE-5847 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5847 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2, task Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: MAPREDUCE-5847.v01.patch, MAPREDUCE-5847.v02.patch Both MapTask and ReduceTask carry redundant code to update BYTES_WRITTEN counter. However, {{Task.updateCounters}} uses file system stats for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6233) org.apache.hadoop.mapreduce.TestLargeSort.testLargeSort failed in trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6233: - Resolution: Fixed Fix Version/s: 2.7.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Zhihai. Committed to trunk and branch-2! org.apache.hadoop.mapreduce.TestLargeSort.testLargeSort failed in trunk --- Key: MAPREDUCE-6233 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6233 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Yongjun Zhang Assignee: zhihai xu Fix For: 2.7.0 Attachments: MAPREDUCE-6233.000.patch https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2039/ {code} Stack Trace: java.lang.AssertionError: Large sort failed for 128 expected:0 but was:1 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.mapreduce.TestLargeSort.testLargeSort(TestLargeSort.java:61) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5847) Remove redundant code for fileOutputByteCounter in MapTask and ReduceTask
[ https://issues.apache.org/jira/browse/MAPREDUCE-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308171#comment-14308171 ] Jason Lowe commented on MAPREDUCE-5847: --- If the counters are wrong then that's a separate JIRA that I think would be very well worth fixing in 2.x. However IIUC this isn't about fixing incorrect counter values, rather it's about removing counters. I can see the value of storing the separate counters, since they are not exactly equivalent. One of them records the amount of bytes written to the filesystem overall during the life of the task, while the other records the amount of data written to the filesystem during the output collector's write method. For many jobs these will be the same values, however if the task was doing out-of-band I/O with the filesystems outside of the output collector write method then they will not be equivalent. Comparing these counters could be used to audit tasks that aren't writing data through the normal framework channels. Remove redundant code for fileOutputByteCounter in MapTask and ReduceTask -- Key: MAPREDUCE-5847 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5847 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2, task Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: MAPREDUCE-5847.v01.patch, MAPREDUCE-5847.v02.patch Both MapTask and ReduceTask carry redundant code to update BYTES_WRITTEN counter. However, {{Task.updateCounters}} uses file system stats for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
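The audit Jason Lowe describes, comparing the filesystem-level bytes-written statistic against the bytes that went through the output collector, can be sketched as a simple check: any positive difference is I/O the task did outside the normal framework write path. The class and method names below are hypothetical illustrations, not part of Hadoop's counter API.

```java
// Hypothetical sketch of the counter comparison described above: if the
// bytes the task wrote to the filesystem overall exceed the bytes written
// via the output collector, the difference is out-of-band I/O.
public class CounterAudit {
    /** Bytes written outside the normal output-collector path. */
    public static long outOfBandBytes(long fsBytesWritten, long collectorBytesWritten) {
        return Math.max(0L, fsBytesWritten - collectorBytesWritten);
    }

    public static boolean wroteOutOfBand(long fsBytesWritten, long collectorBytesWritten) {
        return outOfBandBytes(fsBytesWritten, collectorBytesWritten) > 0L;
    }
}
```

For most jobs the two values match; a nonzero difference flags tasks writing through side channels, which is the diagnostic value that argues for keeping both counters.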
[jira] [Updated] (MAPREDUCE-5965) Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high
[ https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arup Malakar updated MAPREDUCE-5965: Attachment: MAPREDUCE-5965.1.patch Reattaching updated patch. Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high Key: MAPREDUCE-5965 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5965 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Arup Malakar Assignee: Arup Malakar Attachments: MAPREDUCE-5965.1.patch, MAPREDUCE-5965.patch Hadoop streaming exposes all the key values in job conf as environment variables when it forks a process for streaming code to run. Unfortunately the variable mapreduce_input_fileinputformat_inputdir contains the list of input files, and Linux has a limit on size of environment variables + arguments. Based on how long the list of files and their full path is this could be pretty huge. And given all of these variables are not even used it stops user from running hadoop job with large number of files, even though it could be run. Linux throws E2BIG if the size is greater than certain size which is error code 7. And java translates that to error=7, Argument list too long. More: http://man7.org/linux/man-pages/man2/execve.2.html I suggest skipping variables if it is greater than certain length. That way if user code requires the environment variable it would fail. It should also introduce a config variable to skip long variables, and set it to false by default. That way user has to specifically set it to true to invoke this feature. 
Here is the exception:
{code}
Error: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
    ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 17 more
Caused by: java.lang.RuntimeException: configuration exception
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
    at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
    ... 22 more
Caused by: java.io.IOException: Cannot run program /data/hadoop/hadoop-yarn/cache/yarn/nm-local-dir/usercache/oo-analytics/appcache/application_1403599726264_13177/container_1403599726264_13177_01_06/./rbenv_runner.sh: error=7, Argument list too long
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
    ... 23 more
Caused by: java.io.IOException: error=7, Argument list too long
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
    at java.lang.ProcessImpl.start(ProcessImpl.java:130)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
    ... 24 more
Container killed by the ApplicationMaster.
{code}
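The MAPREDUCE-5965 proposal (skip job-conf entries over a length threshold before exporting them as environment variables, behind an opt-in flag) could look roughly like the sketch below. The class name, method signature, and the 4096-character limit used in the test are hypothetical, illustrating the idea rather than the actual patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: when the opt-in flag is set, drop entries whose
// values exceed a length limit before they become environment variables,
// so the forked streaming process stays under the kernel's E2BIG limit.
public class EnvFilter {
    public static Map<String, String> filter(Map<String, String> env,
                                             boolean skipLongValues,
                                             int maxValueLength) {
        if (!skipLongValues) {
            return env;  // default behavior: export everything, as today
        }
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : env.entrySet()) {
            if (e.getValue().length() <= maxValueLength) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```

Keeping the flag off by default matches the description: user code that depends on a long variable keeps failing loudly unless the user explicitly opts in to the filtering.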
[jira] [Commented] (MAPREDUCE-5847) Remove redundant code for fileOutputByteCounter in MapTask and ReduceTask
[ https://issues.apache.org/jira/browse/MAPREDUCE-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308114#comment-14308114 ] Jason Lowe commented on MAPREDUCE-5847: --- Looks like the patch still applies, but I'm not sure this should go in per the incompatibility concerns I raised earlier. I don't think the benefits of this change are worth that cost, even if this just goes into trunk. Thoughts? Remove redundant code for fileOutputByteCounter in MapTask and ReduceTask -- Key: MAPREDUCE-5847 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5847 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2, task Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: MAPREDUCE-5847.v01.patch, MAPREDUCE-5847.v02.patch Both MapTask and ReduceTask carry redundant code to update BYTES_WRITTEN counter. However, {{Task.updateCounters}} uses file system stats for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6233) org.apache.hadoop.mapreduce.TestLargeSort.testLargeSort failed in trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308159#comment-14308159 ] Hudson commented on MAPREDUCE-6233: --- FAILURE: Integrated in Hadoop-trunk-Commit #7028 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7028/]) MAPREDUCE-6233. org.apache.hadoop.mapreduce.TestLargeSort.testLargeSort failed in trunk (zxu via rkanter) (rkanter: rev e2ee2ff7d7ca429487d7e3883daedffbb269ebd4) * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestLargeSort.java org.apache.hadoop.mapreduce.TestLargeSort.testLargeSort failed in trunk --- Key: MAPREDUCE-6233 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6233 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Yongjun Zhang Assignee: zhihai xu Fix For: 2.7.0 Attachments: MAPREDUCE-6233.000.patch https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2039/ {code} Stack Trace: java.lang.AssertionError: Large sort failed for 128 expected:0 but was:1 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.mapreduce.TestLargeSort.testLargeSort(TestLargeSort.java:61) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5965) Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high
[ https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arup Malakar updated MAPREDUCE-5965: Status: Patch Available (was: Open) Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high Key: MAPREDUCE-5965 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5965 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Arup Malakar Assignee: Arup Malakar Attachments: MAPREDUCE-5965.1.patch, MAPREDUCE-5965.patch Hadoop streaming exposes all the key values in job conf as environment variables when it forks a process for streaming code to run. Unfortunately the variable mapreduce_input_fileinputformat_inputdir contains the list of input files, and Linux has a limit on size of environment variables + arguments. Based on how long the list of files and their full path is this could be pretty huge. And given all of these variables are not even used it stops user from running hadoop job with large number of files, even though it could be run. Linux throws E2BIG if the size is greater than certain size which is error code 7. And java translates that to error=7, Argument list too long. More: http://man7.org/linux/man-pages/man2/execve.2.html I suggest skipping variables if it is greater than certain length. That way if user code requires the environment variable it would fail. It should also introduce a config variable to skip long variables, and set it to false by default. That way user has to specifically set it to true to invoke this feature. 
Here is the exception:
{code}
Error: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
    ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 17 more
Caused by: java.lang.RuntimeException: configuration exception
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
    at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
    ... 22 more
Caused by: java.io.IOException: Cannot run program /data/hadoop/hadoop-yarn/cache/yarn/nm-local-dir/usercache/oo-analytics/appcache/application_1403599726264_13177/container_1403599726264_13177_01_06/./rbenv_runner.sh: error=7, Argument list too long
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
    ... 23 more
Caused by: java.io.IOException: error=7, Argument list too long
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
    at java.lang.ProcessImpl.start(ProcessImpl.java:130)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
    ... 24 more
Container killed by the ApplicationMaster.
{code}
[jira] [Commented] (MAPREDUCE-5965) Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high
[ https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308092#comment-14308092 ] Hadoop QA commented on MAPREDUCE-5965: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12696883/MAPREDUCE-5965.1.patch against trunk revision e1990ab. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5168//console This message is automatically generated. Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high Key: MAPREDUCE-5965 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5965 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Arup Malakar Assignee: Arup Malakar Attachments: MAPREDUCE-5965.1.patch, MAPREDUCE-5965.patch Hadoop streaming exposes all the key values in job conf as environment variables when it forks a process for streaming code to run. Unfortunately the variable mapreduce_input_fileinputformat_inputdir contains the list of input files, and Linux has a limit on size of environment variables + arguments. Based on how long the list of files and their full path is this could be pretty huge. And given all of these variables are not even used it stops user from running hadoop job with large number of files, even though it could be run. Linux throws E2BIG if the size is greater than certain size which is error code 7. And java translates that to error=7, Argument list too long. More: http://man7.org/linux/man-pages/man2/execve.2.html I suggest skipping variables if it is greater than certain length. That way if user code requires the environment variable it would fail. It should also introduce a config variable to skip long variables, and set it to false by default. 
That way user has to specifically set it to true to invoke this feature. Here is the exception:
{code}
Error: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
    ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 17 more
Caused by: java.lang.RuntimeException: configuration exception
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
    at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
    ... 22 more
Caused by: java.io.IOException: Cannot run program /data/hadoop/hadoop-yarn/cache/yarn/nm-local-dir/usercache/oo-analytics/appcache/application_1403599726264_13177/container_1403599726264_13177_01_06/./rbenv_runner.sh: error=7, Argument list too long
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
    ... 23 more
Caused by: java.io.IOException: error=7, Argument list too long
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
    at java.lang.ProcessImpl.start(ProcessImpl.java:130)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
    ... 24 more
Container killed by the ApplicationMaster.
{code}
[jira] [Commented] (MAPREDUCE-5847) Remove redundant code for fileOutputByteCounter in MapTask and ReduceTask
[ https://issues.apache.org/jira/browse/MAPREDUCE-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308132#comment-14308132 ] Gera Shegalov commented on MAPREDUCE-5847: -- Agreed, let us close it 'Won't fix' Remove redundant code for fileOutputByteCounter in MapTask and ReduceTask -- Key: MAPREDUCE-5847 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5847 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2, task Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: MAPREDUCE-5847.v01.patch, MAPREDUCE-5847.v02.patch Both MapTask and ReduceTask carry redundant code to update BYTES_WRITTEN counter. However, {{Task.updateCounters}} uses file system stats for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (MAPREDUCE-5847) Remove redundant code for fileOutputByteCounter in MapTask and ReduceTask
[ https://issues.apache.org/jira/browse/MAPREDUCE-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov resolved MAPREDUCE-5847. -- Resolution: Won't Fix Remove redundant code for fileOutputByteCounter in MapTask and ReduceTask -- Key: MAPREDUCE-5847 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5847 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2, task Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: MAPREDUCE-5847.v01.patch, MAPREDUCE-5847.v02.patch Both MapTask and ReduceTask carry redundant code to update BYTES_WRITTEN counter. However, {{Task.updateCounters}} uses file system stats for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6233) org.apache.hadoop.mapreduce.TestLargeSort.testLargeSort failed in trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308152#comment-14308152 ] zhihai xu commented on MAPREDUCE-6233: -- thanks [~rkanter] for the review and commit. org.apache.hadoop.mapreduce.TestLargeSort.testLargeSort failed in trunk --- Key: MAPREDUCE-6233 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6233 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Yongjun Zhang Assignee: zhihai xu Fix For: 2.7.0 Attachments: MAPREDUCE-6233.000.patch https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2039/ {code} Stack Trace: java.lang.AssertionError: Large sort failed for 128 expected:0 but was:1 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.mapreduce.TestLargeSort.testLargeSort(TestLargeSort.java:61) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6244) Hadoop examples when run without an argument, gives ERROR instead of just usage info
[ https://issues.apache.org/jira/browse/MAPREDUCE-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308237#comment-14308237 ] Akira AJISAKA commented on MAPREDUCE-6244: -- bq. We should inspect all job to make their behavior consistent. My thought is that it's enough to print usages when the number of given arguments are wrong. I'm okay with just printing usages. Consistency is more important. Hadoop examples when run without an argument, gives ERROR instead of just usage info Key: MAPREDUCE-6244 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6244 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.23.0, trunk-win, 2.6.0 Reporter: Robert Justice Assignee: Abhishek Kapoor Priority: Minor Attachments: HADOOP-8834.patch, HADOOP-8834.patch Hadoop sort example should not give an ERROR and only should display usage when run with no parameters. {code} $ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar sort ERROR: Wrong number of parameters: 0 instead of 2. sort [-m maps] [-r reduces] [-inFormat input format class] [-outFormat output format class] [-outKey output key class] [-outValue output value class] [-totalOrder pcnt num samples max splits] input output Generic options supported are -conf configuration file specify an application configuration file -D property=valueuse value for given property -fs local|namenode:port specify a namenode -jt local|jobtracker:portspecify a job tracker -files comma separated list of filesspecify comma separated files to be copied to the map reduce cluster -libjars comma separated list of jarsspecify comma separated jar files to include in the classpath. -archives comma separated list of archivesspecify comma separated archives to be unarchived on the compute machines. The general command line syntax is bin/hadoop command [genericOptions] [commandOptions] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6233) org.apache.hadoop.mapreduce.TestLargeSort.testLargeSort failed in trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308205#comment-14308205 ] Yongjun Zhang commented on MAPREDUCE-6233: -- Thanks [~zxu] and [~rkanter]! org.apache.hadoop.mapreduce.TestLargeSort.testLargeSort failed in trunk --- Key: MAPREDUCE-6233 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6233 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Yongjun Zhang Assignee: zhihai xu Fix For: 2.7.0 Attachments: MAPREDUCE-6233.000.patch https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2039/ {code} Stack Trace: java.lang.AssertionError: Large sort failed for 128 expected:0 but was:1 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.mapreduce.TestLargeSort.testLargeSort(TestLargeSort.java:61) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name
[ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-2293: Status: Patch Available (was: Open) Enhance MultipleOutputs to allow additional characters in the named output name --- Key: MAPREDUCE-2293 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.21.0 Reporter: David Rosenstrauch Assignee: Harsh J Priority: Minor Attachments: mapreduce.mo.removecheck.r1.diff, mapreduce.mo.removecheck.r2.diff, mapreduce.mo.removecheck.r3.diff, mapreduce.mo.removecheck.r4.diff, mapreduce.mo.removecheck.r5.diff Currently you are only allowed to use alpha-numeric characters in a named output name in the MultipleOutputs class. This is a bit of an onerous restriction, as it would be extremely convenient to be able to use non alpha-numerics in the name too. (E.g., a '.' character would be very helpful, so that you can use the named output name for holding a file name/extension. Perhaps '-' and a '_' characters as well.) The restriction seems to be somewhat arbitrary - it appears to be only enforced in the checkTokenName method. (Though I don't know if there's any downstream impact by loosening this restriction.) Would be extremely helpful/useful to have this fixed though! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
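The relaxation requested for MultipleOutputs, allowing '.', '-' and '_' alongside alphanumerics in named output names, could be sketched as below. The regex and the standalone class are illustrative assumptions; the real checkTokenName lives inside MultipleOutputs and the attached patches may take a different approach (e.g. removing the check entirely).

```java
import java.util.regex.Pattern;

// Hypothetical sketch of a relaxed MultipleOutputs name check:
// alphanumerics plus '.', '-' and '_' are accepted; anything else
// (spaces, slashes, empty names) is rejected as before.
public class NamedOutputCheck {
    private static final Pattern VALID = Pattern.compile("[A-Za-z0-9._-]+");

    public static void checkTokenName(String name) {
        if (name == null || name.isEmpty() || !VALID.matcher(name).matches()) {
            throw new IllegalArgumentException("Invalid named output: " + name);
        }
    }
}
```

This keeps the validation point in one place, so any downstream impact of loosening the character set (the concern raised in the description) is still confined to a single method.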
[jira] [Updated] (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name
[ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-2293: Status: Open (was: Patch Available) Cancelling, as patch no longer applies. Enhance MultipleOutputs to allow additional characters in the named output name --- Key: MAPREDUCE-2293 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.21.0 Reporter: David Rosenstrauch Assignee: Harsh J Priority: Minor Attachments: mapreduce.mo.removecheck.r1.diff, mapreduce.mo.removecheck.r2.diff, mapreduce.mo.removecheck.r3.diff, mapreduce.mo.removecheck.r4.diff, mapreduce.mo.removecheck.r5.diff Currently you are only allowed to use alpha-numeric characters in a named output name in the MultipleOutputs class. This is a bit of an onerous restriction, as it would be extremely convenient to be able to use non alpha-numerics in the name too. (E.g., a '.' character would be very helpful, so that you can use the named output name for holding a file name/extension. Perhaps '-' and a '_' characters as well.) The restriction seems to be somewhat arbitrary - it appears to be only enforced in the checkTokenName method. (Though I don't know if there's any downstream impact by loosening this restriction.) Would be extremely helpful/useful to have this fixed though! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-1554) If user name contains '_', then searching of jobs based on user name on job history web UI doesn't work
[ https://issues.apache.org/jira/browse/MAPREDUCE-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-1554: Resolution: Won't Fix Status: Resolved (was: Patch Available) Only applies to a dead version of Hadoop. Closing as won't fix. If user name contains '_', then searching of jobs based on user name on job history web UI doesn't work --- Key: MAPREDUCE-1554 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1554 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Ravi Gummadi Assignee: Devaraj K Fix For: 0.22.1 Attachments: MAPREDUCE-1554-0.22.patch, MAPREDUCE-1554.patch If the user name contains an underscore, then searching for jobs by user name on the job history web UI doesn't work. This is because the code calls {code}split("_"){code} on the history file name everywhere to extract the user name. The other parts of the history file name also should *not* be obtained via {code}split("_"){code}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
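A minimal sketch of the failure mode, assuming an old-style history file name laid out as job_&lt;timestamp&gt;_&lt;id&gt;_&lt;user&gt;_&lt;jobname&gt; (the layout and the parsing code are illustrative, not the actual Hadoop source):

```java
// Demonstrates why split("_") cannot reliably extract the user name
// from a history file name when the user name itself contains '_'.
public class HistoryNameParse {
    // Naive parsing: field 3 after split("_") is assumed to be the user.
    static String naiveUser(String fileName) {
        return fileName.split("_")[3];
    }

    public static void main(String[] args) {
        // User "ravi_gummadi" contains an underscore.
        String name = "job_201002141130_0001_ravi_gummadi_wordcount";
        // split("_") yields [job, 201002141130, 0001, ravi, gummadi, wordcount],
        // so the "user" field collapses to "ravi". Because the user/jobname
        // boundary is itself an underscore, no split limit can recover
        // "ravi_gummadi" unambiguously; the delimiter must be escaped or changed.
        System.out.println(naiveUser(name)); // prints "ravi", not "ravi_gummadi"
    }
}
```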
[jira] [Updated] (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name
[ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-2293: Status: Open (was: Patch Available) Enhance MultipleOutputs to allow additional characters in the named output name --- Key: MAPREDUCE-2293 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.21.0 Reporter: David Rosenstrauch Assignee: Harsh J Priority: Minor Attachments: mapreduce.mo.removecheck.r1.diff, mapreduce.mo.removecheck.r2.diff, mapreduce.mo.removecheck.r3.diff, mapreduce.mo.removecheck.r4.diff, mapreduce.mo.removecheck.r5.diff -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name
[ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307608#comment-14307608 ] Hadoop QA commented on MAPREDUCE-2293: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12555681/mapreduce.mo.removecheck.r5.diff against trunk revision afbecbb. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5157//console This message is automatically generated. Enhance MultipleOutputs to allow additional characters in the named output name --- Key: MAPREDUCE-2293 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.21.0 Reporter: David Rosenstrauch Assignee: Harsh J Priority: Minor Attachments: mapreduce.mo.removecheck.r1.diff, mapreduce.mo.removecheck.r2.diff, mapreduce.mo.removecheck.r3.diff, mapreduce.mo.removecheck.r4.diff, mapreduce.mo.removecheck.r5.diff -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6059) Speed up history server startup time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-6059: -- Fix Version/s: (was: 3.0.0) 2.7.0 Hadoop Flags: Reviewed Thanks Siqi and Allen! I committed this to branch-2 as well. Speed up history server startup time Key: MAPREDUCE-6059 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6059 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Fix For: 2.7.0 Attachments: YARN-2366.v1.patch When the history server starts up, it scans every history directory and puts all history files into a cache, but the cache only stores the 20K most recent history files. Therefore, a large portion of startup time is wasted loading old history files into the cache, and the startup time will keep increasing if we don't trim the number of history files. For example, when the history server started up with 2.5M history files in HDFS, it took ~5 minutes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6165) [JDK8] TestCombineFileInputFormat failed on JDK8
[ https://issues.apache.org/jira/browse/MAPREDUCE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated MAPREDUCE-6165: - Status: Patch Available (was: Open) Resubmitting. [JDK8] TestCombineFileInputFormat failed on JDK8 Key: MAPREDUCE-6165 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6165 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Wei Yan Assignee: Akira AJISAKA Priority: Minor Attachments: MAPREDUCE-6165-001.patch, MAPREDUCE-6165-reproduce.patch The error msg: {noformat} testSplitPlacementForCompressedFiles(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat) Time elapsed: 2.487 sec FAILURE! junit.framework.AssertionFailedError: expected:2 but was:1 at junit.framework.Assert.fail(Assert.java:57) at junit.framework.Assert.failNotEquals(Assert.java:329) at junit.framework.Assert.assertEquals(Assert.java:78) at junit.framework.Assert.assertEquals(Assert.java:234) at junit.framework.Assert.assertEquals(Assert.java:241) at junit.framework.TestCase.assertEquals(TestCase.java:409) at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacementForCompressedFiles(TestCombineFileInputFormat.java:911) testSplitPlacement(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat) Time elapsed: 0.985 sec FAILURE! junit.framework.AssertionFailedError: expected:2 but was:1 at junit.framework.Assert.fail(Assert.java:57) at junit.framework.Assert.failNotEquals(Assert.java:329) at junit.framework.Assert.assertEquals(Assert.java:78) at junit.framework.Assert.assertEquals(Assert.java:234) at junit.framework.Assert.assertEquals(Assert.java:241) at junit.framework.TestCase.assertEquals(TestCase.java:409) at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacement(TestCombineFileInputFormat.java:368) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6165) [JDK8] TestCombineFileInputFormat failed on JDK8
[ https://issues.apache.org/jira/browse/MAPREDUCE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated MAPREDUCE-6165: - Status: Open (was: Patch Available) [JDK8] TestCombineFileInputFormat failed on JDK8 Key: MAPREDUCE-6165 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6165 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Wei Yan Assignee: Akira AJISAKA Priority: Minor Attachments: MAPREDUCE-6165-001.patch, MAPREDUCE-6165-reproduce.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5657) [JDK8] Fix Javadoc errors caused by incorrect or illegal tags in doc comments
[ https://issues.apache.org/jira/browse/MAPREDUCE-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308457#comment-14308457 ] Andrew Purtell commented on MAPREDUCE-5657: --- Go for it! [JDK8] Fix Javadoc errors caused by incorrect or illegal tags in doc comments - Key: MAPREDUCE-5657 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5657 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.7.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Attachments: 5657-branch-2.patch, 5657-branch-2.patch, 5657-trunk.patch, 5657-trunk.patch Javadoc is more strict by default in JDK8 and will error out on malformed or illegal tags found in doc comments. Although tagged as JDK8 all of the required changes are generic Javadoc cleanups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5657) [JDK8] Fix Javadoc errors caused by incorrect or illegal tags in doc comments
[ https://issues.apache.org/jira/browse/MAPREDUCE-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308393#comment-14308393 ] Akira AJISAKA commented on MAPREDUCE-5657: -- Hi [~apurtell], how is this issue going? If you don't have time to rebase your patch, I'd like to succeed your work. [JDK8] Fix Javadoc errors caused by incorrect or illegal tags in doc comments - Key: MAPREDUCE-5657 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5657 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.7.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Attachments: 5657-branch-2.patch, 5657-branch-2.patch, 5657-trunk.patch, 5657-trunk.patch Javadoc is more strict by default in JDK8 and will error out on malformed or illegal tags found in doc comments. Although tagged as JDK8 all of the required changes are generic Javadoc cleanups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6234) MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308424#comment-14308424 ] Tsuyoshi OZAWA commented on MAPREDUCE-6234: --- [~iwasakims] thank you for taking this JIRA. Currently, -1 for this fix. I think we should fix test side to follow default configuration since the assertion only check whether check if the high ram properties are not set. {code} // check if the high ram properties are not set assertEquals(expectedMapMB, simulatedConf.getLong(MRJobConfig.MAP_MEMORY_MB, MRJobConfig.DEFAULT_MAP_MEMORY_MB)); assertEquals(expectedReduceMB, simulatedConf.getLong(MRJobConfig.REDUCE_MEMORY_MB, MRJobConfig.DEFAULT_MAP_MEMORY_MB)); {code} We should also rethink what we should test in TestHighRamJob - it refers JT_MAX_MAPMEMORY_MB or some old configurations. Do we really need this tests? MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml Key: MAPREDUCE-6234 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6234 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/gridmix, mrv2 Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: MAPREDUCE-6234.001.patch TestHighRamJob fails by this. {code} --- T E S T S --- Running org.apache.hadoop.mapred.gridmix.TestHighRamJob Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.162 sec FAILURE! - in org.apache.hadoop.mapred.gridmix.TestHighRamJob testHighRamFeatureEmulation(org.apache.hadoop.mapred.gridmix.TestHighRamJob) Time elapsed: 1.102 sec FAILURE! 
java.lang.AssertionError: expected:1024 but was:-1 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamConfig(TestHighRamJob.java:98) at org.apache.hadoop.mapred.gridmix.TestHighRamJob.testHighRamFeatureEmulation(TestHighRamJob.java:117) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
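The {{expected:1024 but was:-1}} failure above is the classic symptom of a code-side default constant disagreeing with the file-side default. A minimal sketch of the mechanism, with plain Java standing in for Hadoop's Configuration (key name and values are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a code-side default that disagrees with the file-side default
// silently yields different answers depending on whether the defaults
// file was loaded, even though neither caller set the key explicitly.
public class DefaultMismatch {
    static final long CODE_DEFAULT_MAP_MEMORY_MB = -1;   // e.g. an MRJobConfig constant
    static final long FILE_DEFAULT_MAP_MEMORY_MB = 1024; // e.g. mapred-default.xml

    // Stand-in for Configuration.getLong(key, codeDefault).
    static long get(Map<String, String> conf, String key, long codeDefault) {
        String v = conf.get(key);
        return v == null ? codeDefault : Long.parseLong(v);
    }

    public static void main(String[] args) {
        Map<String, String> withXmlDefaults = new HashMap<>();
        withXmlDefaults.put("mapreduce.map.memory.mb",
                Long.toString(FILE_DEFAULT_MAP_MEMORY_MB));
        Map<String, String> bareConf = new HashMap<>();

        // Same key, same code-side default, different answers:
        System.out.println(get(withXmlDefaults, "mapreduce.map.memory.mb",
                CODE_DEFAULT_MAP_MEMORY_MB)); // 1024
        System.out.println(get(bareConf, "mapreduce.map.memory.mb",
                CODE_DEFAULT_MAP_MEMORY_MB)); // -1
    }
}
```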
[jira] [Commented] (MAPREDUCE-6234) MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308430#comment-14308430 ] Tsuyoshi OZAWA commented on MAPREDUCE-6234: --- s/whether check if the high ram properties are not set./whether the high ram properties are set./ MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml Key: MAPREDUCE-6234 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6234 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/gridmix, mrv2 Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: MAPREDUCE-6234.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5657) [JDK8] Fix Javadoc errors caused by incorrect or illegal tags in doc comments
[ https://issues.apache.org/jira/browse/MAPREDUCE-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308460#comment-14308460 ] Akira AJISAKA commented on MAPREDUCE-5657: -- Thank you Andrew! [JDK8] Fix Javadoc errors caused by incorrect or illegal tags in doc comments - Key: MAPREDUCE-5657 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5657 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.7.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Attachments: 5657-branch-2.patch, 5657-branch-2.patch, 5657-trunk.patch, 5657-trunk.patch Javadoc is more strict by default in JDK8 and will error out on malformed or illegal tags found in doc comments. Although tagged as JDK8 all of the required changes are generic Javadoc cleanups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MAPREDUCE-5657) [JDK8] Fix Javadoc errors caused by incorrect or illegal tags in doc comments
[ https://issues.apache.org/jira/browse/MAPREDUCE-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA reassigned MAPREDUCE-5657: Assignee: Akira AJISAKA (was: Andrew Purtell) [JDK8] Fix Javadoc errors caused by incorrect or illegal tags in doc comments - Key: MAPREDUCE-5657 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5657 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.7.0 Reporter: Andrew Purtell Assignee: Akira AJISAKA Priority: Minor Attachments: 5657-branch-2.patch, 5657-branch-2.patch, 5657-trunk.patch, 5657-trunk.patch Javadoc is more strict by default in JDK8 and will error out on malformed or illegal tags found in doc comments. Although tagged as JDK8 all of the required changes are generic Javadoc cleanups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5988) Fix dead links to the javadocs in mapreduce project
[ https://issues.apache.org/jira/browse/MAPREDUCE-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307020#comment-14307020 ] Hudson commented on MAPREDUCE-5988: --- FAILURE: Integrated in Hadoop-Yarn-trunk #829 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/829/]) MAPREDUCE-5988. Fix dead links to the javadocs in mapreduce project. (aajisaka) (aajisaka: rev cc6bbfceae1cddfae6a3892cb7e7104531a689be) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/counters/package-info.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/package-info.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/api/protocolrecords/package-info.java Fix dead links to the javadocs in mapreduce project --- Key: MAPREDUCE-5988 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5988 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 2.4.1 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Fix For: 2.7.0 Attachments: MAPREDUCE-5988.2.patch, MAPREDUCE-5988.patch In http://hadoop.apache.org/docs/r2.4.1/api/allclasses-frame.html, some classes are listed, but not documented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6243) Fix findbugs warnings in hadoop-rumen
[ https://issues.apache.org/jira/browse/MAPREDUCE-6243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307017#comment-14307017 ] Hudson commented on MAPREDUCE-6243: --- FAILURE: Integrated in Hadoop-Yarn-trunk #829 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/829/]) MAPREDUCE-6243. Fix findbugs warnings in hadoop-rumen. Contributed by Masatake Iwasaki. (aajisaka: rev 34fe11c987730932f99dec6eb458a22624eb075b) * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/MapAttempt20LineHistoryEventEmitter.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/ReduceAttempt20LineHistoryEventEmitter.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/Hadoop20JHParser.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/RandomSeedGenerator.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/HadoopLogsAnalyzer.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/ParsedConfigFile.java Fix findbugs warnings in hadoop-rumen - Key: MAPREDUCE-6243 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6243 Project: Hadoop Map/Reduce Issue Type: Bug Components: tools/rumen Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Masatake Iwasaki Priority: Minor Labels: newbie Fix For: 2.7.0 Attachments: MAPREDUCE-6243.001.patch, MAPREDUCE-6243.002.patch, findbugs.xml There are 7 findbugs warnings in hadoop-rumen modules. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6059) Speed up history server startup time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307027#comment-14307027 ] Hudson commented on MAPREDUCE-6059: --- FAILURE: Integrated in Hadoop-Yarn-trunk #829 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/829/]) MAPREDUCE-6059. Speed up history server startup time (Siqi Li via aw) (aw: rev fd57ab2002f97dcc83d455a5e0c770c8efde77a4) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java * hadoop-mapreduce-project/CHANGES.txt Speed up history server startup time Key: MAPREDUCE-6059 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6059 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Fix For: 3.0.0 Attachments: YARN-2366.v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6242) Progress report log is incredibly excessive in application master
[ https://issues.apache.org/jira/browse/MAPREDUCE-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307081#comment-14307081 ] Varun Saxena commented on MAPREDUCE-6242: - Oh its urgent. Thanks for letting me know. Will fix this on priority and upload a patch today. Progress report log is incredibly excessive in application master - Key: MAPREDUCE-6242 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6242 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.4.0 Reporter: Jian Fang Assignee: Varun Saxena We saw incredibly excessive logs in application master for a long running one with many task attempts. The log write rate is around 1MB/sec in some cases. Most of the log entries were from the progress report such as the following ones. 2015-02-03 17:46:14,321 INFO [IPC Server handler 56 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_00_0 is : 0.15605757 2015-02-03 17:46:17,581 INFO [IPC Server handler 2 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_00_0 is : 0.4108217 2015-02-03 17:46:20,426 INFO [IPC Server handler 0 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_02_0 is : 0.06634143 2015-02-03 17:46:20,807 INFO [IPC Server handler 4 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_00_0 is : 0.6506 2015-02-03 17:46:21,013 INFO [IPC Server handler 6 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_01_0 is : 0.21723115 Looks like the report interval is controlled by a hard-coded variable PROGRESS_INTERVAL as 3 seconds in class org.apache.hadoop.mapred.Task. We should allow users to set the appropriate progress interval for their applications. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
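The comment above points at a hard-coded PROGRESS_INTERVAL of 3 seconds. A minimal sketch of the suggested fix, reading the interval from configuration with the old constant as the fallback (the property name is illustrative; the issue had not settled on one at this point):

```java
import java.util.Properties;

// Sketch: make the task progress-report interval configurable instead
// of hard-coding it. Properties stands in for Hadoop's Configuration;
// the key name below is a hypothetical example, not the final property.
public class ProgressInterval {
    static final long DEFAULT_PROGRESS_INTERVAL_MS = 3000L; // old hard-coded value

    static long progressInterval(Properties conf) {
        String v = conf.getProperty("mapreduce.task.progress-report.interval");
        return v == null ? DEFAULT_PROGRESS_INTERVAL_MS : Long.parseLong(v);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(progressInterval(conf)); // 3000: unchanged default

        // A long-running job can now throttle its own progress logging:
        conf.setProperty("mapreduce.task.progress-report.interval", "60000");
        System.out.println(progressInterval(conf)); // 60000
    }
}
```

At ~1 MB/sec of progress-report log, raising the interval per application is a much safer knob than lowering the log level for the whole TaskAttemptListenerImpl class.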
[jira] [Commented] (MAPREDUCE-5988) Fix dead links to the javadocs in mapreduce project
[ https://issues.apache.org/jira/browse/MAPREDUCE-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306993#comment-14306993 ] Hudson commented on MAPREDUCE-5988: --- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #95 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/95/]) MAPREDUCE-5988. Fix dead links to the javadocs in mapreduce project. (aajisaka) (aajisaka: rev cc6bbfceae1cddfae6a3892cb7e7104531a689be) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/api/protocolrecords/package-info.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/counters/package-info.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/package-info.java Fix dead links to the javadocs in mapreduce project --- Key: MAPREDUCE-5988 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5988 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 2.4.1 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Fix For: 2.7.0 Attachments: MAPREDUCE-5988.2.patch, MAPREDUCE-5988.patch In http://hadoop.apache.org/docs/r2.4.1/api/allclasses-frame.html, some classes are listed, but not documented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6243) Fix findbugs warnings in hadoop-rumen
[ https://issues.apache.org/jira/browse/MAPREDUCE-6243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306990#comment-14306990 ] Hudson commented on MAPREDUCE-6243: --- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #95 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/95/]) MAPREDUCE-6243. Fix findbugs warnings in hadoop-rumen. Contributed by Masatake Iwasaki. (aajisaka: rev 34fe11c987730932f99dec6eb458a22624eb075b) * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/MapAttempt20LineHistoryEventEmitter.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/ParsedConfigFile.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/RandomSeedGenerator.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/Hadoop20JHParser.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/ReduceAttempt20LineHistoryEventEmitter.java * hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/HadoopLogsAnalyzer.java Fix findbugs warnings in hadoop-rumen - Key: MAPREDUCE-6243 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6243 Project: Hadoop Map/Reduce Issue Type: Bug Components: tools/rumen Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Masatake Iwasaki Priority: Minor Labels: newbie Fix For: 2.7.0 Attachments: MAPREDUCE-6243.001.patch, MAPREDUCE-6243.002.patch, findbugs.xml There are 7 findbugs warnings in hadoop-rumen modules. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6059) Speed up history server startup time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307000#comment-14307000 ] Hudson commented on MAPREDUCE-6059: --- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #95 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/95/]) MAPREDUCE-6059. Speed up history server startup time (Siqi Li via aw) (aw: rev fd57ab2002f97dcc83d455a5e0c770c8efde77a4) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java * hadoop-mapreduce-project/CHANGES.txt Speed up history server startup time Key: MAPREDUCE-6059 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6059 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Siqi Li Assignee: Siqi Li Fix For: 3.0.0 Attachments: YARN-2366.v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6227) DFSIO for truncate
[ https://issues.apache.org/jira/browse/MAPREDUCE-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated MAPREDUCE-6227: --- Attachment: DFSIO-truncate-00.patch Adding truncate to DFSIO. DFSIO for truncate -- Key: MAPREDUCE-6227 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6227 Project: Hadoop Map/Reduce Issue Type: New Feature Components: benchmarks, test Reporter: Konstantin Shvachko Attachments: DFSIO-truncate-00.patch Create a benchmark and a test for truncate within the framework of TestDFSIO. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6234) MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308487#comment-14308487 ] Masatake Iwasaki commented on MAPREDUCE-6234: - I agree that the gridmix code should be updated. But I think it is very confusing that MRJobConfig.DEFAULT_MAP_MEMORY_MB is not the same as the value of mapreduce.map.memory.mb in mapred-default.xml, and it should be fixed regardless of the gridmix test failure. MRJobConfig.DEFAULT_*_MEMORY_MB should be consistent with mapred-default.xml Key: MAPREDUCE-6234 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6234 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/gridmix, mrv2 Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: MAPREDUCE-6234.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6227) DFSIO for truncate
[ https://issues.apache.org/jira/browse/MAPREDUCE-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated MAPREDUCE-6227: --- Assignee: Konstantin Shvachko Affects Version/s: 2.7.0 Status: Patch Available (was: Open) DFSIO for truncate -- Key: MAPREDUCE-6227 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6227 Project: Hadoop Map/Reduce Issue Type: New Feature Components: benchmarks, test Affects Versions: 2.7.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Attachments: DFSIO-truncate-00.patch Create a benchmark and a test for truncate within the framework of TestDFSIO. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6240) Hadoop client displays confusing error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308509#comment-14308509 ] Mohammad Kamrul Islam commented on MAPREDUCE-6240: -- [~jira.shegalov] if the message "Please check your configuration for mapreduce.framework.name and the correspond server addresses." is shown, please also include the current values of those properties. It will help users find out whether their configuration is effective. Hadoop client displays confusing error message -- Key: MAPREDUCE-6240 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6240 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.002.patch, MAPREDUCE-6240.1.patch The Hadoop client often throws an exception with java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses. This is a misleading and generic message for any cluster initialization problem. It can take many hours of debugging to identify the root cause. A more precise error message would resolve this problem quickly. In one such instance, the Oozie log showed the following exception while the root cause was a ClassNotFoundException (CNF) that the Hadoop client didn't surface in the exception. {noformat} JA009: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses. 
at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412) at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:392) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:979) at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1134) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63) at org.apache.oozie.command.XCommand.call(XCommand.java:281) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252) at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses. 
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120) at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:82) at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:75) at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470) at org.apache.hadoop.mapred.JobClient.init(JobClient.java:449) at org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:372) at org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:370) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.oozie.service.HadoopAccessorService.createJobClient(HadoopAccessorService.java:379) at org.apache.oozie.action.hadoop.JavaActionExecutor.createJobClient(JavaActionExecutor.java:1185) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:927) ... 10 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
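The fix direction discussed above can be sketched as follows. This is an illustrative snippet, not the actual Hadoop patch; the method name `cannotInitialize` and its parameters are hypothetical. The two ideas are: echo the effective value of mapreduce.framework.name in the message, and chain the underlying failure (e.g. a ClassNotFoundException) as the exception's cause so it is not lost:

```java
import java.io.IOException;

// Sketch of a more informative cluster-initialization error: include the
// effective configuration value and preserve the root cause.
public class ClusterInitError {
    // Hypothetical helper; the wording mirrors the existing Hadoop message.
    static IOException cannotInitialize(String frameworkName, Throwable rootCause) {
        return new IOException(
            "Cannot initialize Cluster. Please check your configuration for "
                + "mapreduce.framework.name (=" + frameworkName + ") "
                + "and the correspond server addresses.",
            rootCause); // keep e.g. the ClassNotFoundException visible
    }

    public static void main(String[] args) {
        IOException e = cannotInitialize("yarn",
            new ClassNotFoundException("org.example.MissingClientProtocolProvider"));
        System.out.println(e.getMessage());
        System.out.println("caused by: " + e.getCause());
    }
}
```

With the cause chained, a stack trace like the Oozie one above would end in "Caused by: java.lang.ClassNotFoundException: ...", pointing directly at the missing class instead of stopping at the generic message.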
[jira] [Updated] (MAPREDUCE-5492) Suppress expected log output stated on MAPREDUCE-5
[ https://issues.apache.org/jira/browse/MAPREDUCE-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5492: Status: Open (was: Patch Available) Suppress expected log output stated on MAPREDUCE-5 -- Key: MAPREDUCE-5492 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5492 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.20.2 Reporter: Harsh J Assignee: bc Wong Priority: Trivial Attachments: 0001-MAPREDUCE-5492.-Do-not-reuse-a-committed-ServletResp.patch, mr-5492-2.patch Jetty in MR1 may produce an expected EOFException during its operation that we shouldn't log out in ERROR form. This shouldn't affect MR2, however, as it uses Netty. See MAPREDUCE-5 (Jothi's comments) for more info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (MAPREDUCE-5492) Suppress expected log output stated on MAPREDUCE-5
[ https://issues.apache.org/jira/browse/MAPREDUCE-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved MAPREDUCE-5492. - Resolution: Won't Fix Closing as Won't Fix since this is no longer an issue in 2.x and up. Suppress expected log output stated on MAPREDUCE-5 -- Key: MAPREDUCE-5492 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5492 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.20.2 Reporter: Harsh J Assignee: bc Wong Priority: Trivial Attachments: 0001-MAPREDUCE-5492.-Do-not-reuse-a-committed-ServletResp.patch, mr-5492-2.patch Jetty in MR1 may produce an expected EOFException during its operation that we shouldn't log out in ERROR form. This shouldn't affect MR2, however, as it uses Netty. See MAPREDUCE-5 (Jothi's comments) for more info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5696) Add Localization counters to MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-5696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5696: Status: Patch Available (was: Open) Add Localization counters to MR --- Key: MAPREDUCE-5696 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5696 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: LocalizationCounters.png, MAPREDUCE-5696.v01.patch, MAPREDUCE-5696.v02.patch Users are often unaware of the localization cost that their jobs incur. To measure the effectiveness of localization caches it is necessary to expose the overhead in the form of user-visible metrics. The purpose of this JIRA is to complement YARN-1529. While YARN-1529 attempts to provide a cluster-wide view to cluster admins, this JIRA focuses on exposing the localization overhead on a per-job basis to the job owner/user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5696) Add Localization counters to MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-5696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5696: Status: Open (was: Patch Available) Add Localization counters to MR --- Key: MAPREDUCE-5696 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5696 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: LocalizationCounters.png, MAPREDUCE-5696.v01.patch, MAPREDUCE-5696.v02.patch Users are often unaware of the localization cost that their jobs incur. To measure the effectiveness of localization caches it is necessary to expose the overhead in the form of user-visible metrics. The purpose of this JIRA is to complement YARN-1529. While YARN-1529 attempts to provide a cluster-wide view to cluster admins, this JIRA focuses on exposing the localization overhead on a per-job basis to the job owner/user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5648) Allow user-specified diagnostics for killed tasks and jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5648: Status: Open (was: Patch Available) Allow user-specified diagnostics for killed tasks and jobs -- Key: MAPREDUCE-5648 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5648 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client, mr-am, mrv2 Affects Versions: 2.2.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: MAPREDUCE-5648.v01.patch, MAPREDUCE-5648.v02.patch, MAPREDUCE-5648.v03.patch, MAPREDUCE-5648.v04.patch, MAPREDUCE-5648.v05.patch, Screen Shot 2013-11-23 at 11.12.15 AM.png Our users and tools want to be able to supply additional custom diagnostic messages to mapreduce ClientProtocol killTask. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5648) Allow user-specified diagnostics for killed tasks and jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307712#comment-14307712 ] Hadoop QA commented on MAPREDUCE-5648: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634947/MAPREDUCE-5648.v05.patch against trunk revision b6466de. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5158//console This message is automatically generated. Allow user-specified diagnostics for killed tasks and jobs -- Key: MAPREDUCE-5648 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5648 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client, mr-am, mrv2 Affects Versions: 2.2.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: MAPREDUCE-5648.v01.patch, MAPREDUCE-5648.v02.patch, MAPREDUCE-5648.v03.patch, MAPREDUCE-5648.v04.patch, MAPREDUCE-5648.v05.patch, Screen Shot 2013-11-23 at 11.12.15 AM.png Our users and tools want to be able to supply additional custom diagnostic messages to mapreduce ClientProtocol killTask. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5839) Provide a boolean switch to enable LazyOutputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5839: Status: Patch Available (was: Open) Provide a boolean switch to enable LazyOutputFormat --- Key: MAPREDUCE-5839 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5839 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: MAPREDUCE-5839.v01.patch, MAPREDUCE-5839.v02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5839) Provide a boolean switch to enable LazyOutputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5839: Status: Open (was: Patch Available) Provide a boolean switch to enable LazyOutputFormat --- Key: MAPREDUCE-5839 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5839 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: MAPREDUCE-5839.v01.patch, MAPREDUCE-5839.v02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5696) Add Localization counters to MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-5696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5696: Status: Open (was: Patch Available) Cancelling patch as it no longer applies. Add Localization counters to MR --- Key: MAPREDUCE-5696 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5696 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: LocalizationCounters.png, MAPREDUCE-5696.v01.patch, MAPREDUCE-5696.v02.patch Users are often unaware of the localization cost that their jobs incur. To measure the effectiveness of localization caches it is necessary to expose the overhead in the form of user-visible metrics. The purpose of this JIRA is to complement YARN-1529. While YARN-1529 attempts to provide a cluster-wide view to cluster admins, this JIRA focuses on exposing the localization overhead on a per-job basis to the job owner/user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5839) Provide a boolean switch to enable LazyOutputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307729#comment-14307729 ] Hadoop QA commented on MAPREDUCE-5839: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12640581/MAPREDUCE-5839.v02.patch against trunk revision b6466de. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5160//console This message is automatically generated. Provide a boolean switch to enable LazyOutputFormat --- Key: MAPREDUCE-5839 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5839 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: MAPREDUCE-5839.v01.patch, MAPREDUCE-5839.v02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6237) DBRecordReader is not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kannan Rajah updated MAPREDUCE-6237: Attachment: mapreduce-6237.patch DBRecordReader is not thread safe - Key: MAPREDUCE-6237 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.5.0 Reporter: Kannan Rajah Assignee: Kannan Rajah Attachments: mapreduce-6237.patch, mapreduce-6237.patch, mapreduce-6237.patch DBInputFormat.createDBRecordReader is reusing JDBC connections across instances of DBRecordReader. This is not a good idea. We should be creating a separate connection. If performance is a concern, then we should be using connection pooling instead. I looked at DBOutputFormat.getRecordReader. It actually creates a new Connection object for each DBRecordReader. So can we just change DBInputFormat to create a new Connection every time? The connection reuse code was added as part of the connection leak fix in MAPREDUCE-1443. Is there any reason for caching the connection? We observed this issue in a customer setup where they were reading data from MySQL using Pig. As per the customer, the query returns two records, which causes Pig to create two instances of DBRecordReader. These two instances share the database connection instance. The first DBRecordReader runs and extracts the first record from MySQL just fine, but then closes the shared connection instance. When the second DBRecordReader runs, it tries to execute a query to retrieve the second record on the closed shared connection instance, which fails. If we set mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
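The failure sequence described above can be modeled with a minimal toy sketch (no JDBC, not Hadoop code; FakeConnection and Reader are illustrative stand-ins for a JDBC Connection and DBRecordReader): when two readers share one connection, the first reader's close() invalidates the connection under the second reader, while per-reader connections work fine.

```java
// Toy model of the MAPREDUCE-6237 failure mode: a reader closing a shared
// connection breaks the next reader that still holds the same connection.
public class SharedConnectionSketch {
    static class FakeConnection {
        private boolean closed = false;
        void close() { closed = true; }
        String query() {
            if (closed) throw new IllegalStateException("connection is closed");
            return "row";
        }
    }

    static class Reader {
        private final FakeConnection conn;
        Reader(FakeConnection conn) { this.conn = conn; }
        String read() { return conn.query(); }
        void close() { conn.close(); } // closes the (possibly shared) connection
    }

    public static void main(String[] args) {
        // Shared connection: the second reader fails after the first closes it.
        FakeConnection shared = new FakeConnection();
        Reader first = new Reader(shared);
        Reader second = new Reader(shared);
        first.read();
        first.close();
        try {
            second.read();
        } catch (IllegalStateException e) {
            System.out.println("second reader failed: " + e.getMessage());
        }

        // Per-reader connections (the change proposed above): both succeed.
        Reader a = new Reader(new FakeConnection());
        Reader b = new Reader(new FakeConnection());
        a.read();
        a.close();
        System.out.println(b.read()); // prints "row"
    }
}
```

This is why the single-mapper case (mapred.map.tasks=1) masks the bug: with only one reader, nothing reuses the connection after it is closed.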
[jira] [Updated] (MAPREDUCE-6237) DBRecordReader is not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kannan Rajah updated MAPREDUCE-6237: Attachment: (was: mapreduce-6237.patch) DBRecordReader is not thread safe - Key: MAPREDUCE-6237 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6237 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.5.0 Reporter: Kannan Rajah Assignee: Kannan Rajah Attachments: mapreduce-6237.patch, mapreduce-6237.patch, mapreduce-6237.patch DBInputFormat.createDBRecordReader is reusing JDBC connections across instances of DBRecordReader. This is not a good idea. We should be creating a separate connection. If performance is a concern, then we should be using connection pooling instead. I looked at DBOutputFormat.getRecordReader. It actually creates a new Connection object for each DBRecordReader. So can we just change DBInputFormat to create a new Connection every time? The connection reuse code was added as part of the connection leak fix in MAPREDUCE-1443. Is there any reason for caching the connection? We observed this issue in a customer setup where they were reading data from MySQL using Pig. As per the customer, the query returns two records, which causes Pig to create two instances of DBRecordReader. These two instances share the database connection instance. The first DBRecordReader runs and extracts the first record from MySQL just fine, but then closes the shared connection instance. When the second DBRecordReader runs, it tries to execute a query to retrieve the second record on the closed shared connection instance, which fails. If we set mapred.map.tasks to 1, the query will be successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)