[jira] [Updated] (MAPREDUCE-6176) To limit the map task number or reduce task number of an application
[ https://issues.apache.org/jira/browse/MAPREDUCE-6176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Hao updated MAPREDUCE-6176: Attachment: (was: MAPREDUCE-6176.patch) To limit the map task number or reduce task number of an application Key: MAPREDUCE-6176 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6176 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mr-am, mrv2 Affects Versions: 2.5.0, 2.4.1, 2.5.1, 2.5.2 Reporter: Yang Hao Assignee: Yang Hao Labels: patch Fix For: 2.4.1 As MapReduce is a batch computation framework, people may want to run application A alongside applications B and C, with a resource limit placed on A. A good way to do so is to limit the number of an application's map tasks or reduce tasks. If we set mapreduce.map.num.max to M, the number of map tasks will not exceed M. Likewise, if we set mapreduce.reduce.num.max to R, the number of reduce tasks will not exceed R. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
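The proposal above can be sketched with a minimal, self-contained model (this is not the attached patch): the AM would clamp the computed task counts to the configured maxima. The property name mapreduce.reduce.num.max is assumed by symmetry with mapreduce.map.num.max, and the "-1 means unlimited" convention is illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

public class TaskCap {
    // Property names from the JIRA description; the reduce-side name is
    // assumed by symmetry with the map-side one.
    static final String MAP_MAX = "mapreduce.map.num.max";
    static final String REDUCE_MAX = "mapreduce.reduce.num.max";

    // Clamp the requested task count to the configured maximum (-1 = unlimited).
    static int cap(int requested, int max) {
        return (max < 0) ? requested : Math.min(requested, max);
    }

    public static void main(String[] args) {
        // Toy stand-in for a Hadoop Configuration object.
        Map<String, Integer> conf = new HashMap<>();
        conf.put(MAP_MAX, 500);    // M
        conf.put(REDUCE_MAX, 20);  // R
        int maps = cap(1200, conf.getOrDefault(MAP_MAX, -1));
        int reduces = cap(8, conf.getOrDefault(REDUCE_MAX, -1));
        System.out.println("maps=" + maps + " reduces=" + reduces);
    }
}
```

With M=500 an input producing 1200 splits would be capped to 500 map tasks, while a job requesting only 8 reducers stays below R=20 and is unaffected.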
[jira] [Updated] (MAPREDUCE-6176) To limit the map task number or reduce task number of an application
[ https://issues.apache.org/jira/browse/MAPREDUCE-6176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Hao updated MAPREDUCE-6176: Attachment: MAPREDUCE-6176.patch add test To limit the map task number or reduce task number of an application Key: MAPREDUCE-6176 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6176 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mr-am, mrv2 Affects Versions: 2.5.0, 2.4.1, 2.5.1, 2.5.2 Reporter: Yang Hao Assignee: Yang Hao Labels: patch Fix For: 2.4.1 Attachments: MAPREDUCE-6176.patch As MapReduce is a batch computation framework, people may want to run application A alongside applications B and C, with a resource limit placed on A. A good way to do so is to limit the number of an application's map tasks or reduce tasks. If we set mapreduce.map.num.max to M, the number of map tasks will not exceed M. Likewise, if we set mapreduce.reduce.num.max to R, the number of reduce tasks will not exceed R. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6176) To limit the map task number or reduce task number of an application
[ https://issues.apache.org/jira/browse/MAPREDUCE-6176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Hao updated MAPREDUCE-6176: Fix Version/s: (was: 2.4.1) 2.5.0 To limit the map task number or reduce task number of an application Key: MAPREDUCE-6176 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6176 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mr-am, mrv2 Affects Versions: 2.5.0, 2.4.1, 2.5.1, 2.5.2 Reporter: Yang Hao Assignee: Yang Hao Labels: patch Fix For: 2.5.0 Attachments: MAPREDUCE-6176.patch As MapReduce is a batch computation framework, people may want to run application A alongside applications B and C, with a resource limit placed on A. A good way to do so is to limit the number of an application's map tasks or reduce tasks. If we set mapreduce.map.num.max to M, the number of map tasks will not exceed M. Likewise, if we set mapreduce.reduce.num.max to R, the number of reduce tasks will not exceed R. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6092) TestJobHistoryParsing#testPartialJob timeouts in some environments
[ https://issues.apache.org/jira/browse/MAPREDUCE-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14229655#comment-14229655 ] Akira AJISAKA commented on MAPREDUCE-6092: -- Fixed in MAPREDUCE-6104. Closing. TestJobHistoryParsing#testPartialJob timeouts in some environments -- Key: MAPREDUCE-6092 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6092 Project: Hadoop Map/Reduce Issue Type: Test Components: test Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie Attachments: MAPREDUCE-6092.patch Rebasing the patch in MAPREDUCE-5392, I found that TestJobHistoryParsing#testPartialJob times out in my environment.
{code}
Running org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing
Tests run: 15, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 106.007 sec <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing
testPartialJob(org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing) Time elapsed: 0.987 sec <<< ERROR!
java.lang.Exception: test timed out after 1000 milliseconds
    at org.apache.xerces.impl.XMLEntityScanner.scanQName(Unknown Source)
    at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
    at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2334)
    at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2322)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2393)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2346)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2263)
    at org.apache.hadoop.conf.Configuration.get(Configuration.java:868)
    at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:887)
    at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1288)
    at org.apache.hadoop.security.SecurityUtil.<clinit>(SecurityUtil.java:70)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:247)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:235)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:761)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:746)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:619)
    at
org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.testPartialJob(TestJobHistoryParsing.java:829)
{code}
We should extend the timeout so that the test does not fail on slow machines. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
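The failure above is a 1000 ms `@Test` timeout tripping while Configuration is still parsing XML. The fragility of a tight wall-clock budget can be shown without JUnit at all; the sketch below (plain java.util.concurrent, all names invented) bounds a task the same way a test-timeout harness does, using a worker thread and `Future.get` with a deadline.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutDemo {
    // Run a task under a wall-clock bound, roughly what @Test(timeout=...)
    // does internally; returns true if the task finished within the budget.
    static boolean finishesWithin(Runnable task, long millis) {
        ExecutorService ex = Executors.newSingleThreadExecutor();
        try {
            Future<?> f = ex.submit(task);
            try {
                f.get(millis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                f.cancel(true); // give up, as the test harness would
                return false;
            } catch (InterruptedException | ExecutionException e) {
                throw new RuntimeException(e);
            }
        } finally {
            ex.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A task that is "fast" on an idle machine can still miss a tiny
        // budget under load; a generous budget (like the 10-20 s suggested
        // in MAPREDUCE-6172 below) absorbs such hiccups.
        Runnable work = () -> {
            try { Thread.sleep(500); } catch (InterruptedException ignored) { }
        };
        System.out.println("10ms budget: " + finishesWithin(work, 10));
        System.out.println("5s budget:   " + finishesWithin(work, 5000));
    }
}
```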
[jira] [Updated] (MAPREDUCE-6092) TestJobHistoryParsing#testPartialJob timeouts in some environments
[ https://issues.apache.org/jira/browse/MAPREDUCE-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated MAPREDUCE-6092: - Resolution: Duplicate Target Version/s: (was: 2.7.0) Status: Resolved (was: Patch Available) TestJobHistoryParsing#testPartialJob timeouts in some environments -- Key: MAPREDUCE-6092 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6092 Project: Hadoop Map/Reduce Issue Type: Test Components: test Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie Attachments: MAPREDUCE-6092.patch Rebasing the patch in MAPREDUCE-5392, I found that TestJobHistoryParsing#testPartialJob times out in my environment.
{code}
Running org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing
Tests run: 15, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 106.007 sec <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing
testPartialJob(org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing) Time elapsed: 0.987 sec <<< ERROR!
java.lang.Exception: test timed out after 1000 milliseconds
    at org.apache.xerces.impl.XMLEntityScanner.scanQName(Unknown Source)
    at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
    at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2334)
    at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2322)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2393)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2346)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2263)
    at org.apache.hadoop.conf.Configuration.get(Configuration.java:868)
    at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:887)
    at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1288)
    at org.apache.hadoop.security.SecurityUtil.<clinit>(SecurityUtil.java:70)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:247)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:235)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:761)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:746)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:619)
    at
org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.testPartialJob(TestJobHistoryParsing.java:829)
{code}
We should extend the timeout so that the test does not fail on slow machines. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (MAPREDUCE-6168) Old MR client is still broken when receiving new counters from MR job
[ https://issues.apache.org/jira/browse/MAPREDUCE-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved MAPREDUCE-6168. --- Resolution: Won't Fix Thanks [~zjshen] and [~kasha] for the comments! It looks like we have already reached agreement here, so I am resolving this ticket as Won't Fix. Old MR client is still broken when receiving new counters from MR job - Key: MAPREDUCE-6168 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6168 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Zhijie Shen Assignee: Junping Du Priority: Blocker In the following scenarios: 1. Either insecure or secure; 2. MR 2.2 with new shuffle on NM; 3. Submitting via old client. We will see the following console exception:
{code}
14/11/17 14:56:19 INFO mapreduce.Job: Job job_1416264695865_0003 completed successfully
java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_REDUCES
    at java.lang.Enum.valueOf(Enum.java:236)
    at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.valueOf(FrameworkCounterGroup.java:148)
    at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.findCounter(FrameworkCounterGroup.java:182)
    at org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:154)
    at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:240)
    at org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:370)
    at org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:511)
    at org.apache.hadoop.mapreduce.Job$7.run(Job.java:756)
    at org.apache.hadoop.mapreduce.Job$7.run(Job.java:753)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:753)
    at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1361)
    at
org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1289)
    at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
    at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
{code}
The problem was supposed to be fixed by MAPREDUCE-5831; however, it seems that we haven't covered all of the problematic code paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
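The crash in the trace above happens because the old client calls `Enum.valueOf` with a counter name (MB_MILLIS_REDUCES) that exists only in newer releases. A toy sketch of the defensive pattern involved, with an invented stand-in enum rather than Hadoop's real JobCounter:

```java
public class UnknownCounterDemo {
    // Stand-in for an old client's counter enum that predates MB_MILLIS_REDUCES.
    enum OldJobCounter { TOTAL_LAUNCHED_MAPS, TOTAL_LAUNCHED_REDUCES }

    // Defensive lookup: a counter name added in a newer server release is
    // skipped instead of crashing with the IllegalArgumentException shown
    // in the stack trace above.
    static OldJobCounter findCounter(String name) {
        try {
            return OldJobCounter.valueOf(name);
        } catch (IllegalArgumentException unknown) {
            return null; // counter from a newer release; ignore it
        }
    }

    public static void main(String[] args) {
        System.out.println(findCounter("TOTAL_LAUNCHED_MAPS")); // known name
        System.out.println(findCounter("MB_MILLIS_REDUCES"));   // unknown -> null
    }
}
```

Callers then treat a null result as "no such counter on this client" rather than propagating an exception out of job monitoring.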
[jira] [Updated] (MAPREDUCE-6176) To limit the map task number or reduce task number of an application
[ https://issues.apache.org/jira/browse/MAPREDUCE-6176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Hao updated MAPREDUCE-6176: Fix Version/s: (was: 2.5.0) To limit the map task number or reduce task number of an application Key: MAPREDUCE-6176 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6176 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mr-am, mrv2 Affects Versions: 2.5.0, 2.4.1, 2.5.1, 2.5.2 Reporter: Yang Hao Assignee: Yang Hao Labels: patch Attachments: MAPREDUCE-6176.patch As MapReduce is a batch computation framework, people may want to run application A alongside applications B and C, with a resource limit placed on A. A good way to do so is to limit the number of an application's map tasks or reduce tasks. If we set mapreduce.map.num.max to M, the number of map tasks will not exceed M. Likewise, if we set mapreduce.reduce.num.max to R, the number of reduce tasks will not exceed R. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MAPREDUCE-6165) [JDK8] TestCombineFileInputFormat failed on JDK8
[ https://issues.apache.org/jira/browse/MAPREDUCE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA reassigned MAPREDUCE-6165: Assignee: Akira AJISAKA [JDK8] TestCombineFileInputFormat failed on JDK8 Key: MAPREDUCE-6165 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6165 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Wei Yan Assignee: Akira AJISAKA Priority: Minor Attachments: MAPREDUCE-6165-reproduce.patch The error msg:
{noformat}
testSplitPlacementForCompressedFiles(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat) Time elapsed: 2.487 sec <<< FAILURE!
junit.framework.AssertionFailedError: expected:<2> but was:<1>
    at junit.framework.Assert.fail(Assert.java:57)
    at junit.framework.Assert.failNotEquals(Assert.java:329)
    at junit.framework.Assert.assertEquals(Assert.java:78)
    at junit.framework.Assert.assertEquals(Assert.java:234)
    at junit.framework.Assert.assertEquals(Assert.java:241)
    at junit.framework.TestCase.assertEquals(TestCase.java:409)
    at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacementForCompressedFiles(TestCombineFileInputFormat.java:911)

testSplitPlacement(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat) Time elapsed: 0.985 sec <<< FAILURE!
junit.framework.AssertionFailedError: expected:<2> but was:<1>
    at junit.framework.Assert.fail(Assert.java:57)
    at junit.framework.Assert.failNotEquals(Assert.java:329)
    at junit.framework.Assert.assertEquals(Assert.java:78)
    at junit.framework.Assert.assertEquals(Assert.java:234)
    at junit.framework.Assert.assertEquals(Assert.java:241)
    at junit.framework.TestCase.assertEquals(TestCase.java:409)
    at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacement(TestCombineFileInputFormat.java:368)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6165) [JDK8] TestCombineFileInputFormat failed on JDK8
[ https://issues.apache.org/jira/browse/MAPREDUCE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated MAPREDUCE-6165: - Attachment: MAPREDUCE-6165-reproduce.patch [JDK8] TestCombineFileInputFormat failed on JDK8 Key: MAPREDUCE-6165 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6165 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Wei Yan Assignee: Akira AJISAKA Priority: Minor Attachments: MAPREDUCE-6165-reproduce.patch The error msg:
{noformat}
testSplitPlacementForCompressedFiles(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat) Time elapsed: 2.487 sec <<< FAILURE!
junit.framework.AssertionFailedError: expected:<2> but was:<1>
    at junit.framework.Assert.fail(Assert.java:57)
    at junit.framework.Assert.failNotEquals(Assert.java:329)
    at junit.framework.Assert.assertEquals(Assert.java:78)
    at junit.framework.Assert.assertEquals(Assert.java:234)
    at junit.framework.Assert.assertEquals(Assert.java:241)
    at junit.framework.TestCase.assertEquals(TestCase.java:409)
    at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacementForCompressedFiles(TestCombineFileInputFormat.java:911)

testSplitPlacement(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat) Time elapsed: 0.985 sec <<< FAILURE!
junit.framework.AssertionFailedError: expected:<2> but was:<1>
    at junit.framework.Assert.fail(Assert.java:57)
    at junit.framework.Assert.failNotEquals(Assert.java:329)
    at junit.framework.Assert.assertEquals(Assert.java:78)
    at junit.framework.Assert.assertEquals(Assert.java:234)
    at junit.framework.Assert.assertEquals(Assert.java:241)
    at junit.framework.TestCase.assertEquals(TestCase.java:409)
    at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacement(TestCombineFileInputFormat.java:368)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6165) [JDK8] TestCombineFileInputFormat failed on JDK8
[ https://issues.apache.org/jira/browse/MAPREDUCE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14229944#comment-14229944 ] Akira AJISAKA commented on MAPREDUCE-6165: -- The test unintentionally relies on the iteration order of {{HashMap.entrySet()}}, so it fails on JDK8. {{MAPREDUCE-6165-reproduce.patch}} can reproduce the failure on JDK7. [JDK8] TestCombineFileInputFormat failed on JDK8 Key: MAPREDUCE-6165 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6165 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Wei Yan Assignee: Akira AJISAKA Priority: Minor Attachments: MAPREDUCE-6165-reproduce.patch The error msg:
{noformat}
testSplitPlacementForCompressedFiles(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat) Time elapsed: 2.487 sec <<< FAILURE!
junit.framework.AssertionFailedError: expected:<2> but was:<1>
    at junit.framework.Assert.fail(Assert.java:57)
    at junit.framework.Assert.failNotEquals(Assert.java:329)
    at junit.framework.Assert.assertEquals(Assert.java:78)
    at junit.framework.Assert.assertEquals(Assert.java:234)
    at junit.framework.Assert.assertEquals(Assert.java:241)
    at junit.framework.TestCase.assertEquals(TestCase.java:409)
    at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacementForCompressedFiles(TestCombineFileInputFormat.java:911)

testSplitPlacement(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat) Time elapsed: 0.985 sec <<< FAILURE!
junit.framework.AssertionFailedError: expected:<2> but was:<1>
    at junit.framework.Assert.fail(Assert.java:57)
    at junit.framework.Assert.failNotEquals(Assert.java:329)
    at junit.framework.Assert.assertEquals(Assert.java:78)
    at junit.framework.Assert.assertEquals(Assert.java:234)
    at junit.framework.Assert.assertEquals(Assert.java:241)
    at junit.framework.TestCase.assertEquals(TestCase.java:409)
    at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacement(TestCombineFileInputFormat.java:368)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
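The JDK-dependence described in the comment above can be demonstrated without Hadoop; the sketch below (invented keys) contrasts HashMap's unspecified iteration order with LinkedHashMap's guaranteed insertion order, the usual deterministic substitute in tests.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IterationOrderDemo {
    // LinkedHashMap preserves insertion order on every JDK; HashMap's
    // order is an implementation detail that changed between JDK7 and JDK8,
    // which is exactly what broke TestCombineFileInputFormat.
    static List<String> insertionOrder(String... keys) {
        Map<String, Integer> m = new LinkedHashMap<>();
        for (String k : keys) {
            m.put(k, k.length());
        }
        return new ArrayList<>(m.keySet());
    }

    public static void main(String[] args) {
        String[] keys = {"rack1", "host2", "host1"};
        Map<String, Integer> hash = new HashMap<>();
        for (String k : keys) {
            hash.put(k, k.length());
        }
        // Deterministic on any JDK: [rack1, host2, host1]
        System.out.println("linked: " + insertionOrder(keys));
        // JDK-dependent: asserting on this order makes a test brittle
        System.out.println("hash:   " + new ArrayList<>(hash.keySet()));
    }
}
```

A test that must assert on split ordering should either sort the expected collection or build it from an insertion-ordered map.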
[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230407#comment-14230407 ] Eric Payne commented on MAPREDUCE-6166: --- [~jira.shegalov], I'm sorry for not being clear. {quote} Can you clarify where in the code it's required to keep the original checksum? Then this contents are written out using {{LocalFileSystem}}, which will create again an on-disk checksum because it's based on {{ChecksumFileSystem}}. {quote} I don't think the {{IFile}} format is related to {{ChecksumFileSystem}}. The {{IFile}} checksum is expected to be the last 4 bytes of the {{IFile}}, and if we use {{input.read}} as below, those 4 bytes of checksum are not copied into {{buf}}:
{code}
input = new IFileInputStream(input, compressedLength, conf);
// Copy data to local-disk
long bytesLeft = compressedLength - ((IFileInputStream)input).getSize();
try {
  final int BYTES_TO_READ = 64 * 1024;
  byte[] buf = new byte[BYTES_TO_READ];
  while (bytesLeft > 0) {
    int n = input.read(buf, 0, (int) Math.min(bytesLeft, BYTES_TO_READ));
    ...
{code}
However, if we use {{readWithChecksum}} as below, the checksum is copied into {{buf}}:
{code}
input = new IFileInputStream(input, compressedLength, conf);
// Copy data to local-disk
long bytesLeft = compressedLength;
try {
  final int BYTES_TO_READ = 64 * 1024;
  byte[] buf = new byte[BYTES_TO_READ];
  while (bytesLeft > 0) {
    int n = ((IFileInputStream)input).read(buf, 0, (int) Math.min(bytesLeft, BYTES_TO_READ));
    ...
{code}
Without those last 4 bytes of checksum on the end of the {{IFile}} format, the final read will fail during the last merge pass with a checksum error.
Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk --- Key: MAPREDUCE-6166 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6166 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.6.0 Reporter: Eric Payne Assignee: Eric Payne Attachments: MAPREDUCE-6166.v1.201411221941.txt, MAPREDUCE-6166.v2.201411251627.txt In very large map/reduce jobs (5 maps, 2500 reducers), the intermediate map partition output gets corrupted on disk on the map side. If this corrupted map output is too large to shuffle in memory, the reducer streams it to disk without validating the checksum. In jobs this large, it could take hours before the reducer finally tries to read the corrupted file and fails. Since retries of the failed reduce attempt will also take hours, this delay in discovering the failure is multiplied greatly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230426#comment-14230426 ] Eric Payne commented on MAPREDUCE-6166: --- I'm sorry. The second code snippet should have been this:
{code}
input = new IFileInputStream(input, compressedLength, conf);
// Copy data to local-disk
long bytesLeft = compressedLength;
try {
  final int BYTES_TO_READ = 64 * 1024;
  byte[] buf = new byte[BYTES_TO_READ];
  while (bytesLeft > 0) {
    int n = ((IFileInputStream)input).readWithChecksum(buf, 0, (int) Math.min(bytesLeft, BYTES_TO_READ));
    ...
{code}
Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk --- Key: MAPREDUCE-6166 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6166 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.6.0 Reporter: Eric Payne Assignee: Eric Payne Attachments: MAPREDUCE-6166.v1.201411221941.txt, MAPREDUCE-6166.v2.201411251627.txt In very large map/reduce jobs (5 maps, 2500 reducers), the intermediate map partition output gets corrupted on disk on the map side. If this corrupted map output is too large to shuffle in memory, the reducer streams it to disk without validating the checksum. In jobs this large, it could take hours before the reducer finally tries to read the corrupted file and fails. Since retries of the failed reduce attempt will also take hours, this delay in discovering the failure is multiplied greatly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
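The point about the trailing 4-byte checksum can be illustrated with a toy reconstruction (plain CRC32, not Hadoop's actual IFile code): a copy that carries the trailer lets a later pass verify the file, while any corruption of the payload is caught at verification time.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;

public class TrailingChecksumDemo {
    // Build an "IFile-like" buffer: payload followed by a 4-byte CRC trailer.
    static byte[] withChecksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        return ByteBuffer.allocate(payload.length + 4)
                .put(payload)
                .putInt((int) crc.getValue())
                .array();
    }

    // Verify a buffer whose last 4 bytes are the CRC of everything before them.
    static boolean verify(byte[] data) {
        byte[] payload = Arrays.copyOf(data, data.length - 4);
        int stored = ByteBuffer.wrap(data, data.length - 4, 4).getInt();
        CRC32 crc = new CRC32();
        crc.update(payload);
        return stored == (int) crc.getValue();
    }

    public static void main(String[] args) {
        // Copying everything, trailer included (readWithChecksum-style),
        // means the merge pass can verify the on-disk file later.
        byte[] onWire = withChecksum("map output".getBytes());
        System.out.println("intact:    " + verify(onWire));

        // A corrupted transfer is caught: CRC32 detects any single-byte change.
        byte[] corrupted = onWire.clone();
        corrupted[0] ^= 0x5A;
        System.out.println("corrupted: " + verify(corrupted));
    }
}
```

A plain read that drops the last 4 bytes leaves a file with no trailer at all, so the eventual verification can only fail, which is the late checksum error described in the comments above.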
[jira] [Commented] (MAPREDUCE-6172) TestDbClasses timeouts are too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-6172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230508#comment-14230508 ] Jason Lowe commented on MAPREDUCE-6172: --- +1 lgtm. Committing this. TestDbClasses timeouts are too aggressive - Key: MAPREDUCE-6172 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6172 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: MAPREDUCE-6172.patch Some of the TestDbClasses test timeouts are only 1 second, and some of those tests perform disk I/O which could easily exceed the test timeout if the disk is busy or there's some other hiccup on the system at the time. We should increase these timeouts to something more reasonable (i.e.: 10 or 20 seconds). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6172) TestDbClasses timeouts are too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-6172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-6172: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks, Varun! I committed this to trunk and branch-2. TestDbClasses timeouts are too aggressive - Key: MAPREDUCE-6172 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6172 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: MAPREDUCE-6172.patch Some of the TestDbClasses test timeouts are only 1 second, and some of those tests perform disk I/O which could easily exceed the test timeout if the disk is busy or there's some other hiccup on the system at the time. We should increase these timeouts to something more reasonable (i.e.: 10 or 20 seconds). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6172) TestDbClasses timeouts are too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-6172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230535#comment-14230535 ] Hudson commented on MAPREDUCE-6172: --- FAILURE: Integrated in Hadoop-trunk-Commit #6618 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6618/]) MAPREDUCE-6172. TestDbClasses timeouts are too aggressive. Contributed by Varun Saxena (jlowe: rev 2b30fb1053e70c128b98013fb63cf9a095623be6) * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/db/TestDbClasses.java TestDbClasses timeouts are too aggressive - Key: MAPREDUCE-6172 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6172 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: MAPREDUCE-6172.patch Some of the TestDbClasses test timeouts are only 1 second, and some of those tests perform disk I/O which could easily exceed the test timeout if the disk is busy or there's some other hiccup on the system at the time. We should increase these timeouts to something more reasonable (i.e.: 10 or 20 seconds). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6160) Potential NullPointerException in MRClientProtocol interface implementation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230582#comment-14230582 ] Jason Lowe commented on MAPREDUCE-6160: --- +1 lgtm. Committing this. Potential NullPointerException in MRClientProtocol interface implementation. Key: MAPREDUCE-6160 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6160 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Rohith Assignee: Rohith Attachments: MAPREDUCE-6160.1.patch, MAPREDUCE-6160.2.patch, MAPREDUCE-6160.3.patch, MAPREDUCE-6160.patch, MAPREDUCE-6160.patch In the implementation of MRClientProtocol, many methods can throw NullPointerExceptions. Instead of NullPointerExceptions, it is better to throw an IOException with a proper message. The HistoryClientService and MRClientService classes have a #verifyAndGetJob() method that can return the job object as null.
{code}
getTaskReport(GetTaskReportRequest request) throws IOException;
getTaskAttemptReport(GetTaskAttemptReportRequest request) throws IOException;
getCounters(GetCountersRequest request) throws IOException;
getTaskAttemptCompletionEvents(GetTaskAttemptCompletionEventsRequest request) throws IOException;
getTaskReports(GetTaskReportsRequest request) throws IOException;
getDiagnostics(GetDiagnosticsRequest request) throws IOException;
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6160) Potential NullPointerException in MRClientProtocol interface implementation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-6160: -- Resolution: Fixed Fix Version/s: 2.7.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks, Rohith! I committed this to trunk and branch-2. Potential NullPointerException in MRClientProtocol interface implementation. Key: MAPREDUCE-6160 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6160 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Rohith Assignee: Rohith Fix For: 2.7.0 Attachments: MAPREDUCE-6160.1.patch, MAPREDUCE-6160.2.patch, MAPREDUCE-6160.3.patch, MAPREDUCE-6160.patch, MAPREDUCE-6160.patch In the implementation of MRClientProtocol, many methods can throw NullPointerExceptions. Instead of NullPointerExceptions, it is better to throw an IOException with a proper message. The HistoryClientService and MRClientService classes have a #verifyAndGetJob() method that can return the job object as null.
{code}
getTaskReport(GetTaskReportRequest request) throws IOException;
getTaskAttemptReport(GetTaskAttemptReportRequest request) throws IOException;
getCounters(GetCountersRequest request) throws IOException;
getTaskAttemptCompletionEvents(GetTaskAttemptCompletionEventsRequest request) throws IOException;
getTaskReports(GetTaskReportsRequest request) throws IOException;
getDiagnostics(GetDiagnosticsRequest request) throws IOException;
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6160) Potential NullPointerException in MRClientProtocol interface implementation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230627#comment-14230627 ] Hudson commented on MAPREDUCE-6160: --- SUCCESS: Integrated in Hadoop-trunk-Commit #6620 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6620/]) MAPREDUCE-6160. Potential NullPointerException in MRClientProtocol interface implementation. Contributed by Rohith (jlowe: rev 0c588904f8b68cad219d2bd8e33081d5cae656e5) * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/TestJobHistoryServer.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/client/MRClientService.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryClientService.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRClientService.java Potential NullPointerException in MRClientProtocol interface implementation. Key: MAPREDUCE-6160 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6160 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Rohith Assignee: Rohith Fix For: 2.7.0 Attachments: MAPREDUCE-6160.1.patch, MAPREDUCE-6160.2.patch, MAPREDUCE-6160.3.patch, MAPREDUCE-6160.patch, MAPREDUCE-6160.patch In the implementation of MRClientProtocol, many methods can throw NullPointerExceptions. Instead of NullPointerExceptions, it is better to throw an IOException with a proper message. The HistoryClientService and MRClientService classes have a #verifyAndGetJob() method that can return the job object as null.
{code}
getTaskReport(GetTaskReportRequest request) throws IOException;
getTaskAttemptReport(GetTaskAttemptReportRequest request) throws IOException;
getCounters(GetCountersRequest request) throws IOException;
getTaskAttemptCompletionEvents(GetTaskAttemptCompletionEventsRequest request) throws IOException;
getTaskReports(GetTaskReportsRequest request) throws IOException;
getDiagnostics(GetDiagnosticsRequest request) throws IOException;
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
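The pattern behind the fix can be sketched as follows. This is a minimal, self-contained illustration with stand-in types (the `Job` class and the map of jobs here are hypothetical, not Hadoop's actual classes): it shows the idea of replacing a null return from a #verifyAndGetJob()-style method with an IOException that carries a descriptive message, so callers fail fast instead of hitting a NullPointerException later.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class VerifyAndGetJobSketch {
    // Stand-in for the MR application's Job type (for illustration only).
    static class Job {
        final String id;
        Job(String id) { this.id = id; }
    }

    static final Map<String, Job> jobs = new HashMap<>();

    // Before the fix: a missing job comes back as null, and any caller that
    // dereferences the result throws a NullPointerException.
    static Job verifyAndGetJobOld(String jobId) {
        return jobs.get(jobId);
    }

    // After the fix: fail fast with an IOException carrying a proper message.
    static Job verifyAndGetJob(String jobId) throws IOException {
        Job job = jobs.get(jobId);
        if (job == null) {
            throw new IOException("Unknown job " + jobId);
        }
        return job;
    }

    public static void main(String[] args) {
        try {
            verifyAndGetJob("job_1416860917658_0002");
        } catch (IOException e) {
            // prints "Unknown job job_1416860917658_0002"
            System.out.println(e.getMessage());
        }
    }
}
```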
[jira] [Resolved] (MAPREDUCE-6171) The visibilities of the distributed cache files and archives should be determined by both their permissions and if they are located in HDFS encryption zone
[ https://issues.apache.org/jira/browse/MAPREDUCE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang resolved MAPREDUCE-6171. Resolution: Duplicate Fix Version/s: 2.7.0 Duping this to HADOOP-11341 since Dian reports that it fixes this issue. Thanks again Dian/Arun for finding and working on this. The visibilities of the distributed cache files and archives should be determined by both their permissions and if they are located in HDFS encryption zone --- Key: MAPREDUCE-6171 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6171 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Reporter: Dian Fu Fix For: 2.7.0 The visibilities of the distributed cache files and archives are currently determined only by the permissions of these files or archives. The following is the logic of the isPublic() method in class ClientDistributedCacheManager:
{code}
static boolean isPublic(Configuration conf, URI uri, Map<URI, FileStatus> statCache) throws IOException {
  FileSystem fs = FileSystem.get(uri, conf);
  Path current = new Path(uri.getPath());
  // the leaf level file should be readable by others
  if (!checkPermissionOfOther(fs, current, FsAction.READ, statCache)) {
    return false;
  }
  return ancestorsHaveExecutePermissions(fs, current.getParent(), statCache);
}
{code}
At the NodeManager side, the yarn user is used to download public files and the user who submits the job is used to download private files. In normal cases, there is no problem with this. However, if the files are located in an encryption zone (HDFS-6134) and the yarn user is disallowed by KMS from fetching the DataEncryptionKey (DEK) of that encryption zone, the download of the file will fail.
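The permission walk that isPublic() performs can be illustrated with a small self-contained sketch. This uses a toy path-to-permissions map instead of Hadoop's FileSystem/FileStatus API (all names here are hypothetical): a file is public only if "others" can read the leaf file and every ancestor directory grants "others" execute.

```java
import java.util.HashMap;
import java.util.Map;

public class PublicVisibilitySketch {
    // Toy stand-in for FileStatus: path -> the "other" rwx bits, e.g. "r-x".
    static final Map<String, String> otherPerms = new HashMap<>();

    // True if the "other" permission string for the path contains the bit.
    static boolean otherHas(String path, char bit) {
        return otherPerms.getOrDefault(path, "---").indexOf(bit) >= 0;
    }

    // Parent path, or null once we pass the root-level entry.
    static String parent(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? null : path.substring(0, i);
    }

    // Mirrors the isPublic() logic above: leaf readable by others,
    // and every ancestor directory executable by others.
    static boolean isPublic(String path) {
        if (!otherHas(path, 'r')) {
            return false;
        }
        for (String dir = parent(path); dir != null; dir = parent(dir)) {
            if (!otherHas(dir, 'x')) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        otherPerms.put("/data", "r-x");
        otherPerms.put("/data/cache", "r-x");
        otherPerms.put("/data/cache/lib.jar", "r--");
        System.out.println(isPublic("/data/cache/lib.jar")); // prints "true"
    }
}
```

Note that the POSIX bits alone say nothing about whether the yarn user can actually decrypt the file, which is exactly the gap this issue describes.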
You can reproduce this issue with the following steps (assume you submit the job as user testUser):
# create a clean cluster that has the HDFS cryptographic FileSystem feature
# create directory /data/ in HDFS and make it an encryption zone with key name testKey
# configure KMS so that only user testUser is allowed to decrypt the DEK of key testKey:
{code}
<property>
  <name>key.acl.testKey.DECRYPT_EEK</name>
  <value>testUser</value>
</property>
{code}
# execute the teragen job as user testUser:
{code}
su -s /bin/bash testUser -c "hadoop jar hadoop-mapreduce-examples*.jar teragen 1 /data/terasort-input"
{code}
# execute the terasort job as user testUser:
{code}
su -s /bin/bash testUser -c "hadoop jar hadoop-mapreduce-examples*.jar terasort /data/terasort-input /data/terasort-output"
{code}
You will see logs like this at the job submitter's console:
{code}
INFO mapreduce.Job: Job job_1416860917658_0002 failed with state FAILED due to: Application application_1416860917658_0002 failed 2 times due to AM Container for appattempt_1416860917658_0002_02 exited with exitCode: -1000 due to: org.apache.hadoop.security.authorize.AuthorizationException: User [yarn] is not authorized to perform [DECRYPT_EEK] on key with ACL name [testKey]!!
{code}
The initial idea to solve this issue is to modify the logic in ClientDistributedCacheManager.isPublic() to also consider whether the file is in an encryption zone. If it is, the file should be treated as private, so that the NodeManager fetches it as the user who submitted the job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
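The proposed fix can be sketched in simplified form. This is not the actual patch: the encryption-zone lookup is modeled here as a set of zone root paths (in real HDFS it would come from the NameNode, e.g. via an encryption-zone query), and the permission walk is collapsed into a boolean parameter. The point is only the decision rule: a file inside an encryption zone is never public, regardless of its permission bits.

```java
import java.util.HashSet;
import java.util.Set;

public class EncryptionZoneAwareVisibility {
    // Hypothetical stand-in for the encryption zone roots known to the NameNode.
    static final Set<String> encryptionZones = new HashSet<>();

    // True if the path is the root of, or lies under, any encryption zone.
    static boolean inEncryptionZone(String path) {
        for (String zone : encryptionZones) {
            if (path.equals(zone) || path.startsWith(zone + "/")) {
                return true;
            }
        }
        return false;
    }

    // Proposed rule: files in an encryption zone are always private, because
    // the yarn user that downloads public files may lack DECRYPT_EEK on the
    // zone's key; permissionsAllowPublic stands in for the original
    // read/execute walk over the file and its ancestors.
    static boolean isPublic(String path, boolean permissionsAllowPublic) {
        if (inEncryptionZone(path)) {
            return false;
        }
        return permissionsAllowPublic;
    }

    public static void main(String[] args) {
        encryptionZones.add("/data");
        System.out.println(isPublic("/data/terasort-input/part-0", true)); // prints "false"
        System.out.println(isPublic("/tmp/shared.jar", true));            // prints "true"
    }
}
```

With this rule, the NodeManager falls back to downloading such files as the submitting user, who is the one authorized for DECRYPT_EEK in the reproduction scenario above.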