[jira] [Updated] (MAPREDUCE-6176) To limit the map task number or reduce task number of an application

2014-12-01 Thread Yang Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Hao updated MAPREDUCE-6176:

Attachment: (was: MAPREDUCE-6176.patch)

 To limit the map task number or reduce task number of an application
 

 Key: MAPREDUCE-6176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mr-am, mrv2
Affects Versions: 2.5.0, 2.4.1, 2.5.1, 2.5.2
Reporter: Yang Hao
Assignee: Yang Hao
  Labels: patch
 Fix For: 2.4.1


 As MapReduce is a batch computation framework, people may want to run 
 application A alongside applications B and C while placing a resource limit on A. 
 A good way to do so is to limit the number of an application's map tasks 
 or reduce tasks. If we set mapreduce.map.num.max to M, the map task 
 count will not exceed M; likewise, if we set mapreduce.reduce.num.max 
 to R, the reduce task count will not exceed R.
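 A minimal sketch of how these proposed properties might be set when configuring a job. The property names come from this proposal (the reduce-side name, mapreduce.reduce.num.max, is inferred from the description); neither is a stock Hadoop property, and the values are illustrative only.
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.mapreduce.Job;
 
 public class TaskLimitExample {
   public static void main(String[] args) throws Exception {
     Configuration conf = new Configuration();
     // Proposed (not stock) properties from this JIRA: cap the task counts.
     conf.setInt("mapreduce.map.num.max", 100);    // at most 100 map tasks
     conf.setInt("mapreduce.reduce.num.max", 20);  // at most 20 reduce tasks
 
     Job job = Job.getInstance(conf, "capped-job");
     // ... set input/output formats, paths, mapper/reducer as usual ...
   }
 }
 {code}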



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6176) To limit the map task number or reduce task number of an application

2014-12-01 Thread Yang Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Hao updated MAPREDUCE-6176:

Attachment: MAPREDUCE-6176.patch

add test

 To limit the map task number or reduce task number of an application
 

 Key: MAPREDUCE-6176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mr-am, mrv2
Affects Versions: 2.5.0, 2.4.1, 2.5.1, 2.5.2
Reporter: Yang Hao
Assignee: Yang Hao
  Labels: patch
 Fix For: 2.4.1

 Attachments: MAPREDUCE-6176.patch


 As MapReduce is a batch computation framework, people may want to run 
 application A alongside applications B and C while placing a resource limit on A. 
 A good way to do so is to limit the number of an application's map tasks 
 or reduce tasks. If we set mapreduce.map.num.max to M, the map task 
 count will not exceed M; likewise, if we set mapreduce.reduce.num.max 
 to R, the reduce task count will not exceed R.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6176) To limit the map task number or reduce task number of an application

2014-12-01 Thread Yang Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Hao updated MAPREDUCE-6176:

Fix Version/s: (was: 2.4.1)
   2.5.0

 To limit the map task number or reduce task number of an application
 

 Key: MAPREDUCE-6176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mr-am, mrv2
Affects Versions: 2.5.0, 2.4.1, 2.5.1, 2.5.2
Reporter: Yang Hao
Assignee: Yang Hao
  Labels: patch
 Fix For: 2.5.0

 Attachments: MAPREDUCE-6176.patch


 As MapReduce is a batch computation framework, people may want to run 
 application A alongside applications B and C while placing a resource limit on A. 
 A good way to do so is to limit the number of an application's map tasks 
 or reduce tasks. If we set mapreduce.map.num.max to M, the map task 
 count will not exceed M; likewise, if we set mapreduce.reduce.num.max 
 to R, the reduce task count will not exceed R.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6092) TestJobHistoryParsing#testPartialJob timeouts in some environments

2014-12-01 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14229655#comment-14229655
 ] 

Akira AJISAKA commented on MAPREDUCE-6092:
--

Fixed in MAPREDUCE-6104. Closing.

 TestJobHistoryParsing#testPartialJob timeouts in some environments
 --

 Key: MAPREDUCE-6092
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6092
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: test
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
Priority: Minor
  Labels: newbie
 Attachments: MAPREDUCE-6092.patch


 While rebasing the patch in MAPREDUCE-5392, I found that 
 TestJobHistoryParsing#testPartialJob times out in my environment.
 {code}
 Running org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing
 Tests run: 15, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 106.007 sec 
 <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing
 testPartialJob(org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing)  Time 
 elapsed: 0.987 sec  <<< ERROR!
 java.lang.Exception: test timed out after 1000 milliseconds
 at org.apache.xerces.impl.XMLEntityScanner.scanQName(Unknown Source)
 at 
 org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown 
 Source)
 at 
 org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
  Source)
 at 
 org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown 
 Source)
 at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
 at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
 at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
 at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
 at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
 at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
 at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2334)
 at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2322)
 at 
 org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2393)
 at 
 org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2346)
 at 
 org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2263)
 at org.apache.hadoop.conf.Configuration.get(Configuration.java:868)
 at 
 org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:887)
 at 
 org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1288)
 at 
 org.apache.hadoop.security.SecurityUtil.<clinit>(SecurityUtil.java:70)
 at 
 org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:247)
 at 
 org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:235)
 at 
 org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:761)
 at 
 org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:746)
 at 
 org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:619)
 at 
 org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.testPartialJob(TestJobHistoryParsing.java:829)
 {code}
 We should extend the timeout so that the test does not fail on slow machines.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6092) TestJobHistoryParsing#testPartialJob timeouts in some environments

2014-12-01 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated MAPREDUCE-6092:
-
  Resolution: Duplicate
Target Version/s:   (was: 2.7.0)
  Status: Resolved  (was: Patch Available)

 TestJobHistoryParsing#testPartialJob timeouts in some environments
 --

 Key: MAPREDUCE-6092
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6092
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: test
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
Priority: Minor
  Labels: newbie
 Attachments: MAPREDUCE-6092.patch


 While rebasing the patch in MAPREDUCE-5392, I found that 
 TestJobHistoryParsing#testPartialJob times out in my environment.
 {code}
 Running org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing
 Tests run: 15, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 106.007 sec 
 <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing
 testPartialJob(org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing)  Time 
 elapsed: 0.987 sec  <<< ERROR!
 java.lang.Exception: test timed out after 1000 milliseconds
 at org.apache.xerces.impl.XMLEntityScanner.scanQName(Unknown Source)
 at 
 org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown 
 Source)
 at 
 org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
  Source)
 at 
 org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown 
 Source)
 at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
 at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
 at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
 at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
 at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
 at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
 at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2334)
 at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2322)
 at 
 org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2393)
 at 
 org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2346)
 at 
 org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2263)
 at org.apache.hadoop.conf.Configuration.get(Configuration.java:868)
 at 
 org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:887)
 at 
 org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1288)
 at 
 org.apache.hadoop.security.SecurityUtil.<clinit>(SecurityUtil.java:70)
 at 
 org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:247)
 at 
 org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:235)
 at 
 org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:761)
 at 
 org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:746)
 at 
 org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:619)
 at 
 org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.testPartialJob(TestJobHistoryParsing.java:829)
 {code}
 We should extend the timeout so that the test does not fail on slow machines.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MAPREDUCE-6168) Old MR client is still broken when receiving new counters from MR job

2014-12-01 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du resolved MAPREDUCE-6168.
---
Resolution: Won't Fix

Thanks [~zjshen] and [~kasha] for the comments! It looks like we have already 
reached agreement here, so I am resolving this ticket as Won't Fix.

 Old MR client is still broken when receiving new counters from MR job
 -

 Key: MAPREDUCE-6168
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6168
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Junping Du
Priority: Blocker

 In the following scenarios:
 1. Either insecure or secure;
 2. MR 2.2 with new shuffle on NM;
 3. Submitting via old client.
 We will see the following console exception:
 {code}
 14/11/17 14:56:19 INFO mapreduce.Job: Job job_1416264695865_0003 completed 
 successfully
 java.lang.IllegalArgumentException: No enum constant 
 org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_REDUCES
 at java.lang.Enum.valueOf(Enum.java:236)
 at 
 org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.valueOf(FrameworkCounterGroup.java:148)
 at 
 org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.findCounter(FrameworkCounterGroup.java:182)
 at 
 org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:154)
 at 
 org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:240)
 at 
 org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:370)
 at 
 org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:511)
 at org.apache.hadoop.mapreduce.Job$7.run(Job.java:756)
 at org.apache.hadoop.mapreduce.Job$7.run(Job.java:753)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
 at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:753)
 at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1361)
 at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1289)
 at 
 org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
 at 
 org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at 
 org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
 at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
 at 
 org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 {code}
 The problem was supposed to be fixed by MAPREDUCE-5831; however, it seems that 
 we haven't covered all the problematic code paths.
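 For illustration only (this is a stand-alone sketch, not the MAPREDUCE-5831 code, and the enum below is a stand-in): the exception above comes from Enum.valueOf rejecting a counter name that the old client's JobCounter enum does not contain. A forward-compatible client could tolerate unknown counter names rather than letting the IllegalArgumentException escape:
 {code}
 import java.util.Arrays;
 import java.util.List;
 
 public class UnknownCounterDemo {
   // The old client's view of the counters; a newer server may send more names.
   enum OldJobCounter { TOTAL_LAUNCHED_MAPS, TOTAL_LAUNCHED_REDUCES }
 
   public static void main(String[] args) {
     // Counter names as they might arrive from a newer server.
     List<String> fromServer = Arrays.asList(
         "TOTAL_LAUNCHED_MAPS", "MB_MILLIS_REDUCES", "TOTAL_LAUNCHED_REDUCES");
     for (String name : fromServer) {
       try {
         OldJobCounter counter = OldJobCounter.valueOf(name);
         System.out.println("known counter: " + counter);
       } catch (IllegalArgumentException e) {
         // Skip counters this client does not know about instead of failing the whole query.
         System.out.println("ignoring unknown counter: " + name);
       }
     }
   }
 }
 {code}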



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6176) To limit the map task number or reduce task number of an application

2014-12-01 Thread Yang Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Hao updated MAPREDUCE-6176:

Fix Version/s: (was: 2.5.0)

 To limit the map task number or reduce task number of an application
 

 Key: MAPREDUCE-6176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mr-am, mrv2
Affects Versions: 2.5.0, 2.4.1, 2.5.1, 2.5.2
Reporter: Yang Hao
Assignee: Yang Hao
  Labels: patch
 Attachments: MAPREDUCE-6176.patch


 As MapReduce is a batch computation framework, people may want to run 
 application A alongside applications B and C while placing a resource limit on A. 
 A good way to do so is to limit the number of an application's map tasks 
 or reduce tasks. If we set mapreduce.map.num.max to M, the map task 
 count will not exceed M; likewise, if we set mapreduce.reduce.num.max 
 to R, the reduce task count will not exceed R.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MAPREDUCE-6165) [JDK8] TestCombineFileInputFormat failed on JDK8

2014-12-01 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA reassigned MAPREDUCE-6165:


Assignee: Akira AJISAKA

 [JDK8] TestCombineFileInputFormat failed on JDK8
 

 Key: MAPREDUCE-6165
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6165
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Akira AJISAKA
Priority: Minor
 Attachments: MAPREDUCE-6165-reproduce.patch


 The error msg:
 {noformat}
 testSplitPlacementForCompressedFiles(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat)
   Time elapsed: 2.487 sec  <<< FAILURE!
 junit.framework.AssertionFailedError: expected:<2> but was:<1>
   at junit.framework.Assert.fail(Assert.java:57)
   at junit.framework.Assert.failNotEquals(Assert.java:329)
   at junit.framework.Assert.assertEquals(Assert.java:78)
   at junit.framework.Assert.assertEquals(Assert.java:234)
   at junit.framework.Assert.assertEquals(Assert.java:241)
   at junit.framework.TestCase.assertEquals(TestCase.java:409)
   at 
 org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacementForCompressedFiles(TestCombineFileInputFormat.java:911)
 testSplitPlacement(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat)
   Time elapsed: 0.985 sec  <<< FAILURE!
 junit.framework.AssertionFailedError: expected:<2> but was:<1>
   at junit.framework.Assert.fail(Assert.java:57)
   at junit.framework.Assert.failNotEquals(Assert.java:329)
   at junit.framework.Assert.assertEquals(Assert.java:78)
   at junit.framework.Assert.assertEquals(Assert.java:234)
   at junit.framework.Assert.assertEquals(Assert.java:241)
   at junit.framework.TestCase.assertEquals(TestCase.java:409)
   at 
 org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacement(TestCombineFileInputFormat.java:368)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6165) [JDK8] TestCombineFileInputFormat failed on JDK8

2014-12-01 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated MAPREDUCE-6165:
-
Attachment: MAPREDUCE-6165-reproduce.patch

 [JDK8] TestCombineFileInputFormat failed on JDK8
 

 Key: MAPREDUCE-6165
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6165
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Akira AJISAKA
Priority: Minor
 Attachments: MAPREDUCE-6165-reproduce.patch


 The error msg:
 {noformat}
 testSplitPlacementForCompressedFiles(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat)
   Time elapsed: 2.487 sec  <<< FAILURE!
 junit.framework.AssertionFailedError: expected:<2> but was:<1>
   at junit.framework.Assert.fail(Assert.java:57)
   at junit.framework.Assert.failNotEquals(Assert.java:329)
   at junit.framework.Assert.assertEquals(Assert.java:78)
   at junit.framework.Assert.assertEquals(Assert.java:234)
   at junit.framework.Assert.assertEquals(Assert.java:241)
   at junit.framework.TestCase.assertEquals(TestCase.java:409)
   at 
 org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacementForCompressedFiles(TestCombineFileInputFormat.java:911)
 testSplitPlacement(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat)
   Time elapsed: 0.985 sec  <<< FAILURE!
 junit.framework.AssertionFailedError: expected:<2> but was:<1>
   at junit.framework.Assert.fail(Assert.java:57)
   at junit.framework.Assert.failNotEquals(Assert.java:329)
   at junit.framework.Assert.assertEquals(Assert.java:78)
   at junit.framework.Assert.assertEquals(Assert.java:234)
   at junit.framework.Assert.assertEquals(Assert.java:241)
   at junit.framework.TestCase.assertEquals(TestCase.java:409)
   at 
 org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacement(TestCombineFileInputFormat.java:368)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6165) [JDK8] TestCombineFileInputFormat failed on JDK8

2014-12-01 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14229944#comment-14229944
 ] 

Akira AJISAKA commented on MAPREDUCE-6165:
--

The test unintentionally relies on the iteration order of {{HashMap.entrySet()}}. 
Therefore the test fails on JDK8. {{MAPREDUCE-6165-reproduce.patch}} can 
reproduce the failure on JDK7.
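For illustration (a stand-alone sketch, not the Hadoop test code): HashMap makes no guarantee about entrySet() iteration order, and the observed order changed between JDK7 and JDK8, so any assertion that depends on which entry is visited first is fragile. A fix is to assert on contents, or to use an order-preserving map where a fixed order genuinely matters:
{code}
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class IterationOrderDemo {
  public static void main(String[] args) {
    // HashMap: iteration order is unspecified and differs across JDK versions.
    Map<String, Integer> hosts = new HashMap<>();
    hosts.put("host1", 1);
    hosts.put("host2", 2);
    hosts.put("host3", 3);
    System.out.println("HashMap order (JDK-dependent): " + hosts.keySet());

    // LinkedHashMap: iteration follows insertion order, so tests can rely on it.
    Map<String, Integer> ordered = new LinkedHashMap<>();
    ordered.put("host1", 1);
    ordered.put("host2", 2);
    ordered.put("host3", 3);
    System.out.println("LinkedHashMap order (insertion): " + ordered.keySet());

    // Order-independent assertion: check contents, not which entry comes first.
    if (!hosts.containsKey("host2") || hosts.get("host2") != 2) {
      throw new AssertionError("host2 should map to 2");
    }
  }
}
{code}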

 [JDK8] TestCombineFileInputFormat failed on JDK8
 

 Key: MAPREDUCE-6165
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6165
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Akira AJISAKA
Priority: Minor
 Attachments: MAPREDUCE-6165-reproduce.patch


 The error msg:
 {noformat}
 testSplitPlacementForCompressedFiles(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat)
   Time elapsed: 2.487 sec  <<< FAILURE!
 junit.framework.AssertionFailedError: expected:<2> but was:<1>
   at junit.framework.Assert.fail(Assert.java:57)
   at junit.framework.Assert.failNotEquals(Assert.java:329)
   at junit.framework.Assert.assertEquals(Assert.java:78)
   at junit.framework.Assert.assertEquals(Assert.java:234)
   at junit.framework.Assert.assertEquals(Assert.java:241)
   at junit.framework.TestCase.assertEquals(TestCase.java:409)
   at 
 org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacementForCompressedFiles(TestCombineFileInputFormat.java:911)
 testSplitPlacement(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat)
   Time elapsed: 0.985 sec  <<< FAILURE!
 junit.framework.AssertionFailedError: expected:<2> but was:<1>
   at junit.framework.Assert.fail(Assert.java:57)
   at junit.framework.Assert.failNotEquals(Assert.java:329)
   at junit.framework.Assert.assertEquals(Assert.java:78)
   at junit.framework.Assert.assertEquals(Assert.java:234)
   at junit.framework.Assert.assertEquals(Assert.java:241)
   at junit.framework.TestCase.assertEquals(TestCase.java:409)
   at 
 org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacement(TestCombineFileInputFormat.java:368)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-12-01 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230407#comment-14230407
 ] 

Eric Payne commented on MAPREDUCE-6166:
---

[~jira.shegalov], I'm sorry for not being clear.
{quote}
Can you clarify where in the code it's required to keep the original checksum?
Then these contents are written out using {{LocalFileSystem}}, which will again 
create an on-disk checksum because it's based on {{ChecksumFileSystem}}.
{quote}
I don't think the {{IFile}} format is related to {{ChecksumFileSystem}}.

The {{IFile}} checksum is expected to be the last 4 bytes of the {{IFile}}, and 
if we use {{input.read}} as below, those 4 bytes of checksum are not copied 
into {{buf}}:

{code}
input = new IFileInputStream(input, compressedLength, conf);
// Copy data to local-disk
long bytesLeft = compressedLength - ((IFileInputStream)input).getSize();
try {
  final int BYTES_TO_READ = 64 * 1024;
  byte[] buf = new byte[BYTES_TO_READ];
  while (bytesLeft > 0) {
int n = input.read(buf, 0, (int) Math.min(bytesLeft, BYTES_TO_READ));
...
{code}

However, if we use {{readWithChecksum}} as below, the checksum is copied into 
{{buf}}:
{code}
input = new IFileInputStream(input, compressedLength, conf);
// Copy data to local-disk
long bytesLeft = compressedLength;
try {
  final int BYTES_TO_READ = 64 * 1024;
  byte[] buf = new byte[BYTES_TO_READ];
  while (bytesLeft > 0) {
int n = ((IFileInputStream)input).read(buf, 0, (int) 
Math.min(bytesLeft, BYTES_TO_READ));
...
{code}

Without those last 4 bytes of checksum on the end of the {{IFile}} format, the 
final read will fail during the last merge pass with a checksum error.

 Reducers do not catch bad map output transfers during shuffle if data 
 shuffled directly to disk
 ---

 Key: MAPREDUCE-6166
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6166
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.6.0
Reporter: Eric Payne
Assignee: Eric Payne
 Attachments: MAPREDUCE-6166.v1.201411221941.txt, 
 MAPREDUCE-6166.v2.201411251627.txt


 In very large map/reduce jobs (5 maps, 2500 reducers), the intermediate 
 map partition output gets corrupted on disk on the map side. If this 
 corrupted map output is too large to shuffle in memory, the reducer streams 
 it to disk without validating the checksum. In jobs this large, it could take 
 hours before the reducer finally tries to read the corrupted file and fails. 
 Since retries of the failed reduce attempt will also take hours, this delay 
 in discovering the failure is multiplied greatly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-12-01 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230426#comment-14230426
 ] 

Eric Payne commented on MAPREDUCE-6166:
---

I'm sorry. The second code snippet should have been this:
{code}
input = new IFileInputStream(input, compressedLength, conf);
// Copy data to local-disk
long bytesLeft = compressedLength;
try {
  final int BYTES_TO_READ = 64 * 1024;
  byte[] buf = new byte[BYTES_TO_READ];
  while (bytesLeft > 0) {
int n = ((IFileInputStream)input).readWithChecksum(buf, 0, (int) 
Math.min(bytesLeft, BYTES_TO_READ));
...
{code}

 Reducers do not catch bad map output transfers during shuffle if data 
 shuffled directly to disk
 ---

 Key: MAPREDUCE-6166
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6166
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.6.0
Reporter: Eric Payne
Assignee: Eric Payne
 Attachments: MAPREDUCE-6166.v1.201411221941.txt, 
 MAPREDUCE-6166.v2.201411251627.txt


 In very large map/reduce jobs (5 maps, 2500 reducers), the intermediate 
 map partition output gets corrupted on disk on the map side. If this 
 corrupted map output is too large to shuffle in memory, the reducer streams 
 it to disk without validating the checksum. In jobs this large, it could take 
 hours before the reducer finally tries to read the corrupted file and fails. 
 Since retries of the failed reduce attempt will also take hours, this delay 
 in discovering the failure is multiplied greatly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6172) TestDbClasses timeouts are too aggressive

2014-12-01 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230508#comment-14230508
 ] 

Jason Lowe commented on MAPREDUCE-6172:
---

+1 lgtm.  Committing this.

 TestDbClasses timeouts are too aggressive
 -

 Key: MAPREDUCE-6172
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6172
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Varun Saxena
Priority: Minor
 Fix For: 2.7.0

 Attachments: MAPREDUCE-6172.patch


 Some of the TestDbClasses test timeouts are only 1 second, and some of those 
 tests perform disk I/O which could easily exceed the test timeout if the disk 
 is busy or there's some other hiccup on the system at the time.  We should 
 increase these timeouts to something more reasonable (e.g., 10 or 20 seconds).
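 A minimal sketch of the kind of change being discussed (illustrative test name and values, not the committed patch): raise a JUnit test timeout from 1 second to a budget that tolerates a busy disk.
 {code}
 import org.junit.Test;
 
 public class TimeoutExample {
   // Before: @Test(timeout = 1000) gives only 1 second, which disk hiccups can exceed.
   // After: a more forgiving 20-second budget.
   @Test(timeout = 20000)
   public void testSomethingThatTouchesDisk() throws Exception {
     // ... test body unchanged ...
   }
 }
 {code}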



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6172) TestDbClasses timeouts are too aggressive

2014-12-01 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-6172:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks, Varun!  I committed this to trunk and branch-2.

 TestDbClasses timeouts are too aggressive
 -

 Key: MAPREDUCE-6172
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6172
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Varun Saxena
Priority: Minor
 Fix For: 2.7.0

 Attachments: MAPREDUCE-6172.patch


 Some of the TestDbClasses test timeouts are only 1 second, and some of those 
 tests perform disk I/O which could easily exceed the test timeout if the disk 
 is busy or there's some other hiccup on the system at the time.  We should 
 increase these timeouts to something more reasonable (e.g., 10 or 20 seconds).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6172) TestDbClasses timeouts are too aggressive

2014-12-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230535#comment-14230535
 ] 

Hudson commented on MAPREDUCE-6172:
---

FAILURE: Integrated in Hadoop-trunk-Commit #6618 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6618/])
MAPREDUCE-6172. TestDbClasses timeouts are too aggressive. Contributed by Varun 
Saxena (jlowe: rev 2b30fb1053e70c128b98013fb63cf9a095623be6)
* hadoop-mapreduce-project/CHANGES.txt
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/db/TestDbClasses.java


 TestDbClasses timeouts are too aggressive
 -

 Key: MAPREDUCE-6172
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6172
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Varun Saxena
Priority: Minor
 Fix For: 2.7.0

 Attachments: MAPREDUCE-6172.patch


 Some of the TestDbClasses test timeouts are only 1 second, and some of those 
 tests perform disk I/O which could easily exceed the test timeout if the disk 
 is busy or there's some other hiccup on the system at the time.  We should 
 increase these timeouts to something more reasonable (e.g., 10 or 20 seconds).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6160) Potential NullPointerException in MRClientProtocol interface implementation.

2014-12-01 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230582#comment-14230582
 ] 

Jason Lowe commented on MAPREDUCE-6160:
---

+1 lgtm.  Committing this.

 Potential NullPointerException in MRClientProtocol interface implementation.
 

 Key: MAPREDUCE-6160
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6160
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Rohith
Assignee: Rohith
 Attachments: MAPREDUCE-6160.1.patch, MAPREDUCE-6160.2.patch, 
 MAPREDUCE-6160.3.patch, MAPREDUCE-6160.patch, MAPREDUCE-6160.patch


 In the implementation of MRClientProtocol, many methods can throw a 
 NullPointerException. Instead of a NullPointerException, it would be better to 
 throw an IOException with a proper message.
 Both the HistoryClientService class and the MRClientService class have a 
 #verifyAndGetJob() method that can return a null job object.
 {code}
 getTaskReport(GetTaskReportRequest request) throws IOException;
 getTaskAttemptReport(GetTaskAttemptReportRequest request) throws IOException;
 getCounters(GetCountersRequest request) throws IOException;
 getTaskAttemptCompletionEvents(GetTaskAttemptCompletionEventsRequest request) 
 throws IOException;
 getTaskReports(GetTaskReportsRequest request) throws IOException;
 getDiagnostics(GetDiagnosticsRequest request) throws IOException;
 {code}
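 A minimal sketch of the pattern being proposed (the method, types, and message here are illustrative stand-ins, not the committed patch): when the job lookup returns null, fail with a descriptive IOException instead of dereferencing the null and surfacing a NullPointerException.
 {code}
 import java.io.IOException;
 
 public class JobLookupExample {
   // Stand-ins for the real MR types, for illustration only.
   static class Job { String getReport() { return "report"; } }
   static class GetJobReportRequest { String getJobId() { return "job_1416860917658_0001"; } }
 
   private Job verifyAndGetJob(String jobId) {
     return null;  // Simulates the unknown-job case that used to cause an NPE downstream.
   }
 
   public String getJobReport(GetJobReportRequest request) throws IOException {
     Job job = verifyAndGetJob(request.getJobId());
     if (job == null) {
       // Fail with a clear message instead of letting job.getReport() throw an NPE.
       throw new IOException("Unknown job " + request.getJobId());
     }
     return job.getReport();
   }
 }
 {code}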



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6160) Potential NullPointerException in MRClientProtocol interface implementation.

2014-12-01 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-6160:
--
   Resolution: Fixed
Fix Version/s: 2.7.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks, Rohith!  I committed this to trunk and branch-2.


 Potential NullPointerException in MRClientProtocol interface implementation.
 

 Key: MAPREDUCE-6160
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6160
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Rohith
Assignee: Rohith
 Fix For: 2.7.0

 Attachments: MAPREDUCE-6160.1.patch, MAPREDUCE-6160.2.patch, 
 MAPREDUCE-6160.3.patch, MAPREDUCE-6160.patch, MAPREDUCE-6160.patch


 In the implementation of MRClientProtocol, many methods can throw a 
 NullPointerException. Instead of a NullPointerException, it would be better to 
 throw an IOException with a proper message.
 Both the HistoryClientService class and the MRClientService class have a 
 #verifyAndGetJob() method that can return a null job object.
 {code}
 getTaskReport(GetTaskReportRequest request) throws IOException;
 getTaskAttemptReport(GetTaskAttemptReportRequest request) throws IOException;
 getCounters(GetCountersRequest request) throws IOException;
 getTaskAttemptCompletionEvents(GetTaskAttemptCompletionEventsRequest request) 
 throws IOException;
 getTaskReports(GetTaskReportsRequest request) throws IOException;
 getDiagnostics(GetDiagnosticsRequest request) throws IOException;
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6160) Potential NullPointerException in MRClientProtocol interface implementation.

2014-12-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230627#comment-14230627
 ] 

Hudson commented on MAPREDUCE-6160:
---

SUCCESS: Integrated in Hadoop-trunk-Commit #6620 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6620/])
MAPREDUCE-6160. Potential NullPointerException in MRClientProtocol interface 
implementation. Contributed by Rohith (jlowe: rev 
0c588904f8b68cad219d2bd8e33081d5cae656e5)
* hadoop-mapreduce-project/CHANGES.txt
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/TestJobHistoryServer.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/client/MRClientService.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryClientService.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRClientService.java


 Potential NullPointerException in MRClientProtocol interface implementation.
 

 Key: MAPREDUCE-6160
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6160
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Rohith
Assignee: Rohith
 Fix For: 2.7.0

 Attachments: MAPREDUCE-6160.1.patch, MAPREDUCE-6160.2.patch, 
 MAPREDUCE-6160.3.patch, MAPREDUCE-6160.patch, MAPREDUCE-6160.patch


 In the implementation of MRClientProtocol, many methods can throw a 
 NullPointerException. Instead of a NullPointerException, it would be better to 
 throw an IOException with a proper message.
 Both the HistoryClientService class and the MRClientService class have a 
 #verifyAndGetJob() method that can return a null job object.
 {code}
 getTaskReport(GetTaskReportRequest request) throws IOException;
 getTaskAttemptReport(GetTaskAttemptReportRequest request) throws IOException;
 getCounters(GetCountersRequest request) throws IOException;
 getTaskAttemptCompletionEvents(GetTaskAttemptCompletionEventsRequest request) 
 throws IOException;
 getTaskReports(GetTaskReportsRequest request) throws IOException;
 getDiagnostics(GetDiagnosticsRequest request) throws IOException;
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MAPREDUCE-6171) The visibilities of the distributed cache files and archives should be determined by both their permissions and if they are located in HDFS encryption zone

2014-12-01 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang resolved MAPREDUCE-6171.

   Resolution: Duplicate
Fix Version/s: 2.7.0

Duping this to HADOOP-11341 since Dian reports that it fixes this issue. Thanks 
again Dian/Arun for finding and working on this.

 The visibilities of the distributed cache files and archives should be 
 determined by both their permissions and if they are located in HDFS 
 encryption zone
 ---

 Key: MAPREDUCE-6171
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6171
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security
Reporter: Dian Fu
 Fix For: 2.7.0


 The visibilities of the distributed cache files and archives are currently 
 determined by the permissions of those files or archives. 
 The following is the logic of the isPublic() method in the 
 ClientDistributedCacheManager class:
 {code}
 static boolean isPublic(Configuration conf, URI uri,
     Map<URI, FileStatus> statCache) throws IOException {
   FileSystem fs = FileSystem.get(uri, conf);
   Path current = new Path(uri.getPath());
   //the leaf level file should be readable by others
   if (!checkPermissionOfOther(fs, current, FsAction.READ, statCache)) {
     return false;
   }
   return ancestorsHaveExecutePermissions(fs, current.getParent(), statCache);
 }
 {code}
 On the NodeManager side, the yarn user is used to download public files and the 
 user who submitted the job is used to download private files. In normal cases 
 there is no problem with this. However, if the files are located in an 
 encryption zone (HDFS-6134) and KMS is configured to disallow the yarn user from 
 fetching the DataEncryptionKey (DEK) of that encryption zone, the download 
 of those files will fail. 
 You can reproduce this issue with the following steps (assume you submit the job 
 as user testUser): 
 # create a clean cluster that has the HDFS cryptographic FileSystem feature
 # create the directory /data/ in HDFS and make it an encryption zone with 
 key name testKey
 # configure KMS so that only user testUser can decrypt the DEK of key 
 testKey:
 {code}
   <property>
     <name>key.acl.testKey.DECRYPT_EEK</name>
     <value>testUser</value>
   </property>
 {code}
 # execute job teragen with user testUser:
 {code}
 su -s /bin/bash testUser -c "hadoop jar hadoop-mapreduce-examples*.jar 
 teragen 1 /data/terasort-input"
 {code}
 # execute job terasort with user testUser:
 {code}
 su -s /bin/bash testUser -c "hadoop jar hadoop-mapreduce-examples*.jar 
 terasort /data/terasort-input /data/terasort-output"
 {code}
 You will see logs like this at the job submitter's console:
 {code}
 INFO mapreduce.Job: Job job_1416860917658_0002 failed with state FAILED due 
 to: Application application_1416860917658_0002 failed 2 times due to AM 
 Container for appattempt_1416860917658_0002_02 exited with  exitCode: 
 -1000 due to: org.apache.hadoop.security.authorize.AuthorizationException: 
 User [yarn] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
 [testKey]!!
 {code}
 The initial idea to solve this issue is to modify the logic in 
 ClientDistributedCacheManager.isPublic to also consider whether the file is 
 in an encryption zone. If it is in an encryption zone, the file should be 
 considered private, so that the NodeManager uses the user who submitted 
 the job to fetch the file.
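 A minimal sketch of the kind of check described above (illustrative only, assuming the HdfsAdmin#getEncryptionZoneForPath API from the HDFS encryption work; this is not the committed change): treat a path as non-public whenever it lies inside an encryption zone.
 {code}
 import java.io.IOException;
 import java.net.URI;
 
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hdfs.DistributedFileSystem;
 import org.apache.hadoop.hdfs.client.HdfsAdmin;
 
 public class EncryptionZoneCheck {
   // Illustrative helper: true only if the URI points at HDFS and the path sits
   // inside an encryption zone. isPublic() could require this to be false in
   // addition to the existing permission checks.
   static boolean isInEncryptionZone(Configuration conf, URI uri) throws IOException {
     FileSystem fs = FileSystem.get(uri, conf);
     if (!(fs instanceof DistributedFileSystem)) {
       return false;  // Encryption zones only exist on HDFS.
     }
     HdfsAdmin admin = new HdfsAdmin(fs.getUri(), conf);
     // getEncryptionZoneForPath returns null when the path is not inside a zone.
     return admin.getEncryptionZoneForPath(new Path(uri.getPath())) != null;
   }
 }
 {code}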



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)