[jira] [Commented] (MAPREDUCE-6259) IllegalArgumentException due to missing job submit time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528323#comment-14528323 ] Hudson commented on MAPREDUCE-6259: --- SUCCESS: Integrated in Hadoop-Yarn-trunk #918 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/918/]) MAPREDUCE-6259. IllegalArgumentException due to missing job submit time. Contributed by zhihai xu (jlowe: rev bf70c5ae2824a9139c1aa9d7c14020018881cec2) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/AMStartedEvent.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java IllegalArgumentException due to missing job submit time --- Key: MAPREDUCE-6259 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6259 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.1 Attachments: MAPREDUCE-6259.000.patch A job submit time of -1 causes an IllegalArgumentException when parsing the job history file name, and JOB_INIT_FAILED leaves a -1 submit time in JobIndexInfo. We found the following job history file name, which causes an IllegalArgumentException when the job status is parsed out of the file name. 
{code} job_1418398645407_115853--1-worun-kafka%2Dto%2Dhdfs%5Btwo%5D%5B15+topic%28s%29%5D-1423572836007-0-0-FAILED-root.journaling-1423572836007.jhist {code} The stack trace for the IllegalArgumentException is {code} 2015-02-10 04:54:01,863 WARN org.apache.hadoop.mapreduce.v2.hs.PartialJob: Exception while parsing job state. Defaulting to KILLED java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.mapreduce.v2.api.records.JobState.0 at java.lang.Enum.valueOf(Enum.java:236) at org.apache.hadoop.mapreduce.v2.api.records.JobState.valueOf(JobState.java:21) at org.apache.hadoop.mapreduce.v2.hs.PartialJob.getState(PartialJob.java:82) at org.apache.hadoop.mapreduce.v2.hs.PartialJob.init(PartialJob.java:59) at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getAllPartialJobs(CachedHistoryStorage.java:159) at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getPartialJobs(CachedHistoryStorage.java:173) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getPartialJobs(JobHistory.java:284) at org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices.getJobs(HsWebServices.java:212) at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185) at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339) at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537) at
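The failure mode can be illustrated with a few lines of plain Java. This is a simplified sketch, not JobIndexInfo's actual decoder: the field positions and the helper below are inferred from the example file name above. A submit time of -1 contributes an extra "-" to the "-"-delimited .jhist name, shifting every later field by one, so a digit lands where the status belongs and JobState.valueOf() throws the IllegalArgumentException shown in the trace.

```java
public class JhistNameParse {
    // Hypothetical, simplified field layout based on the example name:
    // jobId-submitTime-user-jobName-finishTime-maps-reduces-status-queue-startTime
    // The job status sits in "-"-delimited field 7.
    static String statusField(String jhistName) {
        return jhistName.split("-")[7];
    }

    public static void main(String[] args) {
        String ok  = "job_1_1-1423572836007-user-name-1423572836007-0-0-FAILED-queue-1423572836007";
        // A submit time of -1 injects an extra "-" and shifts all later fields:
        String bad = "job_1_1--1-user-name-1423572836007-0-0-FAILED-queue-1423572836007";
        System.out.println(statusField(ok));  // FAILED
        System.out.println(statusField(bad)); // 0, not a JobState constant, so valueOf() throws
    }
}
```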
[jira] [Commented] (MAPREDUCE-5649) Reduce cannot use more than 2G memory for the final merge
[ https://issues.apache.org/jira/browse/MAPREDUCE-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528329#comment-14528329 ] Hudson commented on MAPREDUCE-5649: --- SUCCESS: Integrated in Hadoop-Yarn-trunk #918 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/918/]) MAPREDUCE-5649. Reduce cannot use more than 2G memory for the final merge. Contributed by Gera Shegalov (jlowe: rev 7dc3c1203d1ab14c09d0aaf0869a5bcdfafb0a5a) * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestMergeManager.java Reduce cannot use more than 2G memory for the final merge -- Key: MAPREDUCE-5649 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5649 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Reporter: stanley shi Assignee: Gera Shegalov Fix For: 2.8.0 Attachments: MAPREDUCE-5649.001.patch, MAPREDUCE-5649.002.patch, MAPREDUCE-5649.003.patch In the org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.java file, in the finalMerge method: int maxInMemReduce = (int)Math.min( Runtime.getRuntime().maxMemory() * maxRedPer, Integer.MAX_VALUE); This means that no matter how much memory the user has, the reducer will not retain more than 2 GB of data in memory before the reduce phase starts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
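The 2 GB ceiling is ordinary Java integer arithmetic: Math.min against Integer.MAX_VALUE followed by an (int) cast can never yield more than 2147483647 bytes. A minimal standalone sketch (method names are illustrative, not from MergeManagerImpl, and the committed patch may differ in detail):

```java
public class FinalMergeCap {
    // Pre-fix computation, as quoted in the issue: the (int) cast clips
    // the in-memory merge budget at Integer.MAX_VALUE (about 2 GB) no
    // matter how large the reducer heap is.
    static long cappedBudget(long maxHeapBytes, float maxRedPer) {
        return (int) Math.min(maxHeapBytes * maxRedPer, Integer.MAX_VALUE);
    }

    // Keeping the arithmetic in long lets the budget scale with the heap.
    static long uncappedBudget(long maxHeapBytes, float maxRedPer) {
        return (long) (maxHeapBytes * maxRedPer);
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * 1024 * 1024; // an 8 GB reducer heap
        System.out.println(cappedBudget(heap, 0.9f));   // 2147483647, i.e. ~2 GB
        System.out.println(uncappedBudget(heap, 0.9f)); // roughly 7.2 GB in bytes
    }
}
```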
[jira] [Commented] (MAPREDUCE-6165) [JDK8] TestCombineFileInputFormat failed on JDK8
[ https://issues.apache.org/jira/browse/MAPREDUCE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528321#comment-14528321 ] Hudson commented on MAPREDUCE-6165: --- SUCCESS: Integrated in Hadoop-Yarn-trunk #918 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/918/]) MAPREDUCE-6165. [JDK8] TestCombineFileInputFormat failed on JDK8. Contributed by Akira AJISAKA. (ozawa: rev 551615fa13f65ae996bae9c1bacff189539b6557) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java * hadoop-mapreduce-project/CHANGES.txt [JDK8] TestCombineFileInputFormat failed on JDK8 Key: MAPREDUCE-6165 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6165 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Wei Yan Assignee: Akira AJISAKA Priority: Minor Fix For: 2.8.0 Attachments: MAPREDUCE-6165-001.patch, MAPREDUCE-6165-002.patch, MAPREDUCE-6165-003.patch, MAPREDUCE-6165-003.patch, MAPREDUCE-6165-004.patch, MAPREDUCE-6165-reproduce.patch The error msg: {noformat} testSplitPlacementForCompressedFiles(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat) Time elapsed: 2.487 sec FAILURE! 
junit.framework.AssertionFailedError: expected:<2> but was:<1> at junit.framework.Assert.fail(Assert.java:57) at junit.framework.Assert.failNotEquals(Assert.java:329) at junit.framework.Assert.assertEquals(Assert.java:78) at junit.framework.Assert.assertEquals(Assert.java:234) at junit.framework.Assert.assertEquals(Assert.java:241) at junit.framework.TestCase.assertEquals(TestCase.java:409) at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacementForCompressedFiles(TestCombineFileInputFormat.java:911) testSplitPlacement(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat) Time elapsed: 0.985 sec FAILURE! junit.framework.AssertionFailedError: expected:<2> but was:<1> at junit.framework.Assert.fail(Assert.java:57) at junit.framework.Assert.failNotEquals(Assert.java:329) at junit.framework.Assert.assertEquals(Assert.java:78) at junit.framework.Assert.assertEquals(Assert.java:234) at junit.framework.Assert.assertEquals(Assert.java:241) at junit.framework.TestCase.assertEquals(TestCase.java:409) at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacement(TestCombineFileInputFormat.java:368) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
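Failures that appear only on JDK8 like these typically come from test assertions that implicitly depend on hash-iteration order, which is unspecified and changed between JDK7 and JDK8. The sketch below is an illustration of the fragile pattern and an order-independent alternative, not the actual MAPREDUCE-6165 patch:

```java
import java.util.*;

public class OrderIndependentCheck {
    // A robust test compares the contents of a result collection, not the
    // position at which a hash-based container happens to yield elements.
    static boolean sameContents(Collection<String> actual, Collection<String> expected) {
        return new HashSet<>(actual).equals(new HashSet<>(expected));
    }

    public static void main(String[] args) {
        // e.g. block locations collected into a HashSet by an input format:
        Set<String> hostsSeen = new HashSet<>(Arrays.asList("h1", "h2", "h3"));
        // Fragile: asserting which element an iterator yields first.
        // Robust: order-independent comparison.
        System.out.println(sameContents(hostsSeen, Arrays.asList("h3", "h1", "h2"))); // true
    }
}
```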
[jira] [Commented] (MAPREDUCE-6259) IllegalArgumentException due to missing job submit time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528289#comment-14528289 ] Hudson commented on MAPREDUCE-6259: --- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #184 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/184/]) MAPREDUCE-6259. IllegalArgumentException due to missing job submit time. Contributed by zhihai xu (jlowe: rev bf70c5ae2824a9139c1aa9d7c14020018881cec2) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/AMStartedEvent.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java * hadoop-mapreduce-project/CHANGES.txt IllegalArgumentException due to missing job submit time --- Key: MAPREDUCE-6259 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6259 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.1 Attachments: MAPREDUCE-6259.000.patch A job submit time of -1 causes an IllegalArgumentException when parsing the job history file name, and JOB_INIT_FAILED leaves a -1 submit time in JobIndexInfo. We found the following job history file name, which causes an IllegalArgumentException when the job status is parsed out of the file name. 
{code} job_1418398645407_115853--1-worun-kafka%2Dto%2Dhdfs%5Btwo%5D%5B15+topic%28s%29%5D-1423572836007-0-0-FAILED-root.journaling-1423572836007.jhist {code} The stack trace for the IllegalArgumentException is {code} 2015-02-10 04:54:01,863 WARN org.apache.hadoop.mapreduce.v2.hs.PartialJob: Exception while parsing job state. Defaulting to KILLED java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.mapreduce.v2.api.records.JobState.0 at java.lang.Enum.valueOf(Enum.java:236) at org.apache.hadoop.mapreduce.v2.api.records.JobState.valueOf(JobState.java:21) at org.apache.hadoop.mapreduce.v2.hs.PartialJob.getState(PartialJob.java:82) at org.apache.hadoop.mapreduce.v2.hs.PartialJob.init(PartialJob.java:59) at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getAllPartialJobs(CachedHistoryStorage.java:159) at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getPartialJobs(CachedHistoryStorage.java:173) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getPartialJobs(JobHistory.java:284) at org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices.getJobs(HsWebServices.java:212) at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185) at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339) at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537) at
[jira] [Commented] (MAPREDUCE-6165) [JDK8] TestCombineFileInputFormat failed on JDK8
[ https://issues.apache.org/jira/browse/MAPREDUCE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528287#comment-14528287 ] Hudson commented on MAPREDUCE-6165: --- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #184 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/184/]) MAPREDUCE-6165. [JDK8] TestCombineFileInputFormat failed on JDK8. Contributed by Akira AJISAKA. (ozawa: rev 551615fa13f65ae996bae9c1bacff189539b6557) * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java [JDK8] TestCombineFileInputFormat failed on JDK8 Key: MAPREDUCE-6165 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6165 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Wei Yan Assignee: Akira AJISAKA Priority: Minor Fix For: 2.8.0 Attachments: MAPREDUCE-6165-001.patch, MAPREDUCE-6165-002.patch, MAPREDUCE-6165-003.patch, MAPREDUCE-6165-003.patch, MAPREDUCE-6165-004.patch, MAPREDUCE-6165-reproduce.patch The error msg: {noformat} testSplitPlacementForCompressedFiles(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat) Time elapsed: 2.487 sec FAILURE! 
junit.framework.AssertionFailedError: expected:<2> but was:<1> at junit.framework.Assert.fail(Assert.java:57) at junit.framework.Assert.failNotEquals(Assert.java:329) at junit.framework.Assert.assertEquals(Assert.java:78) at junit.framework.Assert.assertEquals(Assert.java:234) at junit.framework.Assert.assertEquals(Assert.java:241) at junit.framework.TestCase.assertEquals(TestCase.java:409) at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacementForCompressedFiles(TestCombineFileInputFormat.java:911) testSplitPlacement(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat) Time elapsed: 0.985 sec FAILURE! junit.framework.AssertionFailedError: expected:<2> but was:<1> at junit.framework.Assert.fail(Assert.java:57) at junit.framework.Assert.failNotEquals(Assert.java:329) at junit.framework.Assert.assertEquals(Assert.java:78) at junit.framework.Assert.assertEquals(Assert.java:234) at junit.framework.Assert.assertEquals(Assert.java:241) at junit.framework.TestCase.assertEquals(TestCase.java:409) at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacement(TestCombineFileInputFormat.java:368) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6356) Misspelling of threshold in log4j.properties for tests
Brahma Reddy Battula created MAPREDUCE-6356: --- Summary: Misspelling of threshold in log4j.properties for tests Key: MAPREDUCE-6356 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6356 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor The log4j.properties file used for tests contains the misspelling log4j.threshhold; it should use the correct spelling, log4j.threshold. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
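For reference, the corrected line in a test log4j.properties would look like this (a minimal sketch; the appender configuration shown is illustrative, not the actual Hadoop test file):

```properties
# "log4j.threshhold" (double h) is silently ignored by log4j; the correct key is:
log4j.threshold=ALL
log4j.rootLogger=INFO,stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}: %m%n
```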
[jira] [Updated] (MAPREDUCE-4070) JobHistoryServer creates /tmp directory with restrictive permissions if the directory doesn't already exist.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4070: Labels: BB2015-05-TBR (was: ) JobHistoryServer creates /tmp directory with restrictive permissions if the directory doesn't already exist. Key: MAPREDUCE-4070 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4070 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1 Reporter: Ahmed Radwan Assignee: Ahmed Radwan Labels: BB2015-05-TBR Attachments: MAPREDUCE-4070.patch Starting up the MapReduce JobHistoryServer service after a clean install automatically creates the /tmp directory on HDFS. However, it is created with 750 permissions. Attempting to run MR jobs as other users then results in the following permissions exception: {code} org.apache.hadoop.security.AccessControlException: Permission denied: user=cloudera, access=EXECUTE, inode=/tmp:yarn:supergroup:drwxr-x--- at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:205) .. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-1125) SerialUtils.cc: deserializeFloat is out of sync with SerialUtils.hh
[ https://issues.apache.org/jira/browse/MAPREDUCE-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-1125: Labels: BB2015-05-TBR (was: ) SerialUtils.cc: deserializeFloat is out of sync with SerialUtils.hh --- Key: MAPREDUCE-1125 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1125 Project: Hadoop Map/Reduce Issue Type: Bug Components: pipes Affects Versions: 0.21.0 Reporter: Simone Leo Assignee: Simone Leo Labels: BB2015-05-TBR Attachments: MAPREDUCE-1125-2.patch, MAPREDUCE-1125-3.patch {noformat} *** SerialUtils.hh *** float deserializeFloat(InStream& stream); *** SerialUtils.cc *** void deserializeFloat(float& t, InStream& stream) { char buf[sizeof(float)]; stream.read(buf, sizeof(float)); XDR xdrs; xdrmem_create(&xdrs, buf, sizeof(float), XDR_DECODE); xdr_float(&xdrs, &t); } {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-2638) Create a simple stress test for the fair scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-2638: Labels: BB2015-05-TBR (was: ) Create a simple stress test for the fair scheduler -- Key: MAPREDUCE-2638 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2638 Project: Hadoop Map/Reduce Issue Type: Test Components: contrib/fair-share Reporter: Tom White Assignee: Tom White Labels: BB2015-05-TBR Attachments: MAPREDUCE-2638.patch, MAPREDUCE-2638.patch This would be a test that runs against a cluster, typically with settings that allow preemption to be exercised. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5801) Uber mode's log message is missing a vcore reason
[ https://issues.apache.org/jira/browse/MAPREDUCE-5801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5801: Labels: BB2015-05-TBR easyfix (was: easyfix) Uber mode's log message is missing a vcore reason - Key: MAPREDUCE-5801 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5801 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Steven Wong Assignee: Steven Wong Priority: Minor Labels: BB2015-05-TBR, easyfix Attachments: MAPREDUCE-5801.patch If a job cannot be run in uber mode because of insufficient vcores, the resulting log message has an empty reason. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5018) Support raw binary data with Hadoop streaming
[ https://issues.apache.org/jira/browse/MAPREDUCE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5018: Labels: BB2015-05-TBR (was: ) Support raw binary data with Hadoop streaming - Key: MAPREDUCE-5018 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5018 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/streaming Affects Versions: 1.1.2 Reporter: Jay Hacker Assignee: Steven Willis Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-5018-branch-1.1.patch, MAPREDUCE-5018.patch, MAPREDUCE-5018.patch, justbytes.jar, mapstream People often have a need to run older programs over many files, and turn to Hadoop streaming as a reliable, performant batch system. There are good reasons for this: 1. Hadoop is convenient: they may already be using it for mapreduce jobs, and it is easy to spin up a cluster in the cloud. 2. It is reliable: HDFS replicates data and the scheduler retries failed jobs. 3. It is reasonably performant: it moves the code to the data, maintaining locality, and scales with the number of nodes. Historically Hadoop is of course oriented toward processing key/value pairs, and so needs to interpret the data passing through it. Unfortunately, this makes it difficult to use Hadoop streaming with programs that don't deal in key/value pairs, or with binary data in general. For example, something as simple as running md5sum to verify the integrity of files will not give the correct result, due to Hadoop's interpretation of the data. There have been several attempts at binary serialization schemes for Hadoop streaming, such as TypedBytes (HADOOP-1722); however, these are still aimed at efficiently encoding key/value pairs, and not passing data through unmodified. Even the RawBytes serialization scheme adds length fields to the data, rendering it not-so-raw. 
I often have a need to run a Unix filter on files stored in HDFS; currently, the only way I can do this on the raw data is to copy the data out and run the filter on one machine, which is inconvenient, slow, and unreliable. It would be very convenient to run the filter as a map-only job, allowing me to build on existing (well-tested!) building blocks in the Unix tradition instead of reimplementing them as mapreduce programs. However, most existing tools don't know about file splits, and so want to process whole files; and of course many expect raw binary input and output. The solution is to run a map-only job with an InputFormat and OutputFormat that just pass raw bytes and don't split. It turns out to be a little more complicated with streaming; I have attached a patch with the simplest solution I could come up with. I call the format JustBytes (as RawBytes was already taken), and it should be usable with most recent versions of Hadoop. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4071) NPE while executing MRAppMaster shutdown hook
[ https://issues.apache.org/jira/browse/MAPREDUCE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4071: Labels: BB2015-05-TBR (was: ) NPE while executing MRAppMaster shutdown hook - Key: MAPREDUCE-4071 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4071 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am, mrv2 Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Bhallamudi Venkata Siva Kamesh Labels: BB2015-05-TBR Attachments: MAPREDUCE-4071-1.patch, MAPREDUCE-4071-2.patch, MAPREDUCE-4071-2.patch, MAPREDUCE-4071.patch While running the shutdown hook of MRAppMaster, hit NPE {noformat} Exception in thread Thread-1 java.lang.NullPointerException at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.setSignalled(MRAppMaster.java:668) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1004) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6232) Task state is running when all task attempts fail
[ https://issues.apache.org/jira/browse/MAPREDUCE-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6232: Labels: BB2015-05-TBR (was: ) Task state is running when all task attempts fail - Key: MAPREDUCE-6232 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6232 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Affects Versions: 2.6.0 Reporter: Yang Hao Assignee: Yang Hao Labels: BB2015-05-TBR Attachments: MAPREDUCE-6232.patch, MAPREDUCE-6232.v2.patch, TaskImpl.new.png, TaskImpl.normal.png, result.pdf When all task attempts fail, the task's state is still RUNNING. A clean way to fix this is to check the task attempts' states: if none of the attempts is running, then the task state should not be RUNNING. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
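The proposed check is easy to state in isolation. The helper below is a hypothetical plain-Java illustration of that rule, not TaskImpl's actual API:

```java
import java.util.*;

public class TaskStateCheck {
    enum AttemptState { RUNNING, FAILED, SUCCEEDED, KILLED }

    // Report the task as running only while at least one of its attempts
    // is actually running; all-failed attempts must not leave it RUNNING.
    static boolean taskIsRunning(Collection<AttemptState> attempts) {
        return attempts.contains(AttemptState.RUNNING);
    }

    public static void main(String[] args) {
        List<AttemptState> allFailed =
            Arrays.asList(AttemptState.FAILED, AttemptState.FAILED);
        List<AttemptState> oneAlive =
            Arrays.asList(AttemptState.FAILED, AttemptState.RUNNING);
        System.out.println(taskIsRunning(allFailed)); // false
        System.out.println(taskIsRunning(oneAlive));  // true
    }
}
```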
[jira] [Updated] (MAPREDUCE-6258) add support to back up JHS files from application master
[ https://issues.apache.org/jira/browse/MAPREDUCE-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6258: Labels: BB2015-05-TBR (was: ) add support to back up JHS files from application master Key: MAPREDUCE-6258 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6258 Project: Hadoop Map/Reduce Issue Type: New Feature Components: applicationmaster Affects Versions: 2.4.1 Reporter: Jian Fang Labels: BB2015-05-TBR Attachments: MAPREDUCE-6258.patch In Hadoop 2, job history files are stored on HDFS with a default retention period of one week. In a cloud environment, these HDFS files are actually stored on the disks of ephemeral instances that could go away once the instances are terminated. Users may want to back up the job history files for issue investigation and performance analysis before and after the cluster is terminated. A centralized backup mechanism could have a scalability issue for big and busy Hadoop clusters where there are probably tens of thousands of jobs every day. As a result, it is preferred to have a distributed way to back up the job history files in this case. To achieve this goal, we could add a new feature to back up the job history files in the application master. More specifically, we could copy the job history files to a backup path when they are moved from the temporary staging directory to the intermediate_done path in the application master. Since application masters could run on any slave nodes on a Hadoop cluster, we could achieve better scalability by backing up the job history files in a distributed fashion. Please be aware that the backup path should be managed by the Hadoop users based on their needs. For example, some Hadoop users may copy the job history files to a cloud storage directly and keep them there forever. While some other users may want to store the job history files on local disks and clean them up from time to time. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6208) There should be an input format for MapFiles which can be configured so that only a fraction of the input data is used for the MR process
[ https://issues.apache.org/jira/browse/MAPREDUCE-6208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6208: Labels: BB2015-05-TBR inputformat mapfile (was: inputformat mapfile) There should be an input format for MapFiles which can be configured so that only a fraction of the input data is used for the MR process - Key: MAPREDUCE-6208 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6208 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Jens Rabe Assignee: Jens Rabe Labels: BB2015-05-TBR, inputformat, mapfile Attachments: MAPREDUCE-6208.001.patch, MAPREDUCE-6208.002.patch Original Estimate: 24h Remaining Estimate: 24h In some cases there are large amounts of data organized in MapFiles, e.g., from previous MapReduce tasks, and only a fraction of the data is to be processed in a MR task. The current approach, as I understand, is to re-organize the data in a suitable partition using folders on HDFS, and only use relevant folders as input paths, and maybe doing some additional filtering in the Map task. However, sometimes the input data cannot be easily partitioned that way. For example, when processing large amounts of measured data where additional data on a time period already in HDFS arrives later. There should be an input format that accepts folders with MapFiles, and there should be an option to specify the input key range so that only fitting InputSplits are generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4961: Labels: BB2015-05-TBR (was: ) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations - Key: MAPREDUCE-4961 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4961 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jerry Chen Assignee: Jerry Chen Labels: BB2015-05-TBR Attachments: MAPREDUCE-4961.patch, MAPREDUCE-4961.patch Original Estimate: 72h Remaining Estimate: 72h MAPREDUCE-4049 provides pluggable Shuffle and MAPREDUCE-4080 extends Shuffle to be able to provide different MergeManager implementations. While using these pluggable features, I find that when a map reduce job is running locally, a RawKeyValueIterator is returned directly from a static call to Merger.merge, which breaks the assumption that the Shuffle may provide different merge methods, even though there is no copy phase in this situation. The use case: I am implementing a hash-based MergeManager that does not need a map-side sort, but when the map reduce job runs locally, the hash-based MergeManager has no chance to be used because the code goes directly to Merger.merge. This makes the pluggable Shuffle and MergeManager incomplete. So we need to move the code calling Merger.merge from ReduceTask to the ShuffleConsumerPlugin implementation, so that the Shuffle implementation can decide how to do the merge and return the corresponding iterator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6205) Update the value of the new version properties of the deprecated property mapred.child.java.opts
[ https://issues.apache.org/jira/browse/MAPREDUCE-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6205: Labels: BB2015-05-TBR (was: ) Update the value of the new version properties of the deprecated property mapred.child.java.opts -- Key: MAPREDUCE-6205 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6205 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Reporter: sam liu Assignee: sam liu Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-6205.003.patch, MAPREDUCE-6205.patch, MAPREDUCE-6205.patch In the current Hadoop code, the old property mapred.child.java.opts is deprecated and its replacements are MRJobConfig.MAP_JAVA_OPTS and MRJobConfig.REDUCE_JAVA_OPTS. However, when a user sets a value for the deprecated property mapred.child.java.opts, Hadoop won't automatically update the replacement properties MRJobConfig.MAP_JAVA_OPTS (mapreduce.map.java.opts) and MRJobConfig.REDUCE_JAVA_OPTS (mapreduce.reduce.java.opts). Since Hadoop updates the replacement properties for many other deprecated properties, it should support the same behavior for mapred.child.java.opts; otherwise it may cause incompatibility issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
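The requested behavior can be sketched without Hadoop on the classpath. Hadoop itself wires such mappings through its Configuration deprecation machinery; the standalone sketch below uses a plain Map to show the one-deprecated-key-to-two-new-keys update being asked for:

```java
import java.util.*;

public class DeprecationSketch {
    // Hypothetical mapping table (plain Java, not the real
    // org.apache.hadoop.conf.Configuration API): the deprecated key
    // should fan out to both replacement keys.
    static final Map<String, List<String>> DEPRECATIONS = Map.of(
        "mapred.child.java.opts",
        List.of("mapreduce.map.java.opts", "mapreduce.reduce.java.opts"));

    static void set(Map<String, String> conf, String key, String value) {
        conf.put(key, value);
        // Mirror the value into every new-style key for a deprecated key.
        for (String newKey : DEPRECATIONS.getOrDefault(key, List.of())) {
            conf.put(newKey, value);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        set(conf, "mapred.child.java.opts", "-Xmx512m");
        System.out.println(conf.get("mapreduce.map.java.opts"));    // -Xmx512m
        System.out.println(conf.get("mapreduce.reduce.java.opts")); // -Xmx512m
    }
}
```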
[jira] [Updated] (MAPREDUCE-5915) Pipes ping thread should sleep in intervals to allow for isDone() to be checked
[ https://issues.apache.org/jira/browse/MAPREDUCE-5915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5915: Labels: BB2015-05-TBR (was: ) Pipes ping thread should sleep in intervals to allow for isDone() to be checked --- Key: MAPREDUCE-5915 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5915 Project: Hadoop Map/Reduce Issue Type: Improvement Components: pipes Reporter: Joe Mudd Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-5915.patch The ping() thread sleeps for 5 seconds at a time, causing up to a 5-second delay in detecting that the job is finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
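The suggested fix — sleeping in short slices and re-checking completion between slices — can be sketched like this (the real pipes ping loop is C++; this Java sketch with a hypothetical `isDone` supplier only illustrates the pattern):

```java
import java.util.function.BooleanSupplier;

public class PingSleeper {
    // Sleep up to totalMillis, waking every sliceMillis to re-check isDone.
    // Returns true if isDone became true before the full interval elapsed.
    public static boolean sleepUnlessDone(BooleanSupplier isDone,
                                          long totalMillis, long sliceMillis) {
        long remaining = totalMillis;
        while (remaining > 0) {
            if (isDone.getAsBoolean()) {
                return true;                 // job finished: stop waiting early
            }
            long nap = Math.min(sliceMillis, remaining);
            try {
                Thread.sleep(nap);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                break;
            }
            remaining -= nap;
        }
        return isDone.getAsBoolean();
    }
}
```

A finished job is now noticed within one slice (e.g. 100 ms) instead of up to 5 seconds.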
[jira] [Updated] (MAPREDUCE-6155) MapFiles are not always correctly detected by SequenceFileInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-6155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6155: Labels: BB2015-05-TBR (was: ) MapFiles are not always correctly detected by SequenceFileInputFormat - Key: MAPREDUCE-6155 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6155 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jens Rabe Labels: BB2015-05-TBR Attachments: MAPREDUCE-6155.001.patch, MAPREDUCE-6155.002.patch Original Estimate: 2h Remaining Estimate: 2h MapFiles are not always correctly detected by SequenceFileInputFormat. The listStatus method detects a MapFile correctly only when the path it checks is a directory; it then replaces the directory with the path of its data file. This fails when the data file does not exist, i.e., when the input path is a directory that does not belong to a MapFile, or when recursion is turned on and the input format comes across a file (not a directory) that is in fact part of a MapFile. The listStatus method should be changed to handle these cases correctly: * if the current candidate is a file named index or data, check whether the corresponding other file exists, whether the key types of both files match, and whether the value type of the index file is LongWritable * if the current candidate is a directory, it is a MapFile if and only if an index and a data file exist, both are SequenceFiles, their key types match, and the index value type is LongWritable -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3383) Duplicate job.getOutputValueGroupingComparator() in ReduceTask
[ https://issues.apache.org/jira/browse/MAPREDUCE-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3383: Labels: BB2015-05-TBR (was: ) Duplicate job.getOutputValueGroupingComparator() in ReduceTask -- Key: MAPREDUCE-3383 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3383 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.23.1 Reporter: Binglin Chang Assignee: Binglin Chang Labels: BB2015-05-TBR Attachments: MAPREDUCE-3383.patch This is probably just a small accidental mistake. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4710) Add peak memory usage counter for each task
[ https://issues.apache.org/jira/browse/MAPREDUCE-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4710: Labels: BB2015-05-TBR patch (was: patch) Add peak memory usage counter for each task --- Key: MAPREDUCE-4710 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4710 Project: Hadoop Map/Reduce Issue Type: New Feature Components: task Affects Versions: 1.0.2 Reporter: Cindy Li Assignee: Cindy Li Priority: Minor Labels: BB2015-05-TBR, patch Attachments: MAPREDUCE-4710-trunk.patch, mapreduce-4710-v1.0.2.patch, mapreduce-4710.patch, mapreduce4710-v3.patch, mapreduce4710-v6.patch, mapreduce4710.patch Each task has counters PHYSICAL_MEMORY_BYTES and VIRTUAL_MEMORY_BYTES, which are snapshots of memory usage of that task. They are not sufficient for users to understand peak memory usage by that task, e.g. in order to diagnose task failures, tune job parameters or change application design. This new feature will add two more counters for each task: PHYSICAL_MEMORY_BYTES_MAX and VIRTUAL_MEMORY_BYTES_MAX. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3384) Add warning message for org.apache.hadoop.mapreduce.lib.reduce.LongSumReducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3384: Labels: BB2015-05-TBR (was: ) Add warning message for org.apache.hadoop.mapreduce.lib.reduce.LongSumReducer - Key: MAPREDUCE-3384 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3384 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: JiangKai Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-3384.patch When we call the reduce() function of LongSumReducer, the result may overflow. We should send a warning message to users if overflow occurs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
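One way to detect the overflow this issue wants warned about (a sketch only, not the attached patch; LongSumReducer itself simply adds the values) is the standard sign-bit test on each intermediate addition:

```java
public class OverflowCheckedSum {
    // Sums values the way LongSumReducer does, but also reports whether
    // any intermediate addition overflowed the long range.
    public static class Result {
        public final long sum;
        public final boolean overflowed;
        Result(long sum, boolean overflowed) {
            this.sum = sum;
            this.overflowed = overflowed;
        }
    }

    public static Result sum(long[] values) {
        long sum = 0;
        boolean overflowed = false;
        for (long v : values) {
            long next = sum + v;
            // Overflow iff both operands have a sign the result does not share.
            if (((sum ^ next) & (v ^ next)) < 0) {
                overflowed = true;   // the reducer could log a warning here
            }
            sum = next;
        }
        return new Result(sum, overflowed);
    }
}
```

On JDKs with Math.addExact the same check can be written as a try/catch around addExact; the bit trick above avoids the exception on the hot path.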
[jira] [Updated] (MAPREDUCE-4911) Add node-level aggregation flag feature(setNodeLevelAggregation(boolean)) to JobConf
[ https://issues.apache.org/jira/browse/MAPREDUCE-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4911: Labels: BB2015-05-TBR (was: ) Add node-level aggregation flag feature(setNodeLevelAggregation(boolean)) to JobConf Key: MAPREDUCE-4911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4911 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: client Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Labels: BB2015-05-TBR Attachments: MAPREDUCE-4911.2.patch, MAPREDUCE-4911.3.patch, MAPREDUCE-4911.patch This JIRA adds a node-level aggregation flag (setLocalAggregation(boolean)) to JobConf. This is a subtask of MAPREDUCE-4502. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5728) Check NPE for serializer/deserializer in MapTask
[ https://issues.apache.org/jira/browse/MAPREDUCE-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5728: Labels: BB2015-05-TBR (was: ) Check NPE for serializer/deserializer in MapTask Key: MAPREDUCE-5728 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5728 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 2.2.0 Reporter: Jerry He Assignee: Jerry He Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-5728-trunk.patch Currently we get an NPE if the serializer/deserializer is not configured correctly. {code} 14/01/14 11:52:35 INFO mapred.JobClient: Task Id : attempt_201401072154_0027_m_02_2, Status : FAILED java.lang.NullPointerException at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:944) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:672) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:740) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:368) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(AccessController.java:362) at javax.security.auth.Subject.doAs(Subject.java:573) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502) at org.apache.hadoop.mapred.Child.main(Child.java:249) {code} serializationFactory.getSerializer and serializationFactory.getDeserializer return null in this case. Let's check for a null serializer/deserializer in MapTask so that we fail with a meaningful error instead of a meaningless NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
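The proposed check can be sketched as a fail-fast helper (`requireSerializer` is hypothetical, not actual MapTask code; io.serializations is the real Hadoop setting that governs which serializations are registered):

```java
public class SerializationCheck {
    // Fail fast with a descriptive message instead of a bare NPE later.
    // className stands in for the job's map output key/value class name.
    public static <T> T requireSerializer(T serializer, String className) {
        if (serializer == null) {
            throw new IllegalStateException(
                "No serializer found for class " + className
                + "; check the io.serializations setting and make sure the"
                + " class has a registered serialization.");
        }
        return serializer;
    }
}
```

The user then sees which class and which setting to fix, rather than a stack trace pointing into MapOutputBuffer.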
[jira] [Updated] (MAPREDUCE-5269) Preemption of Reducer (and Shuffle) via checkpointing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5269: Labels: BB2015-05-TBR (was: ) Preemption of Reducer (and Shuffle) via checkpointing - Key: MAPREDUCE-5269 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5269 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Reporter: Carlo Curino Assignee: Carlo Curino Labels: BB2015-05-TBR Attachments: MAPREDUCE-5269.2.patch, MAPREDUCE-5269.3.patch, MAPREDUCE-5269.4.patch, MAPREDUCE-5269.5.patch, MAPREDUCE-5269.6.patch, MAPREDUCE-5269.7.patch, MAPREDUCE-5269.patch This patch tracks the changes in the task runtime (shuffle, reducer context, etc.) that are required to implement checkpoint-based preemption of reducer tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5916) The authenticate response is not sent when password is empty (LocalJobRunner)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5916: Labels: BB2015-05-TBR (was: ) The authenticate response is not sent when password is empty (LocalJobRunner) - Key: MAPREDUCE-5916 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5916 Project: Hadoop Map/Reduce Issue Type: Bug Components: pipes Reporter: Joe Mudd Labels: BB2015-05-TBR Attachments: MAPREDUCE-5916.patch When running in a mode where there are no credentials associated with the pipes submission and the password is empty, the C++ verifyDigestAndRespond() does not respond to the Java side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3385) Add warning message for the overflow in reduce() of org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-3385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3385: Labels: BB2015-05-TBR (was: ) Add warning message for the overflow in reduce() of org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer Key: MAPREDUCE-3385 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3385 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: JiangKai Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-3385.patch When we call the reduce() function of IntSumReducer, the result may overflow. We should send a warning message to users if overflow occurs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3047) FileOutputCommitter throws wrong type of exception when calling abortTask() to handle a directory without permission
[ https://issues.apache.org/jira/browse/MAPREDUCE-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3047: Labels: BB2015-05-TBR (was: ) FileOutputCommitter throws wrong type of exception when calling abortTask() to handle a directory without permission Key: MAPREDUCE-3047 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3047 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: JiangKai Priority: Trivial Labels: BB2015-05-TBR Attachments: MAPREDUCE-3047-1.patch, MAPREDUCE-3047-2.patch, MAPREDUCE-3047.patch When FileOutputCommitter calls abortTask() to create a temp directory, the call can fail because the user has no permission to access the directory or because a file with the same name already exists. In that case the system writes the error information to the log file instead of throwing an exception. As a result, when the temp directory is needed later it has never been created, and the system throws an exception telling the user that the temp directory doesn't exist. In my opinion that exception is inaccurate, and the error information will confuse users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5403) MR changes to accommodate yarn.application.classpath being moved to the server-side
[ https://issues.apache.org/jira/browse/MAPREDUCE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5403: Labels: BB2015-05-TBR (was: ) MR changes to accommodate yarn.application.classpath being moved to the server-side --- Key: MAPREDUCE-5403 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5403 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 2.0.5-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Labels: BB2015-05-TBR Attachments: MAPREDUCE-5403-1.patch, MAPREDUCE-5403-2.patch, MAPREDUCE-5403.patch yarn.application.classpath is a confusing property because it is used by MapReduce and not YARN, and MapReduce already has mapreduce.application.classpath, which provides the same functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3807) JobTracker needs fix similar to HDFS-94
[ https://issues.apache.org/jira/browse/MAPREDUCE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3807: Labels: BB2015-05-TBR newbie (was: newbie) JobTracker needs fix similar to HDFS-94 --- Key: MAPREDUCE-3807 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3807 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.0.0 Reporter: Harsh J Labels: BB2015-05-TBR, newbie Attachments: MAPREDUCE-3807.patch 1.0 JobTracker's jobtracker.jsp page currently shows: {code} <h2>Cluster Summary (Heap Size is <%= StringUtils.byteDesc(Runtime.getRuntime().totalMemory()) %>/<%= StringUtils.byteDesc(Runtime.getRuntime().maxMemory()) %>)</h2> {code} It could use the same improvement as HDFS-94 to reflect live heap usage more accurately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5188) error when verify FileType of RS_SOURCE in getCompanionBlocks in BlockPlacementPolicyRaid.java
[ https://issues.apache.org/jira/browse/MAPREDUCE-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5188: Labels: BB2015-05-TBR contrib/raid (was: contrib/raid) error when verify FileType of RS_SOURCE in getCompanionBlocks in BlockPlacementPolicyRaid.java --- Key: MAPREDUCE-5188 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5188 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/raid Affects Versions: 2.0.2-alpha Reporter: junjin Assignee: junjin Priority: Critical Labels: BB2015-05-TBR, contrib/raid Fix For: 2.0.2-alpha Attachments: MAPREDUCE-5188.patch There is an error when verifying the FileType of RS_SOURCE in getCompanionBlocks in BlockPlacementPolicyRaid.java: xorParityLength at line #379 needs to be changed to rsParityLength, since that check is verifying the RS_SOURCE type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5365) Set mapreduce.job.classloader to true by default
[ https://issues.apache.org/jira/browse/MAPREDUCE-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5365: Labels: BB2015-05-TBR (was: ) Set mapreduce.job.classloader to true by default Key: MAPREDUCE-5365 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5365 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.0.5-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Labels: BB2015-05-TBR Attachments: MAPREDUCE-5365.patch MAPREDUCE-1700 introduced the mapreduce.job.classloader option, which uses a custom classloader to separate system classes from user classes. There seem to be only rare cases in which a user would not want this on, so it should be enabled by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4346) Adding a refined version of JobTracker.getAllJobs() and exposing through the JobClient
[ https://issues.apache.org/jira/browse/MAPREDUCE-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4346: Labels: BB2015-05-TBR (was: ) Adding a refined version of JobTracker.getAllJobs() and exposing through the JobClient -- Key: MAPREDUCE-4346 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4346 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv1 Reporter: Ahmed Radwan Assignee: Ahmed Radwan Labels: BB2015-05-TBR Attachments: MAPREDUCE-4346.patch, MAPREDUCE-4346_rev2.patch, MAPREDUCE-4346_rev3.patch, MAPREDUCE-4346_rev4.patch The current implementation for JobTracker.getAllJobs() returns all submitted jobs in any state, in addition to retired jobs. This list can be long and represents an unneeded overhead especially in the case of clients only interested in jobs in specific state(s). It is beneficial to include a refined version where only jobs having specific statuses are returned and retired jobs are optional to include. I'll be uploading an initial patch momentarily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4330) TaskAttemptCompletedEventTransition invalidates previously successful attempt without checking if the newly completed attempt is successful
[ https://issues.apache.org/jira/browse/MAPREDUCE-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4330: Labels: BB2015-05-TBR (was: ) TaskAttemptCompletedEventTransition invalidates previously successful attempt without checking if the newly completed attempt is successful --- Key: MAPREDUCE-4330 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4330 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.23.1 Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Labels: BB2015-05-TBR Attachments: MAPREDUCE-4330-20130415.1.patch, MAPREDUCE-4330-20130415.patch, MAPREDUCE-4330-21032013.1.patch, MAPREDUCE-4330-21032013.patch The previously completed attempt is removed from successAttemptCompletionEventNoMap and marked OBSOLETE. After that, if the newly completed attempt is successful, it is added to the successAttemptCompletionEventNoMap. This seems wrong because the newly completed attempt could have failed, in which case there is no need to invalidate the successful attempt. One error case is a speculative attempt completing as killed/failed after the successful version has already completed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4273) Make CombineFileInputFormat split result JDK independent
[ https://issues.apache.org/jira/browse/MAPREDUCE-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4273: Labels: BB2015-05-TBR (was: ) Make CombineFileInputFormat split result JDK independent Key: MAPREDUCE-4273 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4273 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 1.0.3 Reporter: Luke Lu Assignee: Yu Gao Labels: BB2015-05-TBR Attachments: MAPREDUCE-4273-branch1-v2.patch, mapreduce-4273-branch-1.patch, mapreduce-4273-branch-2.patch, mapreduce-4273.patch The split result of CombineFileInputFormat depends on the iteration order of the nodeToBlocks and rackToBlocks hash maps, which makes the result depend on the HashMap implementation and hence on the JDK. This manifests as TestCombineFileInputFormat failures on alternative JDKs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
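The JDK dependence comes from HashMap's unspecified iteration order; iterating a sorted copy of the keys makes any order-dependent output reproducible across JDKs. A sketch of the idea (not the attached patch):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DeterministicIteration {
    // A TreeMap iterates its keys in sorted order on every JDK, so code
    // whose output depends on iteration order (like assembling splits from
    // nodeToBlocks/rackToBlocks) becomes reproducible.
    public static List<String> orderedKeys(Map<String, Integer> source) {
        return new ArrayList<>(new TreeMap<>(source).keySet());
    }
}
```

A LinkedHashMap (insertion order) would work equally well when the insertion order itself is deterministic.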
[jira] [Updated] (MAPREDUCE-5377) JobID is not displayed truly by hadoop job -history command
[ https://issues.apache.org/jira/browse/MAPREDUCE-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5377: Labels: BB2015-05-TBR newbie (was: newbie) JobID is not displayed truly by hadoop job -history command - Key: MAPREDUCE-5377 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5377 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.2.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Priority: Minor Labels: BB2015-05-TBR, newbie Attachments: MAPREDUCE-5377.patch The JobID output by the hadoop job -history command is an incorrect string. {quote} [hadoop@hadoop hadoop]$ hadoop job -history terasort Hadoop job: 0001_1374260789919_hadoop = Job tracker host name: job job tracker start time: Tue May 18 15:39:51 PDT 1976 User: hadoop JobName: TeraSort JobConf: hdfs://hadoop:8020/hadoop/mapred/staging/hadoop/.staging/job_201307191206_0001/job.xml Submitted At: 19-7-2013 12:06:29 Launched At: 19-7-2013 12:06:30 (0sec) Finished At: 19-7-2013 12:06:44 (14sec) Status: SUCCESS {quote} In this example it should show job_201307191206_0001 after Hadoop job:, but it shows 0001_1374260789919_hadoop. In addition, the Job tracker host name and job tracker start time are invalid. This problem can be solved by fixing the setting of jobId in HistoryViewer(). The JobTracker information in HistoryViewer should be fixed as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5150) Backport 2009 terasort (MAPREDUCE-639) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5150: Labels: BB2015-05-TBR (was: ) Backport 2009 terasort (MAPREDUCE-639) to branch-1 -- Key: MAPREDUCE-5150 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5150 Project: Hadoop Map/Reduce Issue Type: Improvement Components: examples Affects Versions: 1.2.0 Reporter: Gera Shegalov Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-5150-branch-1.patch Users evaluate the performance of Hadoop clusters using benchmarks such as TeraSort. However, the terasort version in branch-1 is outdated: it works on a teragen dataset that cannot exceed 4 billion unique keys, and it lacks the fast non-sampling partitioner SimplePartitioner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3936) Clients should not enforce counter limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3936: Labels: BB2015-05-TBR (was: ) Clients should not enforce counter limits -- Key: MAPREDUCE-3936 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3936 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv1 Reporter: Tom White Assignee: Tom White Labels: BB2015-05-TBR Attachments: MAPREDUCE-3936.patch, MAPREDUCE-3936.patch The code for enforcing counter limits (from MAPREDUCE-1943) creates a static JobConf instance to load the limits, which may throw an exception if the client limit is set to be lower than the limit on the cluster (perhaps because the cluster limit was raised from the default). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases
[ https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6251: Labels: BB2015-05-TBR (was: ) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases --- Key: MAPREDUCE-6251 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, mrv2 Affects Versions: 2.6.0 Reporter: Craig Welch Assignee: Craig Welch Labels: BB2015-05-TBR Attachments: MAPREDUCE-6251.0.patch, MAPREDUCE-6251.1.patch, MAPREDUCE-6251.2.patch, MAPREDUCE-6251.3.patch, MAPREDUCE-6251.4.patch The JobClient is used to get job status information for running and completed jobs. Final state and history for a job are communicated from the application master to the job history server via a distributed file system: the history is uploaded by the application master to the dfs and then scanned/loaded by the job history server. While HDFS has strong consistency guarantees, not all Hadoop DFSs do. When used with a distributed file system that lacks this guarantee, there will be cases where the history server does not yet see an uploaded file, resulting in the dreaded "no such job" and a null value for the RunningJob in the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
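The higher-level retry the issue proposes can be sketched generically — a lookup against an eventually-consistent store is repeated with backoff until it returns something (`RetryingLookup` is a hypothetical illustration, not JobClient code):

```java
import java.util.function.Supplier;

public class RetryingLookup {
    // Retry an eventually-consistent lookup a few times with linear
    // backoff, returning null only after every attempt came back empty.
    public static <T> T retry(Supplier<T> lookup, int attempts, long backoffMillis) {
        for (int i = 0; i < attempts; i++) {
            T result = lookup.get();
            if (result != null) {
                return result;          // the file/job finally became visible
            }
            if (i < attempts - 1) {
                try {
                    Thread.sleep(backoffMillis * (i + 1));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return null;
                }
            }
        }
        return null;
    }
}
```

The caller still has to treat a final null as "no such job"; the retries only paper over the visibility delay, not a genuinely missing history file.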
[jira] [Updated] (MAPREDUCE-5819) Binary token merge should be done once in TokenCache#obtainTokensForNamenodesInternal()
[ https://issues.apache.org/jira/browse/MAPREDUCE-5819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5819: Labels: BB2015-05-TBR (was: ) Binary token merge should be done once in TokenCache#obtainTokensForNamenodesInternal() --- Key: MAPREDUCE-5819 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5819 Project: Hadoop Map/Reduce Issue Type: Improvement Components: security Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Labels: BB2015-05-TBR Attachments: mapreduce-5819-v1.txt Currently mergeBinaryTokens() is called by every invocation of obtainTokensForNamenodesInternal(FileSystem, Credentials, Configuration) in the loop of obtainTokensForNamenodesInternal(Credentials, Path[], Configuration). This can be simplified so that mergeBinaryTokens() is called only once in obtainTokensForNamenodesInternal(Credentials, Path[], Configuration). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-2340) optimize JobInProgress.initTasks()
[ https://issues.apache.org/jira/browse/MAPREDUCE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-2340: Labels: BB2015-05-TBR critical-0.22.0 (was: critical-0.22.0) optimize JobInProgress.initTasks() -- Key: MAPREDUCE-2340 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2340 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Affects Versions: 0.20.1, 0.21.0 Reporter: Kang Xiao Labels: BB2015-05-TBR, critical-0.22.0 Attachments: MAPREDUCE-2340.patch, MAPREDUCE-2340.patch, MAPREDUCE-2340.r1.diff JobTracker's hostnameToNodeMap cache can speed up JobInProgress.initTasks() and JobInProgress.createCache() significantly. A test with 1 job with 10 maps on a 2400-node cluster shows nearly 10x and 50x speedups for initTasks() and createCache(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
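The caching idea can be sketched as a simple memoized lookup: resolve each hostname to its node once and reuse the answer for every subsequent task (`HostnameCache` and its resolver are hypothetical stand-ins for the JobTracker internals):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class HostnameCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> resolver;
    public int misses = 0;   // exposed only so the effect is observable

    public HostnameCache(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    // Resolve a hostname to its node, consulting the (possibly expensive)
    // resolver only on a cache miss.
    public String nodeFor(String hostname) {
        return cache.computeIfAbsent(hostname, h -> {
            misses++;
            return resolver.apply(h);
        });
    }
}
```

With thousands of splits landing on a few thousand hosts, the resolver runs once per distinct host instead of once per split, which is where the reported speedup comes from.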
[jira] [Updated] (MAPREDUCE-5258) Memory Leak while using LocalJobRunner
[ https://issues.apache.org/jira/browse/MAPREDUCE-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5258: Labels: BB2015-05-TBR patch (was: patch) Memory Leak while using LocalJobRunner -- Key: MAPREDUCE-5258 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5258 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.2 Reporter: Subroto Sanyal Assignee: skrho Labels: BB2015-05-TBR, patch Fix For: 1.1.3 Attachments: mapreduce-5258 _001.txt, mapreduce-5258.txt Every time a LocalJobRunner is launched it creates a JobTrackerInstrumentation and QueueMetrics. While creating this MetricsSystem, it registers and adds a callback to an ArrayList, which keeps growing because the DefaultMetricsSystem is a singleton. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6350) JobHistory doesn't support fully-functional search
[ https://issues.apache.org/jira/browse/MAPREDUCE-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6350: Labels: BB2015-05-TBR (was: ) JobHistory doesn't support fully-functional search -- Key: MAPREDUCE-6350 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6350 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Labels: BB2015-05-TBR Attachments: YARN-1614.v1.patch, YARN-1614.v2.patch The job history server only outputs the first 50 characters of job names in the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6284) Add a 'task attempt state' to MapReduce Application Master REST API
[ https://issues.apache.org/jira/browse/MAPREDUCE-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6284: Labels: BB2015-05-TBR (was: ) Add a 'task attempt state' to MapReduce Application Master REST API --- Key: MAPREDUCE-6284 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6284 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Ryu Kobayashi Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-6284.1.patch, MAPREDUCE-6284.1.patch, MAPREDUCE-6284.2.patch, MAPREDUCE-6284.3.patch, MAPREDUCE-6284.3.patch It would be useful to have a 'task attempt state' resource, similar to the existing 'app state' REST API: GET http://proxy http address:port/proxy/application_id/ws/v1/mapreduce/jobs/job_id/tasks/task_id/attempts/attempt_id/state PUT http://proxy http address:port/proxy/application_id/ws/v1/mapreduce/jobs/job_id/tasks/task_id/attempts/attempt_id/state -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6338) MR AppMaster does not honor ephemeral port range
[ https://issues.apache.org/jira/browse/MAPREDUCE-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6338: Labels: BB2015-05-TBR (was: ) MR AppMaster does not honor ephemeral port range Key: MAPREDUCE-6338 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6338 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am, mrv2 Affects Versions: 2.6.0 Reporter: Frank Nguyen Assignee: Frank Nguyen Labels: BB2015-05-TBR Attachments: MAPREDUCE-6338.002.patch The MR AppMaster should only use port ranges defined in the yarn.app.mapreduce.am.job.client.port-range property. On initial startup the MRAppMaster does use the port range defined in the property. However, it also opens a listener on a random ephemeral port. This is not the Jetty listener; it is another listener opened by the MRAppMaster on a separate thread, and it is the one recognized by the RM. Other nodes try to communicate with it via that random port. With firewall settings on, the MR job fails because the random port is not open. This problem has forced operators to open all OS ephemeral ports just to let MR jobs run. This is related to MAPREDUCE-4079. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6332) Add more required API's to MergeManager interface
[ https://issues.apache.org/jira/browse/MAPREDUCE-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6332: Labels: BB2015-05-TBR (was: ) Add more required API's to MergeManager interface -- Key: MAPREDUCE-6332 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6332 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.5.0, 2.6.0, 2.7.0 Reporter: Rohith Assignee: Rohith Labels: BB2015-05-TBR Attachments: 0001-MAPREDUCE-6332.patch, 0002-MAPREDUCE-6332.patch MR lets the user plug in a custom ShuffleConsumerPlugin via *mapreduce.job.reduce.shuffle.consumer.plugin.class*. A user who plugs in this class may also want to implement their own MergeManager. But currently the user is forced to use the MR-provided MergeManagerImpl instead of a custom MergeManager implementation when using the shuffle consumer plugin class. There should be well-defined APIs in MergeManager that any implementation can use, so that custom implementations take little extra effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5733) Define and use a constant for property textinputformat.record.delimiter
[ https://issues.apache.org/jira/browse/MAPREDUCE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5733: Labels: BB2015-05-TBR (was: ) Define and use a constant for property textinputformat.record.delimiter - Key: MAPREDUCE-5733 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5733 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Gelesh Assignee: Gelesh Priority: Trivial Labels: BB2015-05-TBR Attachments: MAPREDUCE-5733.patch, MAPREDUCE-5733_2.patch Original Estimate: 10m Remaining Estimate: 10m Calling conf.set("textinputformat.record.delimiter", myDelimiter) on a Configuration is prone to typos. Let's have the key as a static String in some class, to minimize such errors. This would also let IDEs like Eclipse suggest the string. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
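What the issue asks for is just a named constant; a sketch (`TextInputFormatKeys` is a hypothetical home for it — the real patch would put the constant on TextInputFormat or a related config class):

```java
// Hypothetical holder for the property key the issue wants named.
public class TextInputFormatKeys {
    // Callers write conf.set(TextInputFormatKeys.RECORD_DELIMITER, myDelimiter)
    // instead of retyping the raw string, so the compiler and IDE catch typos.
    public static final String RECORD_DELIMITER = "textinputformat.record.delimiter";
}
```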
[jira] [Updated] (MAPREDUCE-5203) Make AM of M/R Use NMClient
[ https://issues.apache.org/jira/browse/MAPREDUCE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5203: Labels: BB2015-05-TBR (was: ) Make AM of M/R Use NMClient --- Key: MAPREDUCE-5203 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5203 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: BB2015-05-TBR Attachments: MAPREDUCE-5203.1.patch, MAPREDUCE-5203.2.patch, MAPREDUCE-5203.3.patch, MAPREDUCE-5203.4.patch, MAPREDUCE-5203.5.patch YARN-422 adds NMClient. AM of mapreduce should use it instead of using the raw ContainerManager proxy directly. ContainerLauncherImpl needs to be changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-2632) Avoid calling the partitioner when the numReduceTasks is 1.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-2632: Labels: BB2015-05-TBR (was: ) Avoid calling the partitioner when the numReduceTasks is 1. --- Key: MAPREDUCE-2632 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2632 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tasktracker Affects Versions: 0.23.0 Reporter: Ravi Teja Ch N V Assignee: Ravi Teja Ch N V Labels: BB2015-05-TBR Attachments: MAPREDUCE-2632-1.patch, MAPREDUCE-2632.patch We can avoid the call to the partitioner when the number of reducers is 1. This avoids unnecessary computation by the partitioner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
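The shortcut can be sketched as follows (a minimal stand-in for the MR Partitioner API; the method name `partitionFor` is ours, not the patch's): with a single reducer every record must land in partition 0, so the user partitioner need not run at all.

```java
public class SinglePartitionShortcut {
    // Minimal stand-in for org.apache.hadoop.mapreduce.Partitioner.
    public interface Partitioner<K, V> {
        int getPartition(K key, V value, int numReduceTasks);
    }

    public static <K, V> int partitionFor(Partitioner<K, V> p, K key, V value,
                                          int numReduceTasks) {
        // With one reducer the answer is always 0; skip the (possibly
        // expensive) user-supplied partitioner entirely.
        return numReduceTasks == 1 ? 0 : p.getPartition(key, value, numReduceTasks);
    }
}
```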
[jira] [Updated] (MAPREDUCE-5374) CombineFileRecordReader does not set map.input.* configuration parameters for first file read
[ https://issues.apache.org/jira/browse/MAPREDUCE-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5374: Labels: BB2015-05-TBR (was: ) CombineFileRecordReader does not set map.input.* configuration parameters for first file read --- Key: MAPREDUCE-5374 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5374 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.0 Reporter: Dave Beech Assignee: Dave Beech Labels: BB2015-05-TBR Attachments: MAPREDUCE-5374.patch, MAPREDUCE-5374.patch The CombineFileRecordReader operates on splits consisting of multiple files. Each time a new record reader is initialised for a chunk, certain parameters are supposed to be set on the configuration object (map.input.file, map.input.start and map.input.length) However, the first reader is initialised in a different way to subsequent ones (i.e. initialize is called by the MapTask directly rather than from inside the record reader class). Because of this, these config parameters are not set properly and are returned as null when you access them from inside a mapper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5981) Log levels of certain MR logs can be changed to DEBUG
[ https://issues.apache.org/jira/browse/MAPREDUCE-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5981: Labels: BB2015-05-TBR (was: ) Log levels of certain MR logs can be changed to DEBUG - Key: MAPREDUCE-5981 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5981 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Varun Saxena Assignee: Varun Saxena Labels: BB2015-05-TBR Attachments: MAPREDUCE-5981.patch Following map reduce logs can be changed to DEBUG log level. 1. In org.apache.hadoop.mapreduce.task.reduce.Fetcher#copyFromHost(Fetcher.java : 313), the second log is not required to be at info level. This can be moved to debug as a warn log is anyways printed if verifyReply fails. SecureShuffleUtils.verifyReply(replyHash, encHash, shuffleSecretKey); LOG.info(for url=+msgToEncode+ sent hash and received reply); 2. Thread related info need not be printed in logs at INFO level. Below 2 logs can be moved to DEBUG a) In org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl#getHost(ShuffleSchedulerImpl.java : 381), below log can be changed to DEBUG LOG.info(Assigning + host + with + host.getNumKnownMapOutputs() + to + Thread.currentThread().getName()); b) In org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.getMapsForHost(ShuffleSchedulerImpl.java : 411), below log can be changed to DEBUG LOG.info(assigned + includedMaps + of + totalSize + to + host + to + Thread.currentThread().getName()); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5362) clean up POM dependencies
[ https://issues.apache.org/jira/browse/MAPREDUCE-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5362: Labels: BB2015-05-TBR (was: ) clean up POM dependencies - Key: MAPREDUCE-5362 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5362 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Labels: BB2015-05-TBR Attachments: MAPREDUCE-5362.patch, mr-5362-0.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules as in common, hdfs and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6020) Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job count
[ https://issues.apache.org/jira/browse/MAPREDUCE-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6020: Labels: BB2015-05-TBR (was: ) Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job counter in JobInProgress - Key: MAPREDUCE-6020 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6020 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.23.10 Reporter: zhihai xu Assignee: zhihai xu Labels: BB2015-05-TBR Attachments: MAPREDUCE-6020.branch1.patch Too many threads block on the global JobTracker lock in getJobCounters; optimize getJobCounters to release the global JobTracker lock before accessing the per-job counter in JobInProgress. Many JobClients may call getJobCounters on the JobTracker at the same time, and the current code locks the JobTracker, blocking all those threads while counters are fetched from JobInProgress. It is better to release the JobTracker lock before fetching counters from JobInProgress (job.getCounters(counters)), so that all threads can read their own job's counters in parallel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
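The locking change described above follows a standard lock-narrowing pattern, sketched here with illustrative names (an int array stands in for a job's counters; this is not the branch-1 JobTracker code): hold the global lock only long enough to look the job up, then read the job's state under a per-job lock.

```java
import java.util.HashMap;
import java.util.Map;

public class CounterLockSketch {
    private final Object globalLock = new Object();           // stands in for the JobTracker lock
    private final Map<String, int[]> jobs = new HashMap<>();  // jobId -> per-job counter state

    public void addJob(String id, int counter) {
        synchronized (globalLock) { jobs.put(id, new int[]{counter}); }
    }

    public int getJobCounters(String jobId) {
        int[] job;
        synchronized (globalLock) {   // short critical section: map lookup only
            job = jobs.get(jobId);
        }
        if (job == null) return -1;
        synchronized (job) {          // per-job lock for the (potentially slow) read,
            return job[0];            // taken with the global lock already released
        }
    }
}
```

Because the expensive read happens outside the global lock, concurrent callers asking about different jobs no longer serialize on the JobTracker.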
[jira] [Updated] (MAPREDUCE-5889) Deprecate FileInputFormat.setInputPaths(Job, String) and FileInputFormat.addInputPaths(Job, String)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5889: Labels: BB2015-05-TBR newbie (was: newbie) Deprecate FileInputFormat.setInputPaths(Job, String) and FileInputFormat.addInputPaths(Job, String) --- Key: MAPREDUCE-5889 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5889 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: BB2015-05-TBR, newbie Attachments: MAPREDUCE-5889.3.patch, MAPREDUCE-5889.patch, MAPREDUCE-5889.patch {{FileInputFormat.setInputPaths(Job job, String commaSeparatedPaths)}} and {{FileInputFormat.addInputPaths(Job job, String commaSeparatedPaths)}} fail to parse commaSeparatedPaths if a comma is included in the file path. (e.g. Path: {{/path/file,with,comma}}) We should deprecate these methods and document to use {{setInputPaths(Job job, Path... inputPaths)}} and {{addInputPaths(Job job, Path... inputPaths)}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
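The failure mode is easy to demonstrate: any comma-splitting of the string form (shown here with a plain `String.split`, which is a simplification of Hadoop's actual parsing) breaks a single path that contains commas into several bogus paths, which is why the `Path...` overloads are safer.

```java
public class CommaPathDemo {
    // Simplified model of what the String-based overloads must do:
    // treat the argument as a comma-separated list of paths.
    public static String[] parseCommaSeparated(String commaSeparatedPaths) {
        return commaSeparatedPaths.split(",");
    }
}
```

A single file named `/path/file,with,comma` comes back as three separate "paths", none of which exists; passing `new Path("/path/file,with,comma")` directly avoids the ambiguity.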
[jira] [Updated] (MAPREDUCE-5929) YARNRunner.java, path for jobJarPath not set correctly
[ https://issues.apache.org/jira/browse/MAPREDUCE-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5929: Labels: BB2015-05-TBR newbie patch (was: newbie patch) YARNRunner.java, path for jobJarPath not set correctly -- Key: MAPREDUCE-5929 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5929 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.2.0 Reporter: Chao Tian Assignee: Rahul Palamuttam Labels: BB2015-05-TBR, newbie, patch Attachments: MAPREDUCE-5929.patch In YARNRunner.java, line 357, Path jobJarPath = new Path(jobConf.get(MRJobConfig.JAR)); This leaves the job.jar path without scheme, host and port number on distributed file systems other than HDFS. Compare line 357 with line 344, where job.xml is actually set as Path jobConfPath = new Path(jobSubmitDir, MRJobConfig.JOB_CONF_FILE); It appears jobSubmitDir is missing on line 357, which causes this problem. On HDFS an additional qualify step corrects the path, but not on other generic distributed file systems. The proposed change is to replace line 357 with Path jobJarPath = new Path(jobSubmitDir, jobConf.get(MRJobConfig.JAR)); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6038) A boolean may be set error in the Word Count v2.0 in MapReduce Tutorial
[ https://issues.apache.org/jira/browse/MAPREDUCE-6038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6038: Labels: BB2015-05-TBR (was: ) A boolean may be set error in the Word Count v2.0 in MapReduce Tutorial --- Key: MAPREDUCE-6038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6038 Project: Hadoop Map/Reduce Issue Type: Bug Environment: java version 1.8.0_11 hostspot 64-bit Reporter: Pei Ma Assignee: Tsuyoshi Ozawa Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-6038.1.patch As a beginner learning the basics of MR, I found that I couldn't run WordCount2 using the command bin/hadoop jar wc.jar WordCount2 /user/joe/wordcount/input /user/joe/wordcount/output from the Tutorial. The VM threw a NullPointerException at line 47. On line 45, the default value returned by conf.getBoolean is true. That is to say, even when wordcount.skip.patterns is not set, WordCount2 proceeds to call getCacheFiles, and patternsURIs is assigned null. When the -skip option is absent, wordcount.skip.patterns is never set, and a NullPointerException results. In short, the block after the if-statement on line 45 shouldn't execute when the -skip option is absent from the command. Line 45 should probably read if (conf.getBoolean(wordcount.skip.patterns, false)) { — just change the default boolean. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
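The bug in miniature (a plain Map stands in for Hadoop's Configuration; the helper names are ours): with a default of `true`, the skip-patterns branch runs even though `-skip` never set the flag, which is what leads to the NPE. Defaulting to `false` makes the branch run only when the option was given.

```java
import java.util.Map;

public class SkipPatternsDefault {
    // Mirrors Configuration.getBoolean(key, defaultValue) semantics.
    public static boolean getBoolean(Map<String, String> conf, String key,
                                     boolean defaultValue) {
        String v = conf.get(key);
        return v == null ? defaultValue : Boolean.parseBoolean(v);
    }

    // Fixed version of the line-45 check: default false, so the
    // getCacheFiles path is only taken when -skip actually set the flag.
    public static boolean shouldSkip(Map<String, String> conf) {
        return getBoolean(conf, "wordcount.skip.patterns", false);
    }
}
```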
[jira] [Updated] (MAPREDUCE-5817) mappers get rescheduled on node transition even after all reducers are completed
[ https://issues.apache.org/jira/browse/MAPREDUCE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5817: Labels: BB2015-05-TBR (was: ) mappers get rescheduled on node transition even after all reducers are completed Key: MAPREDUCE-5817 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5817 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.3.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Labels: BB2015-05-TBR Attachments: mapreduce-5817.patch We're seeing a behavior where a job runs long after all reducers were already finished. We found that the job was rescheduling and running a number of mappers beyond the point of reducer completion. In one situation, the job ran for some 9 more hours after all reducers completed! This happens because whenever a node transition (to an unusable state) comes into the app master, it reschedules all mappers that already ran on the node, in all cases. Therefore, any node transition has the potential to extend the job's runtime. Once this window opens, another node transition can prolong it, and in theory this can happen indefinitely. If there is some instability in the pool (unhealthy nodes, etc.) for a duration, any big job is severely vulnerable to this problem. If all reducers have completed, JobImpl.actOnUnusableNode() should not reschedule mapper tasks: the mapper outputs are no longer needed, and rescheduled mappers would produce output that is never consumed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
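The proposed guard can be sketched as follows (a simplified stand-alone function, not the actual JobImpl.actOnUnusableNode code): once every reducer has finished, map outputs on a lost node will never be fetched again, so the reschedule list should be empty.

```java
import java.util.ArrayList;
import java.util.List;

public class UnusableNodeGuard {
    // Decide which completed mappers on a now-unusable node must be rerun.
    public static List<String> tasksToReschedule(List<String> mappersOnNode,
                                                 int completedReducers,
                                                 int totalReducers) {
        if (totalReducers > 0 && completedReducers == totalReducers) {
            // All reduce work is done; map outputs are no longer needed.
            return new ArrayList<>();
        }
        return new ArrayList<>(mappersOnNode);
    }
}
```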
[jira] [Updated] (MAPREDUCE-5490) MapReduce doesn't set the environment variable for children processes
[ https://issues.apache.org/jira/browse/MAPREDUCE-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5490: Labels: BB2015-05-TBR (was: ) MapReduce doesn't set the environment variable for children processes - Key: MAPREDUCE-5490 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5490 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.1 Reporter: Owen O'Malley Assignee: Owen O'Malley Labels: BB2015-05-TBR Attachments: MAPREDUCE-5490.patch, mr-5490.patch, mr-5490.patch Currently, MapReduce uses the command line argument to pass the classpath to the child. This breaks if the process forks a child that needs the same classpath. Such a case happens in Hive when it uses map-side joins. I propose that we make MapReduce in branch-1 use the CLASSPATH environment variable like YARN does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5499) Fix synchronization issues of the setters/getters of *PBImpl which take in/return lists
[ https://issues.apache.org/jira/browse/MAPREDUCE-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5499: Labels: BB2015-05-TBR (was: ) Fix synchronization issues of the setters/getters of *PBImpl which take in/return lists --- Key: MAPREDUCE-5499 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5499 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Zhijie Shen Assignee: Xuan Gong Labels: BB2015-05-TBR Attachments: MAPREDUCE-5499.1.patch, MAPREDUCE-5499.2.patch Similar to YARN-609. There're the following *PBImpls which need to be fixed: 1. GetDiagnosticsResponsePBImpl 2. GetTaskAttemptCompletionEventsResponsePBImpl 3. GetTaskReportsResposnePBImpl 4. CounterGroupPBImpl 5. JobReportPBImpl 6. TaskReportPBImpl -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5392) mapred job -history all command throws IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/MAPREDUCE-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5392: Labels: BB2015-05-TBR (was: ) mapred job -history all command throws IndexOutOfBoundsException -- Key: MAPREDUCE-5392 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5392 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 3.0.0, 2.0.5-alpha, 2.2.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Labels: BB2015-05-TBR Attachments: MAPREDUCE-5392.2.patch, MAPREDUCE-5392.3.patch, MAPREDUCE-5392.4.patch, MAPREDUCE-5392.5.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch When I use the all option of the mapred job -history command, the following exception is thrown and the command does not work. {code} Exception in thread main java.lang.StringIndexOutOfBoundsException: String index out of range: -3 at java.lang.String.substring(String.java:1875) at org.apache.hadoop.mapreduce.util.HostUtil.convertTrackerNameToHostName(HostUtil.java:49) at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.getTaskLogsUrl(HistoryViewer.java:459) at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.printAllTaskAttempts(HistoryViewer.java:235) at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.print(HistoryViewer.java:117) at org.apache.hadoop.mapreduce.tools.CLI.viewHistory(CLI.java:472) at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:313) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1233) {code} This happens because a node name recorded in the history file lacks the tracker_ prefix. The patch modifies the code to read the history file even when a node name does not carry the tracker_ prefix.
In addition, it fixes the URL of the displayed task log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
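The substring crash in miniature: conversion of a tracker name to a host name effectively strips a leading `tracker_` prefix and a trailing port, so a name recorded without the prefix drives the index arithmetic negative. A defensive variant (ours for illustration, not the committed patch, and simpler than the real HostUtil logic):

```java
public class TrackerName {
    public static String toHostName(String trackerName) {
        // Tolerate node names recorded without the "tracker_" prefix
        // instead of blindly doing substring arithmetic on them.
        String name = trackerName.startsWith("tracker_")
                ? trackerName.substring("tracker_".length())
                : trackerName;
        int colon = name.indexOf(':');
        return colon >= 0 ? name.substring(0, colon) : name;
    }
}
```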
[jira] [Updated] (MAPREDUCE-4065) Add .proto files to built tarball
[ https://issues.apache.org/jira/browse/MAPREDUCE-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4065: Labels: BB2015-05-TBR (was: ) Add .proto files to built tarball - Key: MAPREDUCE-4065 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4065 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 0.23.2, 2.4.0 Reporter: Ralph H Castain Assignee: Tsuyoshi Ozawa Labels: BB2015-05-TBR Attachments: MAPREDUCE-4065.1.patch Please add the .proto files to the built tarball so that users can build 3rd party tools that use protocol buffers without having to do an svn checkout of the source code. Sorry I don't know more about Maven, or I would provide a patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6030) In mr-jobhistory-daemon.sh, some env variables are not affected by mapred-env.sh
[ https://issues.apache.org/jira/browse/MAPREDUCE-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6030: Labels: BB2015-05-TBR (was: ) In mr-jobhistory-daemon.sh, some env variables are not affected by mapred-env.sh Key: MAPREDUCE-6030 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6030 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 2.4.1 Reporter: Youngjoon Kim Assignee: Youngjoon Kim Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-6030.patch In mr-jobhistory-daemon.sh, some env variables are exported before sourcing mapred-env.sh, so these variables don't use values defined in mapred-env.sh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6040) distcp should automatically use /.reserved/raw when run by the superuser
[ https://issues.apache.org/jira/browse/MAPREDUCE-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6040: Labels: BB2015-05-TBR (was: ) distcp should automatically use /.reserved/raw when run by the superuser Key: MAPREDUCE-6040 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6040 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Charles Lamb Labels: BB2015-05-TBR Attachments: HDFS-6134-Distcp-cp-UseCasesTable2.pdf, MAPREDUCE-6040.001.patch, MAPREDUCE-6040.002.patch On HDFS-6134, [~sanjay.radia] asked for distcp to automatically prepend /.reserved/raw if the distcp is being performed by the superuser and /.reserved/raw is supported by both the source and destination filesystems. This behavior only occurs if none of the src and target pathnames are /.reserved/raw. The -disablereservedraw flag can be used to disable this option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529508#comment-14529508 ] Naganarasimha G R commented on MAPREDUCE-6304: -- Thanks [~Wangda] for your comments, +1 for {{mention in description that, by default the node-label-expression for job is not set, it will use queue's default-node-label-expression.}}. I am getting it tested in cluster setup, will upload the updated patch today. Specifying node labels when submitting MR jobs -- Key: MAPREDUCE-6304 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6304 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Jian Fang Assignee: Naganarasimha G R Fix For: 2.8.0 Attachments: MAPREDUCE-6304.20150410-1.patch, MAPREDUCE-6304.20150411-1.patch, MAPREDUCE-6304.20150501-1.patch Per the discussion on YARN-796, we need a mechanism in MAPREDUCE to specify node labels when submitting MR jobs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6068) Illegal progress value warnings in map tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6068: Labels: BB2015-05-TBR (was: ) Illegal progress value warnings in map tasks Key: MAPREDUCE-6068 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6068 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, task Affects Versions: 3.0.0 Reporter: Todd Lipcon Assignee: Binglin Chang Labels: BB2015-05-TBR Attachments: MAPREDUCE-6068.002.patch, MAPREDUCE-6068.v1.patch When running a terasort on latest trunk, I see the following in my task logs: {code} 2014-09-02 17:42:28,437 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer 2014-09-02 17:42:42,238 WARN [main] org.apache.hadoop.util.Progress: Illegal progress value found, progress is larger than 1. Progress will be changed to 1 2014-09-02 17:42:42,238 WARN [main] org.apache.hadoop.util.Progress: Illegal progress value found, progress is larger than 1. Progress will be changed to 1 2014-09-02 17:42:42,241 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output {code} We should eliminate these warnings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6315) Implement retrieval of logs for crashed MR-AM via jhist in the staging directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6315: Labels: BB2015-05-TBR (was: ) Implement retrieval of logs for crashed MR-AM via jhist in the staging directory Key: MAPREDUCE-6315 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6315 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client, mr-am Affects Versions: 2.7.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Critical Labels: BB2015-05-TBR Attachments: MAPREDUCE-6315.001.patch When all AM attempts crash, there is no record of them in JHS. Thus no easy way to get the logs. This JIRA automates the procedure by utilizing the jhist file in the staging directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6246: Labels: BB2015-05-TBR DB2 mapreduce (was: DB2 mapreduce) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2 - Key: MAPREDUCE-6246 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2 Affects Versions: 2.4.1 Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x Platform: xSeries, pSeries Browser: Firefox, IE Security Settings: No Security, Flat file, LDAP, PAM File System: HDFS, GPFS FPO Reporter: ramtin Assignee: ramtin Labels: BB2015-05-TBR, DB2, mapreduce Attachments: MAPREDUCE-6246.002.patch, MAPREDUCE-6246.patch Original Estimate: 24h Remaining Estimate: 24h DBOutputFormat is used for writing the output of MapReduce jobs to a database; when used with DB2 JDBC drivers it fails with the following error: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) The DBOutputFormat class has a constructQuery method that generates an INSERT INTO statement with a semicolon (;) at the end. The semicolon is the ANSI SQL-92 statement terminator character, but this feature is disabled (OFF) by default in IBM DB2, although it can be turned ON with -t (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). However, some products are already built on top of the default OFF setting, so turning the feature ON would make them error-prone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
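A fixed constructQuery in miniature (the signature is simplified relative to Hadoop's, which also takes table/field arrays from the job configuration): build the parameterized INSERT without a trailing semicolon, since the JDBC driver, not the statement text, terminates the statement.

```java
public class InsertBuilder {
    public static String constructQuery(String table, String[] fieldNames) {
        StringBuilder q = new StringBuilder("INSERT INTO ").append(table).append(" (");
        q.append(String.join(",", fieldNames)).append(") VALUES (");
        for (int i = 0; i < fieldNames.length; i++) {
            q.append(i == 0 ? "?" : ",?");   // one placeholder per column
        }
        q.append(")");                        // note: no ';' terminator
        return q.toString();
    }
}
```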
[jira] [Updated] (MAPREDUCE-6316) Task Attempt List entries should link to the task overview
[ https://issues.apache.org/jira/browse/MAPREDUCE-6316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6316: Labels: BB2015-05-TBR (was: ) Task Attempt List entries should link to the task overview -- Key: MAPREDUCE-6316 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6316 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siqi Li Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: AM attempt page.png, AM task page.png, All Attempts page.png, MAPREDUCE-6316.v1.patch, MAPREDUCE-6316.v2.patch, MAPREDUCE-6316.v3.patch, Task Overview page.png The typical workflow is to click on the list of failed attempts, then look at the counters, or at the list of attempts of just one task in general. If the task-id portion of each task attempt id linked back to the task, we would not have to go through the list of tasks to search for that task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5465: Labels: BB2015-05-TBR (was: ) Container killed before hprof dumps profile.out --- Key: MAPREDUCE-5465 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mr-am, mrv2 Reporter: Radim Kolar Assignee: Ming Ma Labels: BB2015-05-TBR Attachments: MAPREDUCE-5465-2.patch, MAPREDUCE-5465-3.patch, MAPREDUCE-5465-4.patch, MAPREDUCE-5465-5.patch, MAPREDUCE-5465-6.patch, MAPREDUCE-5465-7.patch, MAPREDUCE-5465-8.patch, MAPREDUCE-5465-9.patch, MAPREDUCE-5465.patch If profiling is enabled for a mapper or reducer, hprof dumps profile.out at process exit, after the task has signaled to the AM that its work is finished. The AM then kills the container without waiting for hprof to finish its dump. If hprof is producing larger output (such as with depth=4, while depth=3 works), it cannot finish the dump before being killed, making the entire dump unusable because the CPU and heap stats are missing. There needs to be a longer delay before the container is killed when profiling is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6305) AM/Task log page should be able to link back to the job
[ https://issues.apache.org/jira/browse/MAPREDUCE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6305: Labels: BB2015-05-TBR (was: ) AM/Task log page should be able to link back to the job --- Key: MAPREDUCE-6305 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6305 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siqi Li Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: MAPREDUCE-6305.v1.patch, MAPREDUCE-6305.v2.patch, MAPREDUCE-6305.v3.patch, MAPREDUCE-6305.v4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6241) Native compilation fails for Checksum.cc due to an incompatibility of assembler register constraint for PowerPC
[ https://issues.apache.org/jira/browse/MAPREDUCE-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6241: Labels: BB2015-05-TBR features (was: features) Native compilation fails for Checksum.cc due to an incompatibility of assembler register constraint for PowerPC Key: MAPREDUCE-6241 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6241 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 3.0.0, 2.6.0 Environment: Debian/Jessie, kernel 3.18.5, ppc64 GNU/Linux gcc (Debian 4.9.1-19) protobuf 2.6.1 OpenJDK Runtime Environment (IcedTea 2.5.3) (7u71-2.5.3-2) OpenJDK Zero VM (build 24.65-b04, interpreted mode) source was cloned (and updated) from Apache-Hadoop's git repository Reporter: Stephan Drescher Assignee: Binglin Chang Priority: Minor Labels: BB2015-05-TBR, features Attachments: MAPREDUCE-6241.001.patch, MAPREDUCE-6241.002.patch Issue when using assembler code for performance optimization on the powerpc platform (compiled for 32bit) mvn compile -Pnative -DskipTests [exec] /usr/bin/c++ -Dnativetask_EXPORTS -m32 -DSIMPLE_MEMCPY -fno-strict-aliasing -Wall -Wno-sign-compare -g -O2 -DNDEBUG -fPIC -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native/javah -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/util -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test 
-I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native -I/home/hadoop/Java/java7/include -I/home/hadoop/Java/java7/include/linux -isystem /home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/gtest/include -o CMakeFiles/nativetask.dir/main/native/src/util/Checksum.cc.o -c /home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/util/Checksum.cc [exec] CMakeFiles/nativetask.dir/build.make:744: recipe for target 'CMakeFiles/nativetask.dir/main/native/src/util/Checksum.cc.o' failed [exec] make[2]: Leaving directory '/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native' [exec] CMakeFiles/Makefile2:95: recipe for target 'CMakeFiles/nativetask.dir/all' failed [exec] make[1]: Leaving directory '/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native' [exec] Makefile:76: recipe for target 'all' failed [exec] /home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/util/Checksum.cc: In function ‘void NativeTask::init_cpu_support_flag()’: /home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/util/Checksum.cc:611:14: error: impossible register constraint in ‘asm’ -- popl %%ebx : =a (eax), [ebx] =r(ebx), =c(ecx), =d(edx) : a (eax_in) : cc); -- -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6336) Enable v2 FileOutputCommitter by default
[ https://issues.apache.org/jira/browse/MAPREDUCE-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6336: Labels: BB2015-05-TBR (was: ) Enable v2 FileOutputCommitter by default Key: MAPREDUCE-6336 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6336 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 2.7.0 Reporter: Gera Shegalov Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: MAPREDUCE-6336.v1.patch This JIRA is to propose making new FileOutputCommitter behavior from MAPREDUCE-4815 enabled by default in trunk, and potentially in branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6269) improve JobConf to add option to not share Credentials between jobs.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6269: Labels: BB2015-05-TBR (was: ) improve JobConf to add option to not share Credentials between jobs. Key: MAPREDUCE-6269 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6269 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Reporter: zhihai xu Assignee: zhihai xu Labels: BB2015-05-TBR Attachments: MAPREDUCE-6269.000.patch Improve JobConf by adding a constructor that avoids sharing Credentials between jobs. By default the Credentials will be shared to keep backward compatibility. We can add a new constructor with a new parameter to decide whether to share Credentials. Some issues reported in Cascading are due to corrupted credentials; see https://github.com/Cascading/cascading/commit/45b33bb864172486ac43782a4d13329312d01c0e If we add this support in JobConf, it will benefit all job clients. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
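A minimal sketch of the idea, with hypothetical names (SketchJobConf, the shareCredentials flag, and the map-of-tokens stand-in are illustrative, not the actual Hadoop API): the existing constructor keeps today's shared-reference behavior, while an opt-out flag takes a defensive copy so one job cannot see or corrupt another job's credentials.

```java
import java.util.HashMap;
import java.util.Map;

public class SketchJobConf {
    private final Map<String, byte[]> credentials;

    // Existing behavior: share the caller's credentials object.
    public SketchJobConf(Map<String, byte[]> creds) {
        this(creds, true);
    }

    // Proposed overload: opt out of sharing by taking a defensive copy.
    public SketchJobConf(Map<String, byte[]> creds, boolean shareCredentials) {
        this.credentials = shareCredentials ? creds : new HashMap<>(creds);
    }

    public Map<String, byte[]> getCredentials() {
        return credentials;
    }

    // Returns true when a later mutation of the original map is visible
    // through this conf, i.e. the credentials are shared.
    public static boolean mutationVisible(boolean share) {
        Map<String, byte[]> creds = new HashMap<>();
        SketchJobConf conf = new SketchJobConf(creds, share);
        creds.put("token", new byte[] {1});
        return conf.getCredentials().containsKey("token");
    }
}
```

Keeping the copy behind an explicit flag preserves backward compatibility for callers that rely on sharing.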
[jira] [Updated] (MAPREDUCE-6298) Job#toString throws an exception when not in state RUNNING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6298: Labels: BB2015-05-TBR (was: ) Job#toString throws an exception when not in state RUNNING -- Key: MAPREDUCE-6298 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6298 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Lars Francke Assignee: Lars Francke Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-6298.1.patch Job#toString calls {{ensureState(JobState.RUNNING);}} as the very first thing. That method throws an exception, which is not nice. One thing this breaks is usage of Job in the Scala (e.g. Spark) REPL, as the REPL calls toString after every invocation and that fails every time. I'll attach a patch that checks the state: if it's RUNNING it prints the original message, otherwise it prints something else. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
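The proposed behavior can be sketched without Hadoop types: branch on the state instead of asserting it, so toString() never throws. The class name, states, and message formats below are illustrative stand-ins, not the real Job class.

```java
public class SketchJob {
    public enum JobState { DEFINE, RUNNING }

    private final JobState state;
    private final String jobId;

    public SketchJob(JobState state, String jobId) {
        this.state = state;
        this.jobId = jobId;
    }

    @Override
    public String toString() {
        // Never call ensureState() here: toString() must not throw,
        // or every REPL echo of a not-yet-submitted job blows up.
        if (state == JobState.RUNNING) {
            return "Job: " + jobId + " (running)";           // detailed message
        }
        return "Job: " + jobId + " (state: " + state + ")";  // safe fallback
    }
}
```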
[jira] [Updated] (MAPREDUCE-6356) Misspelling of threshold in log4j.properties for tests
[ https://issues.apache.org/jira/browse/MAPREDUCE-6356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6356: Labels: BB2015-05-TBR (was: ) Misspelling of threshold in log4j.properties for tests -- Key: MAPREDUCE-6356 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6356 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-6356.patch The log4j.properties file for tests contains the misspelling {{log4j.threshhold}}. We should use the correct {{log4j.threshold}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-2094) org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-2094: Labels: BB2015-05-TBR (was: ) org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour. --- Key: MAPREDUCE-2094 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2094 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Reporter: Niels Basjes Assignee: Niels Basjes Labels: BB2015-05-TBR Attachments: MAPREDUCE-2094-2011-05-19.patch, MAPREDUCE-2094-20140727-svn-fixed-spaces.patch, MAPREDUCE-2094-20140727-svn.patch, MAPREDUCE-2094-20140727.patch, MAPREDUCE-2094-2015-05-05-2328.patch, MAPREDUCE-2094-FileInputFormat-docs-v2.patch When implementing a custom derivative of FileInputFormat we ran into the effect that a large Gzipped input file would be processed several times. A near-1GiB file would be processed around 36 times in its entirety, producing garbage results and taking up a lot more CPU time than needed. It took a while to figure out, and what we found is that the default implementation of the isSplitable method in [org.apache.hadoop.mapreduce.lib.input.FileInputFormat | http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java?view=markup ] is simply {{return true;}}. This is a very unsafe default and contradicts the JavaDoc of the method, which states: "Is the given filename splitable? Usually, true, but if the file is stream compressed, it will not be." The actual implementation effectively behaves as: "Is the given filename splitable? Always true, even if the file is stream compressed using an unsplittable compression codec."
For our situation (where we always have Gzipped input) we took the easy way out and simply implemented an isSplitable in our class that does {{return false;}}. Now there are essentially 3 ways I can think of for fixing this (in order of what I would find preferable): # Implement something that looks at the used compression of the file (i.e. migrate the implementation from TextInputFormat to FileInputFormat). This would make the method do what the JavaDoc describes. # Force developers to think about it and make this method abstract. # Use a safe default (i.e. {{return false;}}) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
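Option 1 above can be sketched without Hadoop dependencies by deciding splittability from the file name. The real implementation would consult Hadoop's codec machinery (CompressionCodecFactory), so the fixed suffix list here is a hypothetical stand-in for that lookup.

```java
public class SplittabilitySketch {
    // Suffixes of stream-compressed formats that cannot be split.
    // Hypothetical stand-in for Hadoop's codec lookup.
    private static final String[] UNSPLITTABLE_SUFFIXES = {".gz", ".snappy"};

    public static boolean isSplitable(String fileName) {
        for (String suffix : UNSPLITTABLE_SUFFIXES) {
            if (fileName.endsWith(suffix)) {
                return false;   // safe: never split a stream-compressed file
            }
        }
        return true;            // plain or splittably-compressed files
    }
}
```

The point of the sketch is the shape of the fix: the default consults the file's compression before answering, instead of blindly returning true.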
[jira] [Updated] (MAPREDUCE-6279) AM should explicitly exit JVM after all services have stopped
[ https://issues.apache.org/jira/browse/MAPREDUCE-6279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6279: Labels: BB2015-05-TBR (was: ) AM should explicitly exit JVM after all services have stopped Key: MAPREDUCE-6279 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6279 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Eric Payne Labels: BB2015-05-TBR Attachments: MAPREDUCE-6279.v1.txt, MAPREDUCE-6279.v2.txt, MAPREDUCE-6279.v3.patch, MAPREDUCE-6279.v4.patch Occasionally the MapReduce AM can get stuck trying to shut down. MAPREDUCE-6049 and MAPREDUCE-5888 were specific instances that have been fixed, but this can also occur with uber jobs if the task code inadvertently leaves non-daemon threads lingering. We should explicitly shut down the JVM after the MapReduce AM has unregistered and all services have been stopped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6174) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6174: Labels: BB2015-05-TBR (was: ) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput. --- Key: MAPREDUCE-6174 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6174 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 3.0.0, 2.6.0 Reporter: Eric Payne Assignee: Eric Payne Labels: BB2015-05-TBR Attachments: MAPREDUCE-6174.002.patch, MAPREDUCE-6174.003.patch, MAPREDUCE-6174.v1.txt Per MAPREDUCE-6166, both InMemoryMapOutput and OnDiskMapOutput will be doing similar things with regards to IFile streams. In order to make it explicit that InMemoryMapOutput and OnDiskMapOutput are different from 3rd-party implementations, this JIRA will make them subclass a common class (see https://issues.apache.org/jira/browse/MAPREDUCE-6166?focusedCommentId=14223368page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14223368) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5747) Potential null pointer dereference in HsTasksBlock#render()
[ https://issues.apache.org/jira/browse/MAPREDUCE-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5747: Labels: BB2015-05-TBR newbie patch (was: newbie patch) Potential null pointer dereference in HsTasksBlock#render() - Key: MAPREDUCE-5747 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5747 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Ted Yu Priority: Minor Labels: BB2015-05-TBR, newbie, patch Attachments: MAPREDUCE-5747-1.patch At line 140: {code} } else { ta = new TaskAttemptInfo(successful, type, false); {code} There is no null check for {{type}}. The TaskAttemptInfo ctor dereferences {{type}}: {code} public TaskAttemptInfo(TaskAttempt ta, TaskType type, Boolean isRunning) { final TaskAttemptReport report = ta.getReport(); this.type = type.toString(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
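The fix amounts to a null guard before the dereference. The class below is a simplified stand-in (the enum and the UNKNOWN fallback are illustrative assumptions, not the webapp's actual TaskAttemptInfo/TaskType):

```java
public class AttemptInfoSketch {
    public enum TaskType { MAP, REDUCE }

    private final String type;

    public AttemptInfoSketch(TaskType taskType) {
        // Guard before calling toString(): taskType may legitimately be
        // null when the caller did not determine the attempt's type.
        this.type = (taskType != null) ? taskType.toString() : "UNKNOWN";
    }

    public String getType() {
        return type;
    }
}
```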
[jira] [Updated] (MAPREDUCE-6337) add a mode to replay MR job history files to the timeline service
[ https://issues.apache.org/jira/browse/MAPREDUCE-6337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6337: Labels: BB2015-05-TBR (was: ) add a mode to replay MR job history files to the timeline service - Key: MAPREDUCE-6337 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6337 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Sangjin Lee Assignee: Sangjin Lee Labels: BB2015-05-TBR Attachments: MAPREDUCE-6337-YARN-2928.001.patch The subtask covers the work on top of YARN-3437 to add a mode to replay MR job history files to the timeline service storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6079) Renaming JobImpl#username to reporterUserName
[ https://issues.apache.org/jira/browse/MAPREDUCE-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6079: Labels: BB2015-05-TBR (was: ) Renaming JobImpl#username to reporterUserName - Key: MAPREDUCE-6079 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6079 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Labels: BB2015-05-TBR Attachments: MAPREDUCE-6079.1.patch On MAPREDUCE-6033, we found the bug because of confusing field names {{userName}} and {{username}}. We should change the names to distinguish them easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases
[ https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated MAPREDUCE-6251: --- Status: Patch Available (was: Open) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases --- Key: MAPREDUCE-6251 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, mrv2 Affects Versions: 2.6.0 Reporter: Craig Welch Assignee: Craig Welch Attachments: MAPREDUCE-6251.0.patch, MAPREDUCE-6251.1.patch, MAPREDUCE-6251.2.patch, MAPREDUCE-6251.3.patch, MAPREDUCE-6251.4.patch The JobClient is used to get job status information for running and completed jobs. Final state and history for a job is communicated from the application master to the job history server via a distributed file system - where the history is uploaded by the application master to the dfs and then scanned/loaded by the jobhistory server. While HDFS has strong consistency guarantees not all Hadoop DFS's do. When used in conjunction with a distributed file system which does not have this guarantee there will be cases where the history server may not see an uploaded file, resulting in the dreaded no such job and a null value for the RunningJob in the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases
[ https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated MAPREDUCE-6251: --- Attachment: MAPREDUCE-6251.4.patch Updated with recommended move to MRJobConfig JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases --- Key: MAPREDUCE-6251 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, mrv2 Affects Versions: 2.6.0 Reporter: Craig Welch Assignee: Craig Welch Attachments: MAPREDUCE-6251.0.patch, MAPREDUCE-6251.1.patch, MAPREDUCE-6251.2.patch, MAPREDUCE-6251.3.patch, MAPREDUCE-6251.4.patch The JobClient is used to get job status information for running and completed jobs. Final state and history for a job is communicated from the application master to the job history server via a distributed file system - where the history is uploaded by the application master to the dfs and then scanned/loaded by the jobhistory server. While HDFS has strong consistency guarantees not all Hadoop DFS's do. When used in conjunction with a distributed file system which does not have this guarantee there will be cases where the history server may not see an uploaded file, resulting in the dreaded no such job and a null value for the RunningJob in the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
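A higher-level retry of the kind described can be sketched as a generic poll loop: when the history file may not yet be visible on an eventually-consistent DFS, poll a few times before concluding "no such job". The method name, attempt count, and backoff below are illustrative, not the patch's actual configuration.

```java
import java.util.function.Supplier;

public class JobStatusRetry {
    // Polls the fetcher until it yields a non-null result or the attempts
    // are exhausted; sleeps between attempts to let the DFS catch up.
    public static <T> T fetchWithRetries(Supplier<T> fetch, int maxAttempts,
                                         long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T result = fetch.get();
            if (result != null) {
                return result;              // history file became visible
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();  // preserve the flag
                    break;
                }
            }
        }
        return null;                        // genuinely missing job
    }
}
```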
[jira] [Updated] (MAPREDUCE-6320) Configuration of retrieved Job via Cluster is not properly set-up
[ https://issues.apache.org/jira/browse/MAPREDUCE-6320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6320: Labels: BB2015-05-TBR (was: ) Configuration of retrieved Job via Cluster is not properly set-up - Key: MAPREDUCE-6320 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6320 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jens Rabe Assignee: Jens Rabe Labels: BB2015-05-TBR Attachments: MAPREDUCE-6320.001.patch, MAPREDUCE-6320.002.patch, MAPREDUCE-6320.003.patch When getting a Job via the Cluster API, it is not correctly configured. To reproduce this: # Submit a MR job, and set some arbitrary parameter to its configuration {code:java} job.getConfiguration().set(foo, bar); job.setJobName(foo-bug-demo); {code} # Get the job in a client: {code:java} final Cluster c = new Cluster(conf); final JobStatus[] statuses = c.getAllJobStatuses(); final JobStatus s = ... // get the status for the job named foo-bug-demo final Job j = c.getJob(s.getJobId()); final Configuration conf = job.getConfiguration(); {code} # Get its foo entry {code:java} final String s = conf.get(foo); {code} # Expected: s is bar; But: s is null. The reason is that the job's configuration is stored on HDFS (the Configuration has a resource with a *hdfs://* URL) and in the *loadResource* it is changed to a path on the local file system (hdfs://host.domain:port/tmp/hadoop-yarn/... is changed to /tmp/hadoop-yarn/...), which does not exist, and thus the configuration is not populated. The bug happens in the *Cluster* class, where *JobConfs* are created from *status.getJobFile()*. A quick fix would be to copy this job file to a temporary file in the local file system and populate the JobConf from this file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
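The mangling described above can be reproduced with plain JDK classes: keeping only the path component of an hdfs:// URL (as a naive resource loader effectively does) yields a local-filesystem path that does not exist. The URL in the test is a made-up example.

```java
import java.net.URI;

public class ResourcePathDemo {
    // Returns only the path component, dropping scheme and authority.
    // This is the kind of transformation that turns an HDFS URL into a
    // bogus local path.
    public static String strippedPath(String resource) {
        return URI.create(resource).getPath();
    }
}
```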
[jira] [Updated] (MAPREDUCE-6128) Automatic addition of bundled jars to distributed cache
[ https://issues.apache.org/jira/browse/MAPREDUCE-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6128: Labels: BB2015-05-TBR (was: ) Automatic addition of bundled jars to distributed cache Key: MAPREDUCE-6128 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6128 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 2.5.1 Reporter: Gera Shegalov Assignee: Gera Shegalov Labels: BB2015-05-TBR Attachments: MAPREDUCE-6128.v01.patch, MAPREDUCE-6128.v02.patch, MAPREDUCE-6128.v03.patch, MAPREDUCE-6128.v04.patch, MAPREDUCE-6128.v05.patch, MAPREDUCE-6128.v06.patch, MAPREDUCE-6128.v07.patch, MAPREDUCE-6128.v08.patch On the client side, JDK adds Class-Path elements from the job jar manifest on the classpath. In theory there could be many bundled jars in many directories such that adding them manually via libjars or similar means to task classpaths is cumbersome. If this property is enabled, the same jars are added to the task classpaths automatically. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
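The JDK-side mechanism the proposal builds on can be sketched with java.util.jar.Manifest: the job jar's Class-Path attribute lists the bundled jars that would be mirrored into task classpaths. The manifest text in the test is a made-up example; real code would read the attribute from the job jar itself.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class ManifestClassPathSketch {
    // Parses the Class-Path attribute from manifest text; returns an
    // empty array when the attribute is absent.
    public static String[] classPathEntries(String manifestText) {
        try {
            Manifest mf = new Manifest(new ByteArrayInputStream(
                    manifestText.getBytes(StandardCharsets.UTF_8)));
            String cp = mf.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
            return cp == null ? new String[0] : cp.trim().split("\\s+");
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // cannot happen for a byte array
        }
    }
}
```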
[jira] [Updated] (MAPREDUCE-4683) We need to fix our build to create/distribute hadoop-mapreduce-client-core-tests.jar
[ https://issues.apache.org/jira/browse/MAPREDUCE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4683: Labels: BB2015-05-TBR (was: ) We need to fix our build to create/distribute hadoop-mapreduce-client-core-tests.jar Key: MAPREDUCE-4683 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4683 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Reporter: Arun C Murthy Assignee: Akira AJISAKA Priority: Critical Labels: BB2015-05-TBR Attachments: MAPREDUCE-4683.patch We need to fix our build to create/distribute hadoop-mapreduce-client-core-tests.jar, need this before MAPREDUCE-4253 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6310) Add jdiff support to MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6310: Labels: BB2015-05-TBR (was: ) Add jdiff support to MapReduce -- Key: MAPREDUCE-6310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6310 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Li Lu Assignee: Li Lu Priority: Blocker Labels: BB2015-05-TBR Attachments: MAPRED-6310-040615.patch Previously we used jdiff for Hadoop common and HDFS. Now we're extending the support of jdiff to YARN. Probably we'd like to do similar things with MapReduce? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6271) org.apache.hadoop.mapreduce.Cluster GetJob() display warn log
[ https://issues.apache.org/jira/browse/MAPREDUCE-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6271: Labels: BB2015-05-TBR (was: ) org.apache.hadoop.mapreduce.Cluster GetJob() display warn log - Key: MAPREDUCE-6271 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6271 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 2.7.0 Reporter: Peng Zhang Assignee: Peng Zhang Labels: BB2015-05-TBR Attachments: MAPREDUCE-6271.v2.patch, MR-6271.patch When using getJob() with MapReduce 2.7, a warning caused by the configuration being loaded twice is logged every time. And when the job has completed, this function will log a java.io.FileNotFoundException warning. I think this is related to MAPREDUCE-5875: the change in getJob() seems not to be needed, since it was only for tests. {noformat} 15/03/04 13:41:23 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:23 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 15/03/04 13:41:24 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:24 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 
15/03/04 13:41:25 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:25 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 15/03/04 13:41:26 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:26 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 15/03/04 13:41:27 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:27 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 15/03/04 13:41:28 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:28 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 
15/03/04 13:41:29 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:29 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 15/03/04 13:41:29 INFO exec.Task: 2015-03-04 13:41:29,853 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.37 sec 15/03/04 13:41:30 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:30 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter:
[jira] [Updated] (MAPREDUCE-6296) A better way to deal with InterruptedException on waitForCompletion
[ https://issues.apache.org/jira/browse/MAPREDUCE-6296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6296: Labels: BB2015-05-TBR (was: ) A better way to deal with InterruptedException on waitForCompletion --- Key: MAPREDUCE-6296 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6296 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Yang Hao Assignee: Yang Hao Labels: BB2015-05-TBR Attachments: MAPREDUCE-6296.patch Some code in method waitForCompletion of Job class is {code:title=Job.java|borderStyle=solid} public boolean waitForCompletion(boolean verbose ) throws IOException, InterruptedException, ClassNotFoundException { if (state == JobState.DEFINE) { submit(); } if (verbose) { monitorAndPrintJob(); } else { // get the completion poll interval from the client. int completionPollIntervalMillis = Job.getCompletionPollInterval(cluster.getConf()); while (!isComplete()) { try { Thread.sleep(completionPollIntervalMillis); } catch (InterruptedException ie) { } } } return isSuccessful(); } {code} but a better way to deal with InterruptException is {code:title=Job.java|borderStyle=solid} public boolean waitForCompletion(boolean verbose ) throws IOException, InterruptedException, ClassNotFoundException { if (state == JobState.DEFINE) { submit(); } if (verbose) { monitorAndPrintJob(); } else { // get the completion poll interval from the client. int completionPollIntervalMillis = Job.getCompletionPollInterval(cluster.getConf()); while (!isComplete()) { try { Thread.sleep(completionPollIntervalMillis); } catch (InterruptedException ie) { Thread.currentThread().interrupt(); } } } return isSuccessful(); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
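The difference between the two variants quoted above can be demonstrated with plain threads: swallowing the InterruptedException leaves the thread's interrupt status cleared, so callers can no longer tell an interrupt happened, while calling Thread.currentThread().interrupt() in the catch block preserves it. This demo class is illustrative, not part of the patch.

```java
public class InterruptFlagDemo {
    // Interrupts the current thread, sleeps (which throws immediately and
    // clears the flag), and reports whether the interrupt status is still
    // set afterwards. Note Thread.interrupted() reads AND clears the flag.
    public static boolean statusAfterSleep(boolean restoreFlag) {
        Thread.currentThread().interrupt();   // simulate an interrupt
        try {
            Thread.sleep(10);                 // throws at once: flag is set
        } catch (InterruptedException ie) {
            if (restoreFlag) {
                Thread.currentThread().interrupt();  // the proposed fix
            }
        }
        return Thread.interrupted();
    }
}
```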
[jira] [Updated] (MAPREDUCE-3517) map.input.path is null at the first split when using CombineFileInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3517: Labels: BB2015-05-TBR (was: ) map.input.path is null at the first split when using CombineFileInputFormat --- Key: MAPREDUCE-3517 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3517 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Affects Versions: 0.20.203.0 Reporter: wanbin Labels: BB2015-05-TBR Attachments: CombineFileRecordReader.diff, MAPREDUCE-3517.02.patch map.input.path is null at the first split when using CombineFileInputFormat, because in the runNewMapper function mapContext is used instead of taskContext, which is where map.input.path is set; so we need to set map.input.path again on mapContext. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5883) Total megabyte-seconds in job counters is slightly misleading
[ https://issues.apache.org/jira/browse/MAPREDUCE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5883: Labels: BB2015-05-TBR (was: ) Total megabyte-seconds in job counters is slightly misleading --- Key: MAPREDUCE-5883 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5883 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Nathan Roberts Assignee: Nathan Roberts Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-5883.patch The following counters are in milliseconds so megabyte-seconds might be better stated as megabyte-milliseconds MB_MILLIS_MAPS.name= Total megabyte-seconds taken by all map tasks MB_MILLIS_REDUCES.name=Total megabyte-seconds taken by all reduce tasks VCORES_MILLIS_MAPS.name= Total vcore-seconds taken by all map tasks VCORES_MILLIS_REDUCES.name=Total vcore-seconds taken by all reduce tasks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
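The factor at stake is easy to see with made-up numbers: a 1024 MB container running for one minute accumulates 1024 * 60000 = 61,440,000 megabyte-milliseconds, which is only 61,440 megabyte-seconds, so printing the raw counter under a "seconds" label overstates usage 1000-fold.

```java
public class MbMillisSketch {
    // What the counter actually accumulates: container memory in MB
    // multiplied by elapsed milliseconds.
    public static long mbMillis(long containerMb, long elapsedMillis) {
        return containerMb * elapsedMillis;
    }

    // What the current "megabyte-seconds" label implies the value is.
    public static long mbSeconds(long mbMillis) {
        return mbMillis / 1000;
    }
}
```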
[jira] [Updated] (MAPREDUCE-6027) mr jobs with relative paths can fail
[ https://issues.apache.org/jira/browse/MAPREDUCE-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6027: Labels: BB2015-05-TBR (was: ) mr jobs with relative paths can fail Key: MAPREDUCE-6027 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6027 Project: Hadoop Map/Reduce Issue Type: Bug Components: job submission Reporter: Wing Yew Poon Assignee: Wing Yew Poon Labels: BB2015-05-TBR Attachments: MAPREDUCE-6027.patch I built hadoop from branch-2 and tried to run terasort as follows: {noformat} wypoon$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-SNAPSHOT.jar terasort sort-input sort-output 14/08/07 08:57:55 INFO terasort.TeraSort: starting 2014-08-07 08:57:56.229 java[36572:1903] Unable to load realm info from SCDynamicStore 14/08/07 08:57:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/08/07 08:57:57 INFO input.FileInputFormat: Total input paths to process : 2 Spent 156ms computing base-splits. Spent 2ms computing TeraScheduler splits. Computing input splits took 159ms Sampling 2 splits of 2 Making 1 from 10 sampled records Computing parititions took 626ms Spent 789ms computing partitions. 
14/08/07 08:57:57 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032 14/08/07 08:57:58 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/wypoon/.staging/job_1407426900134_0001 java.lang.IllegalArgumentException: Can not create a Path from an empty URI at org.apache.hadoop.fs.Path.checkPathArg(Path.java:140) at org.apache.hadoop.fs.Path.init(Path.java:192) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.checkPermissionOfOther(ClientDistributedCacheManager.java:275) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.ancestorsHaveExecutePermissions(ClientDistributedCacheManager.java:256) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.isPublic(ClientDistributedCacheManager.java:243) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineCacheVisibilities(ClientDistributedCacheManager.java:162) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:58) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303) at 
org.apache.hadoop.examples.terasort.TeraSort.run(TeraSort.java:316) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.examples.terasort.TeraSort.main(TeraSort.java:325) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72) at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145) at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {noformat} If I used absolute paths for the input and out directories, the job runs fine. This breakage is due to HADOOP-10876. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5876) SequenceFileRecordReader NPE if close() is called before initialize()
[ https://issues.apache.org/jira/browse/MAPREDUCE-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5876: Labels: BB2015-05-TBR (was: ) SequenceFileRecordReader NPE if close() is called before initialize() - Key: MAPREDUCE-5876 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5876 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 2.3.0, 2.4.0 Reporter: Reinis Vicups Assignee: Tsuyoshi Ozawa Labels: BB2015-05-TBR Attachments: MAPREDUCE-5876.1.patch org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader extends org.apache.hadoop.mapreduce.RecordReader, which in turn implements java.io.Closeable. According to the Java spec, java.io.Closeable#close() has to be idempotent (http://docs.oracle.com/javase/7/docs/api/java/io/Closeable.html), but here it is not. An NPE is thrown if close() is invoked without previously calling initialize(), because the internal SequenceFile.Reader field in is still null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
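A minimal sketch of what an idempotent, null-safe close() looks like (class and field names are illustrative, not the actual attached patch):

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch: guard against the reader never having been initialized, and make
// repeated close() calls a no-op, as java.io.Closeable requires.
class SketchRecordReader implements Closeable {
    private Closeable in;              // set by initialize(); may still be null

    @Override
    public synchronized void close() throws IOException {
        if (in != null) {              // avoids the NPE when initialize() was never called
            in.close();
            in = null;                 // makes a second close() a no-op
        }
    }
}

public class CloseDemo {
    public static void main(String[] args) throws IOException {
        SketchRecordReader r = new SketchRecordReader();
        r.close();   // no initialize() first: must not throw
        r.close();   // second call: idempotent
        System.out.println("close() is null-safe and idempotent");
    }
}
```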
[jira] [Updated] (MAPREDUCE-6003) Resource Estimator suggests huge map output in some cases
[ https://issues.apache.org/jira/browse/MAPREDUCE-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6003: Labels: BB2015-05-TBR (was: ) Resource Estimator suggests huge map output in some cases - Key: MAPREDUCE-6003 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6003 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 1.2.1 Reporter: Chengbing Liu Assignee: Chengbing Liu Labels: BB2015-05-TBR Attachments: MAPREDUCE-6003-branch-1.2.patch In some cases, ResourceEstimator can return a far too large map output estimation. This happens when the input size is not correctly calculated. A typical case is joining two Hive tables (one in HDFS and the other in HBase). The maps that process the HBase table finish first and report an input length of 0 due to its TableInputFormat. Then, for a map that processes the HDFS table, the estimated output size is very large because of the wrong input size, making it impossible to assign the map task. There are two possible solutions to this problem: (1) Make the input size correct for each case, e.g. HBase, etc. (2) Use another algorithm to estimate the map output, or at least make it closer to reality. I prefer the second way, since the first would require handling every possible input type, which is not easy for some inputs such as URIs. In my opinion, we could make a second estimation which is independent of the input size: estimationB = (completedMapOutputSize / completedMaps) * totalMaps * 10 Here, multiplying by 10 makes the estimation more conservative, so that the task is less likely to be assigned somewhere without enough space. The former estimation goes like this: estimationA = (inputSize * completedMapOutputSize * 2.0) / completedMapInputSize My suggestion is to take the minimum of the two estimations: estimation = min(estimationA, estimationB) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
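The two estimates and the proposed min() can be sketched directly from the formulas in the description (variable names and sample numbers are illustrative; the Math.max guards are added here only to keep the sketch free of division by zero):

```java
// Sketch of the proposed combined map-output estimate.
public class ResourceEstimateSketch {
    static long estimate(long inputSize, long completedMapOutputSize,
                         long completedMapInputSize, int completedMaps, int totalMaps) {
        // Existing estimate: scales completed output by the input-size ratio.
        // It blows up when completedMapInputSize is near zero (the HBase case).
        long estimationA = (long) (inputSize * completedMapOutputSize * 2.0
                                   / Math.max(completedMapInputSize, 1));
        // Proposed input-size-independent estimate, padded 10x to stay conservative.
        long estimationB = (completedMapOutputSize / Math.max(completedMaps, 1))
                           * (long) totalMaps * 10;
        return Math.min(estimationA, estimationB);
    }

    public static void main(String[] args) {
        // HBase-style case: completed maps reported ~0 input, so estimationA
        // explodes and estimationB wins.
        System.out.println(estimate(1_000_000_000L, 50_000_000L, 1L, 10, 100));
        // prints 5000000000
    }
}
```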
[jira] [Updated] (MAPREDUCE-3182) loadgen ignores -m command line when writing random data
[ https://issues.apache.org/jira/browse/MAPREDUCE-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3182: Labels: BB2015-05-TBR (was: ) loadgen ignores -m command line when writing random data Key: MAPREDUCE-3182 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3182 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, test Affects Versions: 0.23.0, 2.3.0 Reporter: Jonathan Eagles Assignee: Chen He Labels: BB2015-05-TBR Attachments: MAPREDUCE-3182.patch If no input directories are specified, loadgen goes into a special mode where random data is generated and written. In that mode, the user-specified number of mappers (the -m command line option) is overridden by a calculation. Instead, it should honor the user-specified number of mappers and fall back to the calculation only when none is given. The documentation should also be updated to match the new behavior in the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
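The requested fallback can be sketched as follows (the method name and the "non-positive means absent" sentinel are assumptions for illustration, not loadgen's actual code):

```java
// Sketch: honor a user-supplied map count when writing random data, and only
// fall back to the computed value when -m was not given.
public class LoadGenMapCount {
    static int chooseNumMaps(int userSpecifiedMaps, int computedMaps) {
        // userSpecifiedMaps <= 0 means the -m option was absent
        return userSpecifiedMaps > 0 ? userSpecifiedMaps : computedMaps;
    }

    public static void main(String[] args) {
        System.out.println(chooseNumMaps(8, 40));  // -m 8 given: use it -> 8
        System.out.println(chooseNumMaps(0, 40));  // no -m: fall back -> 40
    }
}
```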
[jira] [Updated] (MAPREDUCE-1380) Adaptive Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-1380: Labels: BB2015-05-TBR (was: ) Adaptive Scheduler -- Key: MAPREDUCE-1380 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1380 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.4.1 Reporter: Jordà Polo Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-1380-branch-1.2.patch, MAPREDUCE-1380_0.1.patch, MAPREDUCE-1380_1.1.patch, MAPREDUCE-1380_1.1.pdf The Adaptive Scheduler is a pluggable Hadoop scheduler that automatically adjusts the amount of used resources depending on the performance of jobs and on user-defined high-level business goals. Existing Hadoop schedulers are focused on managing large, static clusters in which nodes are added or removed manually. On the other hand, the goal of this scheduler is to improve the integration of Hadoop and the applications that run on top of it with environments that allow a more dynamic provisioning of resources. The current implementation is quite straightforward. Users specify a deadline at job submission time, and the scheduler adjusts the resources to meet that deadline (at the moment, the scheduler can be configured to either minimize or maximize the amount of resources). If multiple jobs are run simultaneously, the scheduler prioritizes them by deadline. Note that the current approach to estimate the completion time of jobs is quite simplistic: it is based on the time it takes to finish each task, so it works well with regular jobs, but there is still room for improvement for unpredictable jobs. The idea is to further integrate it with cloud-like and virtual environments (such as Amazon EC2, Emotive, etc.) so that if, for instance, a job isn't able to meet its deadline, the scheduler automatically requests more resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5845) TestShuffleHandler failing intermittently on windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5845: Labels: BB2015-05-TBR (was: ) TestShuffleHandler failing intermittently on windows Key: MAPREDUCE-5845 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5845 Project: Hadoop Map/Reduce Issue Type: Test Reporter: Varun Vasudev Assignee: Varun Vasudev Labels: BB2015-05-TBR Attachments: apache-mapreduce-5845.0.patch TestShuffleHandler fails intermittently on Windows - specifically, testClientClosesConnection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5225) SplitSampler in mapreduce.lib should use a SPLIT_STEP to jump around splits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5225: Labels: BB2015-05-TBR (was: ) SplitSampler in mapreduce.lib should use a SPLIT_STEP to jump around splits --- Key: MAPREDUCE-5225 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5225 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: BB2015-05-TBR Attachments: MAPREDUCE-5225.1.patch Now, SplitSampler only samples the first maxSplitsSampled splits, a behavior introduced by MAPREDUCE-1820. However, sampling across all splits is generally preferable to sampling only the first N splits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
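A minimal sketch of the SPLIT_STEP idea (names are illustrative): instead of sampling splits 0..N-1, step through the full split list so the sample covers the whole input:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: pick sample splits spread across the input by stepping, rather than
// taking the first maxSplitsSampled splits.
public class SplitStepSketch {
    static List<Integer> pickSplits(int totalSplits, int maxSplitsSampled) {
        int splitsToSample = Math.min(maxSplitsSampled, totalSplits);
        int splitStep = totalSplits / splitsToSample;   // jump size across all splits
        List<Integer> picked = new ArrayList<>();
        for (int i = 0; i < splitsToSample; i++) {
            picked.add(i * splitStep);                  // 0, step, 2*step, ...
        }
        return picked;
    }

    public static void main(String[] args) {
        System.out.println(pickSplits(10, 3));  // prints [0, 3, 6] rather than [0, 1, 2]
    }
}
```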
[jira] [Updated] (MAPREDUCE-4216) Make MultipleOutputs generic to support non-file output formats
[ https://issues.apache.org/jira/browse/MAPREDUCE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4216: Labels: BB2015-05-TBR Output (was: Output) Make MultipleOutputs generic to support non-file output formats --- Key: MAPREDUCE-4216 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4216 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 1.0.2 Reporter: Robbie Strickland Labels: BB2015-05-TBR, Output Attachments: MAPREDUCE-4216.patch The current MultipleOutputs implementation is tied to FileOutputFormat in such a way that it is not extensible to other types of output. It should be made more generic, such as with an interface that can be implemented for different outputs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
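A hypothetical sketch of the kind of abstraction requested — all names here are invented for illustration: MultipleOutputs would write through a small interface instead of assuming FileOutputFormat paths, so non-file sinks can plug in:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface decoupling named outputs from the file system.
interface NamedOutputSink<K, V> {
    void write(String namedOutput, K key, V value) throws IOException;
}

// A non-file sink becomes possible once the contract no longer assumes paths.
class InMemorySink implements NamedOutputSink<String, String> {
    final Map<String, String> out = new HashMap<>();

    @Override
    public void write(String namedOutput, String key, String value) {
        out.put(namedOutput + "/" + key, value);   // keyed by output name + record key
    }
}

public class MultiOutDemo {
    public static void main(String[] args) throws IOException {
        InMemorySink sink = new InMemorySink();
        sink.write("audit", "k1", "v1");
        System.out.println(sink.out.get("audit/k1"));  // prints v1
    }
}
```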
[jira] [Updated] (MAPREDUCE-4840) Delete dead code and deprecate public API related to skipping bad records
[ https://issues.apache.org/jira/browse/MAPREDUCE-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4840: Labels: BB2015-05-TBR (was: ) Delete dead code and deprecate public API related to skipping bad records - Key: MAPREDUCE-4840 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4840 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Mostafa Elhemali Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-4840.patch It looks like the decision was made in MAPREDUCE-1932 to remove support for skipping bad records rather than fix it (it doesn't work right now in trunk). If that's the case, then we should probably delete all the dead code related to it and deprecate the public APIs for it, right? Dead code I'm talking about: 1. Task class: skipping, skipRanges, writeSkipRecs 2. MapTask class: SkippingRecordReader inner class 3. ReduceTask class: SkippingReduceValuesIterator inner class 4. Tests: TestBadRecords Public API: 1. SkipBadRecords class -- This message was sent by Atlassian JIRA (v6.3.4#6332)