[jira] [Commented] (MAPREDUCE-6259) IllegalArgumentException due to missing job submit time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528323#comment-14528323 ] Hudson commented on MAPREDUCE-6259: --- SUCCESS: Integrated in Hadoop-Yarn-trunk #918 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/918/]) MAPREDUCE-6259. IllegalArgumentException due to missing job submit time. Contributed by zhihai xu (jlowe: rev bf70c5ae2824a9139c1aa9d7c14020018881cec2) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/AMStartedEvent.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java IllegalArgumentException due to missing job submit time --- Key: MAPREDUCE-6259 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6259 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.1 Attachments: MAPREDUCE-6259.000.patch A job submit time of -1 causes an IllegalArgumentException when parsing the job history file name, and JOB_INIT_FAILED leaves a -1 submit time in JobIndexInfo. We found the following job history file name, which causes an IllegalArgumentException when the job status is parsed out of the file name. 
{code} job_1418398645407_115853--1-worun-kafka%2Dto%2Dhdfs%5Btwo%5D%5B15+topic%28s%29%5D-1423572836007-0-0-FAILED-root.journaling-1423572836007.jhist {code} The stack trace for the IllegalArgumentException is {code} 2015-02-10 04:54:01,863 WARN org.apache.hadoop.mapreduce.v2.hs.PartialJob: Exception while parsing job state. Defaulting to KILLED java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.mapreduce.v2.api.records.JobState.0 at java.lang.Enum.valueOf(Enum.java:236) at org.apache.hadoop.mapreduce.v2.api.records.JobState.valueOf(JobState.java:21) at org.apache.hadoop.mapreduce.v2.hs.PartialJob.getState(PartialJob.java:82) at org.apache.hadoop.mapreduce.v2.hs.PartialJob.init(PartialJob.java:59) at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getAllPartialJobs(CachedHistoryStorage.java:159) at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getPartialJobs(CachedHistoryStorage.java:173) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getPartialJobs(JobHistory.java:284) at org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices.getJobs(HsWebServices.java:212) at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185) at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339) at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537) at
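The failure mode can be illustrated with a few lines of plain Java. This is a simplified sketch, not JobIndexInfo's actual decoder: the field positions and the helper below are inferred from the example file name above. A submit time of -1 contributes an extra "-" to the "-"-delimited .jhist name, shifting every later field by one, so a digit lands where the status belongs and JobState.valueOf() throws the IllegalArgumentException shown in the trace.

```java
public class JhistNameParse {
    // Hypothetical, simplified field layout based on the example name:
    // jobId-submitTime-user-jobName-finishTime-maps-reduces-status-queue-startTime
    // The job status sits in "-"-delimited field 7.
    static String statusField(String jhistName) {
        return jhistName.split("-")[7];
    }

    public static void main(String[] args) {
        String ok  = "job_1_1-1423572836007-user-name-1423572836007-0-0-FAILED-queue-1423572836007";
        // A submit time of -1 injects an extra "-" and shifts all later fields:
        String bad = "job_1_1--1-user-name-1423572836007-0-0-FAILED-queue-1423572836007";
        System.out.println(statusField(ok));  // FAILED
        System.out.println(statusField(bad)); // 0, not a JobState constant, so valueOf() throws
    }
}
```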
[jira] [Commented] (MAPREDUCE-5649) Reduce cannot use more than 2G memory for the final merge
[ https://issues.apache.org/jira/browse/MAPREDUCE-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528329#comment-14528329 ] Hudson commented on MAPREDUCE-5649: --- SUCCESS: Integrated in Hadoop-Yarn-trunk #918 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/918/]) MAPREDUCE-5649. Reduce cannot use more than 2G memory for the final merge. Contributed by Gera Shegalov (jlowe: rev 7dc3c1203d1ab14c09d0aaf0869a5bcdfafb0a5a) * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestMergeManager.java Reduce cannot use more than 2G memory for the final merge -- Key: MAPREDUCE-5649 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5649 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Reporter: stanley shi Assignee: Gera Shegalov Fix For: 2.8.0 Attachments: MAPREDUCE-5649.001.patch, MAPREDUCE-5649.002.patch, MAPREDUCE-5649.003.patch In the org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.java file, in the finalMerge method: int maxInMemReduce = (int)Math.min( Runtime.getRuntime().maxMemory() * maxRedPer, Integer.MAX_VALUE); This means that no matter how much memory the user has, the reducer will not retain more than 2 GB of data in memory before the reduce phase starts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
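The 2 GB ceiling is ordinary Java integer arithmetic: Math.min against Integer.MAX_VALUE followed by an (int) cast can never yield more than 2147483647 bytes. A minimal standalone sketch (method names are illustrative, not from MergeManagerImpl, and the committed patch may differ in detail):

```java
public class FinalMergeCap {
    // Pre-fix computation, as quoted in the issue: the (int) cast clips
    // the in-memory merge budget at Integer.MAX_VALUE (about 2 GB) no
    // matter how large the reducer heap is.
    static long cappedBudget(long maxHeapBytes, float maxRedPer) {
        return (int) Math.min(maxHeapBytes * maxRedPer, Integer.MAX_VALUE);
    }

    // Keeping the arithmetic in long lets the budget scale with the heap.
    static long uncappedBudget(long maxHeapBytes, float maxRedPer) {
        return (long) (maxHeapBytes * maxRedPer);
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * 1024 * 1024; // an 8 GB reducer heap
        System.out.println(cappedBudget(heap, 0.9f));   // 2147483647, i.e. ~2 GB
        System.out.println(uncappedBudget(heap, 0.9f)); // roughly 7.2 GB in bytes
    }
}
```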
[jira] [Commented] (MAPREDUCE-6165) [JDK8] TestCombineFileInputFormat failed on JDK8
[ https://issues.apache.org/jira/browse/MAPREDUCE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528321#comment-14528321 ] Hudson commented on MAPREDUCE-6165: --- SUCCESS: Integrated in Hadoop-Yarn-trunk #918 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/918/]) MAPREDUCE-6165. [JDK8] TestCombineFileInputFormat failed on JDK8. Contributed by Akira AJISAKA. (ozawa: rev 551615fa13f65ae996bae9c1bacff189539b6557) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java * hadoop-mapreduce-project/CHANGES.txt [JDK8] TestCombineFileInputFormat failed on JDK8 Key: MAPREDUCE-6165 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6165 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Wei Yan Assignee: Akira AJISAKA Priority: Minor Fix For: 2.8.0 Attachments: MAPREDUCE-6165-001.patch, MAPREDUCE-6165-002.patch, MAPREDUCE-6165-003.patch, MAPREDUCE-6165-003.patch, MAPREDUCE-6165-004.patch, MAPREDUCE-6165-reproduce.patch The error msg: {noformat} testSplitPlacementForCompressedFiles(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat) Time elapsed: 2.487 sec FAILURE! 
junit.framework.AssertionFailedError: expected:<2> but was:<1> at junit.framework.Assert.fail(Assert.java:57) at junit.framework.Assert.failNotEquals(Assert.java:329) at junit.framework.Assert.assertEquals(Assert.java:78) at junit.framework.Assert.assertEquals(Assert.java:234) at junit.framework.Assert.assertEquals(Assert.java:241) at junit.framework.TestCase.assertEquals(TestCase.java:409) at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacementForCompressedFiles(TestCombineFileInputFormat.java:911) testSplitPlacement(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat) Time elapsed: 0.985 sec FAILURE! junit.framework.AssertionFailedError: expected:<2> but was:<1> at junit.framework.Assert.fail(Assert.java:57) at junit.framework.Assert.failNotEquals(Assert.java:329) at junit.framework.Assert.assertEquals(Assert.java:78) at junit.framework.Assert.assertEquals(Assert.java:234) at junit.framework.Assert.assertEquals(Assert.java:241) at junit.framework.TestCase.assertEquals(TestCase.java:409) at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacement(TestCombineFileInputFormat.java:368) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
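Failures that appear only on JDK8 like these typically come from test assertions that implicitly depend on hash-iteration order, which is unspecified and changed between JDK7 and JDK8. The sketch below is an illustration of the fragile pattern and an order-independent alternative, not the actual MAPREDUCE-6165 patch:

```java
import java.util.*;

public class OrderIndependentCheck {
    // A robust test compares the contents of a result collection, not the
    // position at which a hash-based container happens to yield elements.
    static boolean sameContents(Collection<String> actual, Collection<String> expected) {
        return new HashSet<>(actual).equals(new HashSet<>(expected));
    }

    public static void main(String[] args) {
        // e.g. block locations collected into a HashSet by an input format:
        Set<String> hostsSeen = new HashSet<>(Arrays.asList("h1", "h2", "h3"));
        // Fragile: asserting which element an iterator yields first.
        // Robust: order-independent comparison.
        System.out.println(sameContents(hostsSeen, Arrays.asList("h3", "h1", "h2"))); // true
    }
}
```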
[jira] [Commented] (MAPREDUCE-6259) IllegalArgumentException due to missing job submit time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528289#comment-14528289 ] Hudson commented on MAPREDUCE-6259: --- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #184 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/184/]) MAPREDUCE-6259. IllegalArgumentException due to missing job submit time. Contributed by zhihai xu (jlowe: rev bf70c5ae2824a9139c1aa9d7c14020018881cec2) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/AMStartedEvent.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java * hadoop-mapreduce-project/CHANGES.txt IllegalArgumentException due to missing job submit time --- Key: MAPREDUCE-6259 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6259 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.1 Attachments: MAPREDUCE-6259.000.patch A job submit time of -1 causes an IllegalArgumentException when parsing the job history file name, and JOB_INIT_FAILED leaves a -1 submit time in JobIndexInfo. We found the following job history file name, which causes an IllegalArgumentException when the job status is parsed out of the file name. 
{code} job_1418398645407_115853--1-worun-kafka%2Dto%2Dhdfs%5Btwo%5D%5B15+topic%28s%29%5D-1423572836007-0-0-FAILED-root.journaling-1423572836007.jhist {code} The stack trace for the IllegalArgumentException is {code} 2015-02-10 04:54:01,863 WARN org.apache.hadoop.mapreduce.v2.hs.PartialJob: Exception while parsing job state. Defaulting to KILLED java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.mapreduce.v2.api.records.JobState.0 at java.lang.Enum.valueOf(Enum.java:236) at org.apache.hadoop.mapreduce.v2.api.records.JobState.valueOf(JobState.java:21) at org.apache.hadoop.mapreduce.v2.hs.PartialJob.getState(PartialJob.java:82) at org.apache.hadoop.mapreduce.v2.hs.PartialJob.init(PartialJob.java:59) at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getAllPartialJobs(CachedHistoryStorage.java:159) at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getPartialJobs(CachedHistoryStorage.java:173) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getPartialJobs(JobHistory.java:284) at org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices.getJobs(HsWebServices.java:212) at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185) at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339) at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537) at
[jira] [Commented] (MAPREDUCE-6165) [JDK8] TestCombineFileInputFormat failed on JDK8
[ https://issues.apache.org/jira/browse/MAPREDUCE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528287#comment-14528287 ] Hudson commented on MAPREDUCE-6165: --- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #184 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/184/]) MAPREDUCE-6165. [JDK8] TestCombineFileInputFormat failed on JDK8. Contributed by Akira AJISAKA. (ozawa: rev 551615fa13f65ae996bae9c1bacff189539b6557) * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java [JDK8] TestCombineFileInputFormat failed on JDK8 Key: MAPREDUCE-6165 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6165 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Wei Yan Assignee: Akira AJISAKA Priority: Minor Fix For: 2.8.0 Attachments: MAPREDUCE-6165-001.patch, MAPREDUCE-6165-002.patch, MAPREDUCE-6165-003.patch, MAPREDUCE-6165-003.patch, MAPREDUCE-6165-004.patch, MAPREDUCE-6165-reproduce.patch The error msg: {noformat} testSplitPlacementForCompressedFiles(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat) Time elapsed: 2.487 sec FAILURE! 
junit.framework.AssertionFailedError: expected:<2> but was:<1> at junit.framework.Assert.fail(Assert.java:57) at junit.framework.Assert.failNotEquals(Assert.java:329) at junit.framework.Assert.assertEquals(Assert.java:78) at junit.framework.Assert.assertEquals(Assert.java:234) at junit.framework.Assert.assertEquals(Assert.java:241) at junit.framework.TestCase.assertEquals(TestCase.java:409) at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacementForCompressedFiles(TestCombineFileInputFormat.java:911) testSplitPlacement(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat) Time elapsed: 0.985 sec FAILURE! junit.framework.AssertionFailedError: expected:<2> but was:<1> at junit.framework.Assert.fail(Assert.java:57) at junit.framework.Assert.failNotEquals(Assert.java:329) at junit.framework.Assert.assertEquals(Assert.java:78) at junit.framework.Assert.assertEquals(Assert.java:234) at junit.framework.Assert.assertEquals(Assert.java:241) at junit.framework.TestCase.assertEquals(TestCase.java:409) at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacement(TestCombineFileInputFormat.java:368) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6356) Misspelling of threshold in log4j.properties for tests
Brahma Reddy Battula created MAPREDUCE-6356: --- Summary: Misspelling of threshold in log4j.properties for tests Key: MAPREDUCE-6356 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6356 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor The log4j.properties file used for tests contains the misspelling log4j.threshhold; it should use the correct spelling, log4j.threshold. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
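For reference, the corrected line in a test log4j.properties would look like this (a minimal sketch; the appender configuration shown is illustrative, not the actual Hadoop test file):

```properties
# "log4j.threshhold" (double h) is silently ignored by log4j; the correct key is:
log4j.threshold=ALL
log4j.rootLogger=INFO,stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}: %m%n
```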
[jira] [Updated] (MAPREDUCE-4070) JobHistoryServer creates /tmp directory with restrictive permissions if the directory doesn't already exist.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4070: Labels: BB2015-05-TBR (was: ) JobHistoryServer creates /tmp directory with restrictive permissions if the directory doesn't already exist. Key: MAPREDUCE-4070 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4070 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1 Reporter: Ahmed Radwan Assignee: Ahmed Radwan Labels: BB2015-05-TBR Attachments: MAPREDUCE-4070.patch Starting up the MapReduce JobHistoryServer service after a clean install automatically creates the /tmp directory on HDFS. However, it is created with 750 permissions. Attempting to run MR jobs as other users then results in the following permissions exception: {code} org.apache.hadoop.security.AccessControlException: Permission denied: user=cloudera, access=EXECUTE, inode=/tmp:yarn:supergroup:drwxr-x--- at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:205) .. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-1125) SerialUtils.cc: deserializeFloat is out of sync with SerialUtils.hh
[ https://issues.apache.org/jira/browse/MAPREDUCE-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-1125: Labels: BB2015-05-TBR (was: ) SerialUtils.cc: deserializeFloat is out of sync with SerialUtils.hh --- Key: MAPREDUCE-1125 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1125 Project: Hadoop Map/Reduce Issue Type: Bug Components: pipes Affects Versions: 0.21.0 Reporter: Simone Leo Assignee: Simone Leo Labels: BB2015-05-TBR Attachments: MAPREDUCE-1125-2.patch, MAPREDUCE-1125-3.patch {noformat} *** SerialUtils.hh *** float deserializeFloat(InStream& stream); *** SerialUtils.cc *** void deserializeFloat(float& t, InStream& stream) { char buf[sizeof(float)]; stream.read(buf, sizeof(float)); XDR xdrs; xdrmem_create(&xdrs, buf, sizeof(float), XDR_DECODE); xdr_float(&xdrs, &t); } {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-2638) Create a simple stress test for the fair scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-2638: Labels: BB2015-05-TBR (was: ) Create a simple stress test for the fair scheduler -- Key: MAPREDUCE-2638 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2638 Project: Hadoop Map/Reduce Issue Type: Test Components: contrib/fair-share Reporter: Tom White Assignee: Tom White Labels: BB2015-05-TBR Attachments: MAPREDUCE-2638.patch, MAPREDUCE-2638.patch This would be a test that runs against a cluster, typically with settings that allow preemption to be exercised. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5801) Uber mode's log message is missing a vcore reason
[ https://issues.apache.org/jira/browse/MAPREDUCE-5801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5801: Labels: BB2015-05-TBR easyfix (was: easyfix) Uber mode's log message is missing a vcore reason - Key: MAPREDUCE-5801 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5801 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Steven Wong Assignee: Steven Wong Priority: Minor Labels: BB2015-05-TBR, easyfix Attachments: MAPREDUCE-5801.patch If a job cannot be run in uber mode because of insufficient vcores, the resulting log message has an empty reason. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5018) Support raw binary data with Hadoop streaming
[ https://issues.apache.org/jira/browse/MAPREDUCE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5018: Labels: BB2015-05-TBR (was: ) Support raw binary data with Hadoop streaming - Key: MAPREDUCE-5018 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5018 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/streaming Affects Versions: 1.1.2 Reporter: Jay Hacker Assignee: Steven Willis Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-5018-branch-1.1.patch, MAPREDUCE-5018.patch, MAPREDUCE-5018.patch, justbytes.jar, mapstream People often have a need to run older programs over many files, and turn to Hadoop streaming as a reliable, performant batch system. There are good reasons for this: 1. Hadoop is convenient: they may already be using it for mapreduce jobs, and it is easy to spin up a cluster in the cloud. 2. It is reliable: HDFS replicates data and the scheduler retries failed jobs. 3. It is reasonably performant: it moves the code to the data, maintaining locality, and scales with the number of nodes. Historically Hadoop is of course oriented toward processing key/value pairs, and so needs to interpret the data passing through it. Unfortunately, this makes it difficult to use Hadoop streaming with programs that don't deal in key/value pairs, or with binary data in general. For example, something as simple as running md5sum to verify the integrity of files will not give the correct result, due to Hadoop's interpretation of the data. There have been several attempts at binary serialization schemes for Hadoop streaming, such as TypedBytes (HADOOP-1722); however, these are still aimed at efficiently encoding key/value pairs, and not passing data through unmodified. Even the RawBytes serialization scheme adds length fields to the data, rendering it not-so-raw. 
I often have a need to run a Unix filter on files stored in HDFS; currently, the only way I can do this on the raw data is to copy the data out and run the filter on one machine, which is inconvenient, slow, and unreliable. It would be very convenient to run the filter as a map-only job, allowing me to build on existing (well-tested!) building blocks in the Unix tradition instead of reimplementing them as mapreduce programs. However, most existing tools don't know about file splits, and so want to process whole files; and of course many expect raw binary input and output. The solution is to run a map-only job with an InputFormat and OutputFormat that just pass raw bytes and don't split. It turns out to be a little more complicated with streaming; I have attached a patch with the simplest solution I could come up with. I call the format JustBytes (as RawBytes was already taken), and it should be usable with most recent versions of Hadoop. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4071) NPE while executing MRAppMaster shutdown hook
[ https://issues.apache.org/jira/browse/MAPREDUCE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4071: Labels: BB2015-05-TBR (was: ) NPE while executing MRAppMaster shutdown hook - Key: MAPREDUCE-4071 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4071 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am, mrv2 Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Bhallamudi Venkata Siva Kamesh Labels: BB2015-05-TBR Attachments: MAPREDUCE-4071-1.patch, MAPREDUCE-4071-2.patch, MAPREDUCE-4071-2.patch, MAPREDUCE-4071.patch While running the shutdown hook of MRAppMaster, hit NPE {noformat} Exception in thread Thread-1 java.lang.NullPointerException at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.setSignalled(MRAppMaster.java:668) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1004) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6232) Task state is running when all task attempts fail
[ https://issues.apache.org/jira/browse/MAPREDUCE-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6232: Labels: BB2015-05-TBR (was: ) Task state is running when all task attempts fail - Key: MAPREDUCE-6232 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6232 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Affects Versions: 2.6.0 Reporter: Yang Hao Assignee: Yang Hao Labels: BB2015-05-TBR Attachments: MAPREDUCE-6232.patch, MAPREDUCE-6232.v2.patch, TaskImpl.new.png, TaskImpl.normal.png, result.pdf When all task attempts fail, the task's state is still RUNNING. A clean way to fix this is to check the task attempts' states: if none of the attempts is running, then the task state should not be RUNNING. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
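The proposed check is easy to state in isolation. The helper below is a hypothetical plain-Java illustration of that rule, not TaskImpl's actual API:

```java
import java.util.*;

public class TaskStateCheck {
    enum AttemptState { RUNNING, FAILED, SUCCEEDED, KILLED }

    // Report the task as running only while at least one of its attempts
    // is actually running; all-failed attempts must not leave it RUNNING.
    static boolean taskIsRunning(Collection<AttemptState> attempts) {
        return attempts.contains(AttemptState.RUNNING);
    }

    public static void main(String[] args) {
        List<AttemptState> allFailed =
            Arrays.asList(AttemptState.FAILED, AttemptState.FAILED);
        List<AttemptState> oneAlive =
            Arrays.asList(AttemptState.FAILED, AttemptState.RUNNING);
        System.out.println(taskIsRunning(allFailed)); // false
        System.out.println(taskIsRunning(oneAlive));  // true
    }
}
```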
[jira] [Updated] (MAPREDUCE-6258) add support to back up JHS files from application master
[ https://issues.apache.org/jira/browse/MAPREDUCE-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6258: Labels: BB2015-05-TBR (was: ) add support to back up JHS files from application master Key: MAPREDUCE-6258 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6258 Project: Hadoop Map/Reduce Issue Type: New Feature Components: applicationmaster Affects Versions: 2.4.1 Reporter: Jian Fang Labels: BB2015-05-TBR Attachments: MAPREDUCE-6258.patch In Hadoop 2, job history files are stored on HDFS with a default retention period of one week. In a cloud environment, these HDFS files are actually stored on the disks of ephemeral instances that could go away once the instances are terminated. Users may want to back up the job history files for issue investigation and performance analysis before and after the cluster is terminated. A centralized backup mechanism could have a scalability issue for big and busy Hadoop clusters where there are probably tens of thousands of jobs every day. As a result, it is preferred to have a distributed way to back up the job history files in this case. To achieve this goal, we could add a new feature to back up the job history files in the application master. More specifically, we could copy the job history files to a backup path when they are moved from the temporary staging directory to the intermediate_done path in the application master. Since application masters could run on any slave nodes on a Hadoop cluster, we could achieve better scalability by backing up the job history files in a distributed fashion. Please be aware that the backup path should be managed by the Hadoop users based on their needs. For example, some Hadoop users may copy the job history files to a cloud storage directly and keep them there forever. While some other users may want to store the job history files on local disks and clean them up from time to time. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6208) There should be an input format for MapFiles which can be configured so that only a fraction of the input data is used for the MR process
[ https://issues.apache.org/jira/browse/MAPREDUCE-6208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6208: Labels: BB2015-05-TBR inputformat mapfile (was: inputformat mapfile) There should be an input format for MapFiles which can be configured so that only a fraction of the input data is used for the MR process - Key: MAPREDUCE-6208 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6208 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Jens Rabe Assignee: Jens Rabe Labels: BB2015-05-TBR, inputformat, mapfile Attachments: MAPREDUCE-6208.001.patch, MAPREDUCE-6208.002.patch Original Estimate: 24h Remaining Estimate: 24h In some cases there are large amounts of data organized in MapFiles, e.g., from previous MapReduce tasks, and only a fraction of the data is to be processed in a MR task. The current approach, as I understand, is to re-organize the data in a suitable partition using folders on HDFS, and only use relevant folders as input paths, and maybe doing some additional filtering in the Map task. However, sometimes the input data cannot be easily partitioned that way. For example, when processing large amounts of measured data where additional data on a time period already in HDFS arrives later. There should be an input format that accepts folders with MapFiles, and there should be an option to specify the input key range so that only fitting InputSplits are generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4961: Labels: BB2015-05-TBR (was: ) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations - Key: MAPREDUCE-4961 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4961 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jerry Chen Assignee: Jerry Chen Labels: BB2015-05-TBR Attachments: MAPREDUCE-4961.patch, MAPREDUCE-4961.patch Original Estimate: 72h Remaining Estimate: 72h MAPREDUCE-4049 provides pluggable Shuffle and MAPREDUCE-4080 extends Shuffle to be able to provide different MergeManager implementations. While using these pluggable features, I find that when a map reduce job is running locally, a RawKeyValueIterator is returned directly from a static call to Merger.merge, which breaks the assumption that the Shuffle may provide different merge methods, even though there is no copy phase in this situation. The use case: I am implementing a hash-based MergeManager that does not need a map-side sort, but when the map reduce job runs locally, the hash-based MergeManager has no chance to be used because the code goes directly to Merger.merge. This makes the pluggable Shuffle and MergeManager incomplete. So we need to move the code calling Merger.merge from ReduceTask to the ShuffleConsumerPlugin implementation, so that the Shuffle implementation can decide how to do the merge and return the corresponding iterator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6205) Update the value of the new version properties of the deprecated property mapred.child.java.opts
[ https://issues.apache.org/jira/browse/MAPREDUCE-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6205: Labels: BB2015-05-TBR (was: ) Update the value of the new version properties of the deprecated property mapred.child.java.opts -- Key: MAPREDUCE-6205 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6205 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Reporter: sam liu Assignee: sam liu Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-6205.003.patch, MAPREDUCE-6205.patch, MAPREDUCE-6205.patch In the current Hadoop code, the old property mapred.child.java.opts is deprecated and its replacements are MRJobConfig.MAP_JAVA_OPTS and MRJobConfig.REDUCE_JAVA_OPTS. However, when a user sets a value for the deprecated property mapred.child.java.opts, Hadoop won't automatically update the replacement properties MRJobConfig.MAP_JAVA_OPTS (mapreduce.map.java.opts) and MRJobConfig.REDUCE_JAVA_OPTS (mapreduce.reduce.java.opts). Since Hadoop updates the replacement properties for many other deprecated properties, it should support the same behavior for mapred.child.java.opts; otherwise it may cause incompatibility issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
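The requested behavior can be sketched without Hadoop on the classpath. Hadoop itself wires such mappings through its Configuration deprecation machinery; the standalone sketch below uses a plain Map to show the one-deprecated-key-to-two-new-keys update being asked for:

```java
import java.util.*;

public class DeprecationSketch {
    // Hypothetical mapping table (plain Java, not the real
    // org.apache.hadoop.conf.Configuration API): the deprecated key
    // should fan out to both replacement keys.
    static final Map<String, List<String>> DEPRECATIONS = Map.of(
        "mapred.child.java.opts",
        List.of("mapreduce.map.java.opts", "mapreduce.reduce.java.opts"));

    static void set(Map<String, String> conf, String key, String value) {
        conf.put(key, value);
        // Mirror the value into every new-style key for a deprecated key.
        for (String newKey : DEPRECATIONS.getOrDefault(key, List.of())) {
            conf.put(newKey, value);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        set(conf, "mapred.child.java.opts", "-Xmx512m");
        System.out.println(conf.get("mapreduce.map.java.opts"));    // -Xmx512m
        System.out.println(conf.get("mapreduce.reduce.java.opts")); // -Xmx512m
    }
}
```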
[jira] [Updated] (MAPREDUCE-5915) Pipes ping thread should sleep in intervals to allow for isDone() to be checked
[ https://issues.apache.org/jira/browse/MAPREDUCE-5915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5915: Labels: BB2015-05-TBR (was: ) Pipes ping thread should sleep in intervals to allow for isDone() to be checked --- Key: MAPREDUCE-5915 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5915 Project: Hadoop Map/Reduce Issue Type: Improvement Components: pipes Reporter: Joe Mudd Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-5915.patch The ping() thread sleeps for 5 seconds at a time, causing up to a 5-second delay in detecting that the job is finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
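The suggested fix — sleeping in short slices and re-checking completion between slices — can be sketched like this (the real pipes ping loop is C++; this Java sketch with a hypothetical `isDone` supplier only illustrates the pattern):

```java
import java.util.function.BooleanSupplier;

public class PingSleeper {
    // Sleep up to totalMillis, waking every sliceMillis to re-check isDone.
    // Returns true if isDone became true before the full interval elapsed.
    public static boolean sleepUnlessDone(BooleanSupplier isDone,
                                          long totalMillis, long sliceMillis) {
        long remaining = totalMillis;
        while (remaining > 0) {
            if (isDone.getAsBoolean()) {
                return true;                 // job finished: stop waiting early
            }
            long nap = Math.min(sliceMillis, remaining);
            try {
                Thread.sleep(nap);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                break;
            }
            remaining -= nap;
        }
        return isDone.getAsBoolean();
    }
}
```

A finished job is now noticed within one slice (e.g. 100 ms) instead of up to 5 seconds.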
[jira] [Updated] (MAPREDUCE-6155) MapFiles are not always correctly detected by SequenceFileInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-6155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6155: Labels: BB2015-05-TBR (was: ) MapFiles are not always correctly detected by SequenceFileInputFormat - Key: MAPREDUCE-6155 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6155 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jens Rabe Labels: BB2015-05-TBR Attachments: MAPREDUCE-6155.001.patch, MAPREDUCE-6155.002.patch Original Estimate: 2h Remaining Estimate: 2h MapFiles are not always correctly detected by SequenceFileInputFormat. The listStatus method detects a MapFile correctly only when the path it checks is a directory; it then replaces the directory with the path of its data file. This fails when the data file does not exist, i.e., when the input path is a directory that does not belong to a MapFile, or when recursion is turned on and the input format comes across a file (not a directory) that is in fact part of a MapFile. The listStatus method should be changed to handle these cases correctly: * if the current candidate is a file named index or data, check whether the corresponding other file exists, whether the key types of both files match, and whether the value type of the index file is LongWritable * if the current candidate is a directory, it is a MapFile if and only if an index and a data file exist, both are SequenceFiles, their key types match, and the index value type is LongWritable -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3383) Duplicate job.getOutputValueGroupingComparator() in ReduceTask
[ https://issues.apache.org/jira/browse/MAPREDUCE-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3383: Labels: BB2015-05-TBR (was: ) Duplicate job.getOutputValueGroupingComparator() in ReduceTask -- Key: MAPREDUCE-3383 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3383 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.23.1 Reporter: Binglin Chang Assignee: Binglin Chang Labels: BB2015-05-TBR Attachments: MAPREDUCE-3383.patch This is probably just a small accidental mistake. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4710) Add peak memory usage counter for each task
[ https://issues.apache.org/jira/browse/MAPREDUCE-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4710: Labels: BB2015-05-TBR patch (was: patch) Add peak memory usage counter for each task --- Key: MAPREDUCE-4710 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4710 Project: Hadoop Map/Reduce Issue Type: New Feature Components: task Affects Versions: 1.0.2 Reporter: Cindy Li Assignee: Cindy Li Priority: Minor Labels: BB2015-05-TBR, patch Attachments: MAPREDUCE-4710-trunk.patch, mapreduce-4710-v1.0.2.patch, mapreduce-4710.patch, mapreduce4710-v3.patch, mapreduce4710-v6.patch, mapreduce4710.patch Each task has counters PHYSICAL_MEMORY_BYTES and VIRTUAL_MEMORY_BYTES, which are snapshots of memory usage of that task. They are not sufficient for users to understand peak memory usage by that task, e.g. in order to diagnose task failures, tune job parameters or change application design. This new feature will add two more counters for each task: PHYSICAL_MEMORY_BYTES_MAX and VIRTUAL_MEMORY_BYTES_MAX. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3384) Add warning message for org.apache.hadoop.mapreduce.lib.reduce.LongSumReducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3384: Labels: BB2015-05-TBR (was: ) Add warning message for org.apache.hadoop.mapreduce.lib.reduce.LongSumReducer - Key: MAPREDUCE-3384 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3384 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: JiangKai Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-3384.patch When we call the reduce() function of LongSumReducer, the result may overflow. We should send a warning message to users if overflow occurs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
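One way to detect the overflow this issue wants warned about (a sketch only, not the attached patch; LongSumReducer itself simply adds the values) is the standard sign-bit test on each intermediate addition:

```java
public class OverflowCheckedSum {
    // Sums values the way LongSumReducer does, but also reports whether
    // any intermediate addition overflowed the long range.
    public static class Result {
        public final long sum;
        public final boolean overflowed;
        Result(long sum, boolean overflowed) {
            this.sum = sum;
            this.overflowed = overflowed;
        }
    }

    public static Result sum(long[] values) {
        long sum = 0;
        boolean overflowed = false;
        for (long v : values) {
            long next = sum + v;
            // Overflow iff both operands have a sign the result does not share.
            if (((sum ^ next) & (v ^ next)) < 0) {
                overflowed = true;   // the reducer could log a warning here
            }
            sum = next;
        }
        return new Result(sum, overflowed);
    }
}
```

On JDKs with Math.addExact the same check can be written as a try/catch around addExact; the bit trick above avoids the exception on the hot path.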
[jira] [Updated] (MAPREDUCE-4911) Add node-level aggregation flag feature(setNodeLevelAggregation(boolean)) to JobConf
[ https://issues.apache.org/jira/browse/MAPREDUCE-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4911: Labels: BB2015-05-TBR (was: ) Add node-level aggregation flag feature(setNodeLevelAggregation(boolean)) to JobConf Key: MAPREDUCE-4911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4911 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: client Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Labels: BB2015-05-TBR Attachments: MAPREDUCE-4911.2.patch, MAPREDUCE-4911.3.patch, MAPREDUCE-4911.patch This JIRA adds a node-level aggregation flag (setLocalAggregation(boolean)) to JobConf. This is a subtask of MAPREDUCE-4502. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5728) Check NPE for serializer/deserializer in MapTask
[ https://issues.apache.org/jira/browse/MAPREDUCE-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5728: Labels: BB2015-05-TBR (was: ) Check NPE for serializer/deserializer in MapTask Key: MAPREDUCE-5728 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5728 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 2.2.0 Reporter: Jerry He Assignee: Jerry He Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-5728-trunk.patch Currently we get an NPE if the serializer/deserializer is not configured correctly. {code} 14/01/14 11:52:35 INFO mapred.JobClient: Task Id : attempt_201401072154_0027_m_02_2, Status : FAILED java.lang.NullPointerException at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:944) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:672) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:740) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:368) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(AccessController.java:362) at javax.security.auth.Subject.doAs(Subject.java:573) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502) at org.apache.hadoop.mapred.Child.main(Child.java:249) {code} serializationFactory.getSerializer and serializationFactory.getDeserializer return null in this case. Let's check for a null serializer/deserializer in MapTask so that we fail with a meaningful error instead of a meaningless NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
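The proposed check can be sketched as a fail-fast helper (`requireSerializer` is hypothetical, not actual MapTask code; io.serializations is the real Hadoop setting that governs which serializations are registered):

```java
public class SerializationCheck {
    // Fail fast with a descriptive message instead of a bare NPE later.
    // className stands in for the job's map output key/value class name.
    public static <T> T requireSerializer(T serializer, String className) {
        if (serializer == null) {
            throw new IllegalStateException(
                "No serializer found for class " + className
                + "; check the io.serializations setting and make sure the"
                + " class has a registered serialization.");
        }
        return serializer;
    }
}
```

The user then sees which class and which setting to fix, rather than a stack trace pointing into MapOutputBuffer.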
[jira] [Updated] (MAPREDUCE-5269) Preemption of Reducer (and Shuffle) via checkpointing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5269: Labels: BB2015-05-TBR (was: ) Preemption of Reducer (and Shuffle) via checkpointing - Key: MAPREDUCE-5269 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5269 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Reporter: Carlo Curino Assignee: Carlo Curino Labels: BB2015-05-TBR Attachments: MAPREDUCE-5269.2.patch, MAPREDUCE-5269.3.patch, MAPREDUCE-5269.4.patch, MAPREDUCE-5269.5.patch, MAPREDUCE-5269.6.patch, MAPREDUCE-5269.7.patch, MAPREDUCE-5269.patch This patch tracks the changes in the task runtime (shuffle, reducer context, etc.) that are required to implement checkpoint-based preemption of reducer tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5916) The authenticate response is not sent when password is empty (LocalJobRunner)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5916: Labels: BB2015-05-TBR (was: ) The authenticate response is not sent when password is empty (LocalJobRunner) - Key: MAPREDUCE-5916 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5916 Project: Hadoop Map/Reduce Issue Type: Bug Components: pipes Reporter: Joe Mudd Labels: BB2015-05-TBR Attachments: MAPREDUCE-5916.patch When running in a mode where there are no credentials associated with the pipes submission and the password is empty, the C++ verifyDigestAndRespond() does not respond to the Java side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3385) Add warning message for the overflow in reduce() of org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-3385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3385: Labels: BB2015-05-TBR (was: ) Add warning message for the overflow in reduce() of org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer Key: MAPREDUCE-3385 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3385 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: JiangKai Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-3385.patch When we call the reduce() function of IntSumReducer, the result may overflow. We should send a warning message to users if overflow occurs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3047) FileOutputCommitter throws wrong type of exception when calling abortTask() to handle a directory without permission
[ https://issues.apache.org/jira/browse/MAPREDUCE-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3047: Labels: BB2015-05-TBR (was: ) FileOutputCommitter throws wrong type of exception when calling abortTask() to handle a directory without permission Key: MAPREDUCE-3047 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3047 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: JiangKai Priority: Trivial Labels: BB2015-05-TBR Attachments: MAPREDUCE-3047-1.patch, MAPREDUCE-3047-2.patch, MAPREDUCE-3047.patch When FileOutputCommitter calls abortTask() to create a temp directory, the call can fail because the user has no permission to access the directory or because a file with the same name already exists. In that case the system writes the error information to the log file instead of throwing an exception. As a result, when the temp directory is needed later it has never been created, and the system throws an exception telling the user that the temp directory doesn't exist. In my opinion that exception is inaccurate, and the error information will confuse users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5403) MR changes to accommodate yarn.application.classpath being moved to the server-side
[ https://issues.apache.org/jira/browse/MAPREDUCE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5403: Labels: BB2015-05-TBR (was: ) MR changes to accommodate yarn.application.classpath being moved to the server-side --- Key: MAPREDUCE-5403 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5403 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 2.0.5-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Labels: BB2015-05-TBR Attachments: MAPREDUCE-5403-1.patch, MAPREDUCE-5403-2.patch, MAPREDUCE-5403.patch yarn.application.classpath is a confusing property because it is used by MapReduce and not YARN, and MapReduce already has mapreduce.application.classpath, which provides the same functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3807) JobTracker needs fix similar to HDFS-94
[ https://issues.apache.org/jira/browse/MAPREDUCE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3807: Labels: BB2015-05-TBR newbie (was: newbie) JobTracker needs fix similar to HDFS-94 --- Key: MAPREDUCE-3807 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3807 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.0.0 Reporter: Harsh J Labels: BB2015-05-TBR, newbie Attachments: MAPREDUCE-3807.patch 1.0 JobTracker's jobtracker.jsp page currently shows: {code} <h2>Cluster Summary (Heap Size is <%= StringUtils.byteDesc(Runtime.getRuntime().totalMemory()) %>/<%= StringUtils.byteDesc(Runtime.getRuntime().maxMemory()) %>)</h2> {code} It could use the same improvement as HDFS-94 to reflect live heap usage more accurately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5188) error when verify FileType of RS_SOURCE in getCompanionBlocks in BlockPlacementPolicyRaid.java
[ https://issues.apache.org/jira/browse/MAPREDUCE-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5188: Labels: BB2015-05-TBR contrib/raid (was: contrib/raid) error when verify FileType of RS_SOURCE in getCompanionBlocks in BlockPlacementPolicyRaid.java --- Key: MAPREDUCE-5188 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5188 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/raid Affects Versions: 2.0.2-alpha Reporter: junjin Assignee: junjin Priority: Critical Labels: BB2015-05-TBR, contrib/raid Fix For: 2.0.2-alpha Attachments: MAPREDUCE-5188.patch There is an error when verifying the FileType of RS_SOURCE in getCompanionBlocks in BlockPlacementPolicyRaid.java: xorParityLength at line #379 needs to be changed to rsParityLength, since that check is verifying the RS_SOURCE type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5365) Set mapreduce.job.classloader to true by default
[ https://issues.apache.org/jira/browse/MAPREDUCE-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5365: Labels: BB2015-05-TBR (was: ) Set mapreduce.job.classloader to true by default Key: MAPREDUCE-5365 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5365 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.0.5-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Labels: BB2015-05-TBR Attachments: MAPREDUCE-5365.patch MAPREDUCE-1700 introduced the mapreduce.job.classloader option, which uses a custom classloader to separate system classes from user classes. There seem to be only rare cases in which a user would not want this on, so it should be enabled by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4346) Adding a refined version of JobTracker.getAllJobs() and exposing through the JobClient
[ https://issues.apache.org/jira/browse/MAPREDUCE-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4346: Labels: BB2015-05-TBR (was: ) Adding a refined version of JobTracker.getAllJobs() and exposing through the JobClient -- Key: MAPREDUCE-4346 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4346 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv1 Reporter: Ahmed Radwan Assignee: Ahmed Radwan Labels: BB2015-05-TBR Attachments: MAPREDUCE-4346.patch, MAPREDUCE-4346_rev2.patch, MAPREDUCE-4346_rev3.patch, MAPREDUCE-4346_rev4.patch The current implementation for JobTracker.getAllJobs() returns all submitted jobs in any state, in addition to retired jobs. This list can be long and represents an unneeded overhead especially in the case of clients only interested in jobs in specific state(s). It is beneficial to include a refined version where only jobs having specific statuses are returned and retired jobs are optional to include. I'll be uploading an initial patch momentarily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4330) TaskAttemptCompletedEventTransition invalidates previously successful attempt without checking if the newly completed attempt is successful
[ https://issues.apache.org/jira/browse/MAPREDUCE-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4330: Labels: BB2015-05-TBR (was: ) TaskAttemptCompletedEventTransition invalidates previously successful attempt without checking if the newly completed attempt is successful --- Key: MAPREDUCE-4330 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4330 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.23.1 Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Labels: BB2015-05-TBR Attachments: MAPREDUCE-4330-20130415.1.patch, MAPREDUCE-4330-20130415.patch, MAPREDUCE-4330-21032013.1.patch, MAPREDUCE-4330-21032013.patch The previously completed attempt is removed from successAttemptCompletionEventNoMap and marked OBSOLETE. After that, if the newly completed attempt is successful, it is added to the successAttemptCompletionEventNoMap. This seems wrong because the newly completed attempt could have failed, in which case there is no need to invalidate the successful attempt. One error case is a speculative attempt completing as killed/failed after the successful version has already completed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4273) Make CombineFileInputFormat split result JDK independent
[ https://issues.apache.org/jira/browse/MAPREDUCE-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4273: Labels: BB2015-05-TBR (was: ) Make CombineFileInputFormat split result JDK independent Key: MAPREDUCE-4273 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4273 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 1.0.3 Reporter: Luke Lu Assignee: Yu Gao Labels: BB2015-05-TBR Attachments: MAPREDUCE-4273-branch1-v2.patch, mapreduce-4273-branch-1.patch, mapreduce-4273-branch-2.patch, mapreduce-4273.patch The split result of CombineFileInputFormat depends on the iteration order of the nodeToBlocks and rackToBlocks hash maps, which makes the result depend on the HashMap implementation and hence on the JDK. This manifests as TestCombineFileInputFormat failures on alternative JDKs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
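The JDK dependence comes from HashMap's unspecified iteration order; iterating a sorted copy of the keys makes any order-dependent output reproducible across JDKs. A sketch of the idea (not the attached patch):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DeterministicIteration {
    // A TreeMap iterates its keys in sorted order on every JDK, so code
    // whose output depends on iteration order (like assembling splits from
    // nodeToBlocks/rackToBlocks) becomes reproducible.
    public static List<String> orderedKeys(Map<String, Integer> source) {
        return new ArrayList<>(new TreeMap<>(source).keySet());
    }
}
```

A LinkedHashMap (insertion order) would work equally well when the insertion order itself is deterministic.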
[jira] [Updated] (MAPREDUCE-5377) JobID is not displayed truly by hadoop job -history command
[ https://issues.apache.org/jira/browse/MAPREDUCE-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5377: Labels: BB2015-05-TBR newbie (was: newbie) JobID is not displayed truly by hadoop job -history command - Key: MAPREDUCE-5377 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5377 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.2.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Priority: Minor Labels: BB2015-05-TBR, newbie Attachments: MAPREDUCE-5377.patch The JobID output by the hadoop job -history command is an incorrect string. {quote} [hadoop@hadoop hadoop]$ hadoop job -history terasort Hadoop job: 0001_1374260789919_hadoop = Job tracker host name: job job tracker start time: Tue May 18 15:39:51 PDT 1976 User: hadoop JobName: TeraSort JobConf: hdfs://hadoop:8020/hadoop/mapred/staging/hadoop/.staging/job_201307191206_0001/job.xml Submitted At: 19-7-2013 12:06:29 Launched At: 19-7-2013 12:06:30 (0sec) Finished At: 19-7-2013 12:06:44 (14sec) Status: SUCCESS {quote} In this example it should show job_201307191206_0001 after Hadoop job:, but it shows 0001_1374260789919_hadoop. In addition, the Job tracker host name and job tracker start time are invalid. This problem can be solved by fixing the setting of jobId in HistoryViewer(). The JobTracker information in HistoryViewer should be fixed as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5150) Backport 2009 terasort (MAPREDUCE-639) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5150: Labels: BB2015-05-TBR (was: ) Backport 2009 terasort (MAPREDUCE-639) to branch-1 -- Key: MAPREDUCE-5150 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5150 Project: Hadoop Map/Reduce Issue Type: Improvement Components: examples Affects Versions: 1.2.0 Reporter: Gera Shegalov Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-5150-branch-1.patch Users evaluate the performance of Hadoop clusters using benchmarks such as TeraSort. However, the terasort version in branch-1 is outdated: it works on a teragen dataset that cannot exceed 4 billion unique keys, and it lacks the fast non-sampling partitioner SimplePartitioner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3936) Clients should not enforce counter limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3936: Labels: BB2015-05-TBR (was: ) Clients should not enforce counter limits -- Key: MAPREDUCE-3936 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3936 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv1 Reporter: Tom White Assignee: Tom White Labels: BB2015-05-TBR Attachments: MAPREDUCE-3936.patch, MAPREDUCE-3936.patch The code for enforcing counter limits (from MAPREDUCE-1943) creates a static JobConf instance to load the limits, which may throw an exception if the client limit is set to be lower than the limit on the cluster (perhaps because the cluster limit was raised from the default). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases
[ https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6251: Labels: BB2015-05-TBR (was: ) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases --- Key: MAPREDUCE-6251 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, mrv2 Affects Versions: 2.6.0 Reporter: Craig Welch Assignee: Craig Welch Labels: BB2015-05-TBR Attachments: MAPREDUCE-6251.0.patch, MAPREDUCE-6251.1.patch, MAPREDUCE-6251.2.patch, MAPREDUCE-6251.3.patch, MAPREDUCE-6251.4.patch The JobClient is used to get job status information for running and completed jobs. Final state and history for a job are communicated from the application master to the job history server via a distributed file system: the history is uploaded by the application master to the dfs and then scanned/loaded by the job history server. While HDFS has strong consistency guarantees, not all Hadoop DFSs do. When used with a distributed file system that lacks this guarantee, there will be cases where the history server does not yet see an uploaded file, resulting in the dreaded "no such job" and a null value for the RunningJob in the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
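The higher-level retry the issue proposes can be sketched generically — a lookup against an eventually-consistent store is repeated with backoff until it returns something (`RetryingLookup` is a hypothetical illustration, not JobClient code):

```java
import java.util.function.Supplier;

public class RetryingLookup {
    // Retry an eventually-consistent lookup a few times with linear
    // backoff, returning null only after every attempt came back empty.
    public static <T> T retry(Supplier<T> lookup, int attempts, long backoffMillis) {
        for (int i = 0; i < attempts; i++) {
            T result = lookup.get();
            if (result != null) {
                return result;          // the file/job finally became visible
            }
            if (i < attempts - 1) {
                try {
                    Thread.sleep(backoffMillis * (i + 1));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return null;
                }
            }
        }
        return null;
    }
}
```

The caller still has to treat a final null as "no such job"; the retries only paper over the visibility delay, not a genuinely missing history file.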
[jira] [Updated] (MAPREDUCE-5819) Binary token merge should be done once in TokenCache#obtainTokensForNamenodesInternal()
[ https://issues.apache.org/jira/browse/MAPREDUCE-5819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5819: Labels: BB2015-05-TBR (was: ) Binary token merge should be done once in TokenCache#obtainTokensForNamenodesInternal() --- Key: MAPREDUCE-5819 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5819 Project: Hadoop Map/Reduce Issue Type: Improvement Components: security Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Labels: BB2015-05-TBR Attachments: mapreduce-5819-v1.txt Currently mergeBinaryTokens() is called by every invocation of obtainTokensForNamenodesInternal(FileSystem, Credentials, Configuration) in the loop of obtainTokensForNamenodesInternal(Credentials, Path[], Configuration). This can be simplified so that mergeBinaryTokens() is called only once in obtainTokensForNamenodesInternal(Credentials, Path[], Configuration). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-2340) optimize JobInProgress.initTasks()
[ https://issues.apache.org/jira/browse/MAPREDUCE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-2340: Labels: BB2015-05-TBR critical-0.22.0 (was: critical-0.22.0) optimize JobInProgress.initTasks() -- Key: MAPREDUCE-2340 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2340 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Affects Versions: 0.20.1, 0.21.0 Reporter: Kang Xiao Labels: BB2015-05-TBR, critical-0.22.0 Attachments: MAPREDUCE-2340.patch, MAPREDUCE-2340.patch, MAPREDUCE-2340.r1.diff JobTracker's hostnameToNodeMap cache can speed up JobInProgress.initTasks() and JobInProgress.createCache() significantly. A test with 1 job with 10 maps on a 2400-node cluster shows nearly 10x and 50x speedups for initTasks() and createCache(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
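The caching idea can be sketched as a simple memoized lookup: resolve each hostname to its node once and reuse the answer for every subsequent task (`HostnameCache` and its resolver are hypothetical stand-ins for the JobTracker internals):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class HostnameCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> resolver;
    public int misses = 0;   // exposed only so the effect is observable

    public HostnameCache(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    // Resolve a hostname to its node, consulting the (possibly expensive)
    // resolver only on a cache miss.
    public String nodeFor(String hostname) {
        return cache.computeIfAbsent(hostname, h -> {
            misses++;
            return resolver.apply(h);
        });
    }
}
```

With thousands of splits landing on a few thousand hosts, the resolver runs once per distinct host instead of once per split, which is where the reported speedup comes from.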
[jira] [Updated] (MAPREDUCE-5258) Memory Leak while using LocalJobRunner
[ https://issues.apache.org/jira/browse/MAPREDUCE-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5258: Labels: BB2015-05-TBR patch (was: patch) Memory Leak while using LocalJobRunner -- Key: MAPREDUCE-5258 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5258 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.2 Reporter: Subroto Sanyal Assignee: skrho Labels: BB2015-05-TBR, patch Fix For: 1.1.3 Attachments: mapreduce-5258 _001.txt, mapreduce-5258.txt Every time a LocalJobRunner is launched it creates a JobTrackerInstrumentation and QueueMetrics. While creating this MetricsSystem, it registers and adds a callback to an ArrayList, which keeps growing because the DefaultMetricsSystem is a singleton. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6350) JobHistory doesn't support fully-functional search
[ https://issues.apache.org/jira/browse/MAPREDUCE-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6350: Labels: BB2015-05-TBR (was: ) JobHistory doesn't support fully-functional search -- Key: MAPREDUCE-6350 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6350 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Labels: BB2015-05-TBR Attachments: YARN-1614.v1.patch, YARN-1614.v2.patch The job history server only outputs the first 50 characters of job names in the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6284) Add a 'task attempt state' to MapReduce Application Master REST API
[ https://issues.apache.org/jira/browse/MAPREDUCE-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6284: Labels: BB2015-05-TBR (was: ) Add a 'task attempt state' to MapReduce Application Master REST API --- Key: MAPREDUCE-6284 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6284 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Ryu Kobayashi Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-6284.1.patch, MAPREDUCE-6284.1.patch, MAPREDUCE-6284.2.patch, MAPREDUCE-6284.3.patch, MAPREDUCE-6284.3.patch It would be useful to have a 'task attempt state' resource, similar to the existing 'app state' REST API: GET http://proxy http address:port/proxy/application_id/ws/v1/mapreduce/jobs/job_id/tasks/task_id/attempts/attempt_id/state PUT http://proxy http address:port/proxy/application_id/ws/v1/mapreduce/jobs/job_id/tasks/task_id/attempts/attempt_id/state -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6338) MR AppMaster does not honor ephemeral port range
[ https://issues.apache.org/jira/browse/MAPREDUCE-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6338: Labels: BB2015-05-TBR (was: ) MR AppMaster does not honor ephemeral port range Key: MAPREDUCE-6338 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6338 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am, mrv2 Affects Versions: 2.6.0 Reporter: Frank Nguyen Assignee: Frank Nguyen Labels: BB2015-05-TBR Attachments: MAPREDUCE-6338.002.patch The MR AppMaster should only use port ranges defined in the yarn.app.mapreduce.am.job.client.port-range property. On initial startup the MRAppMaster does use the port range defined in the property. However, it also opens a listener on a random ephemeral port. This is not the Jetty listener; it is another listener opened by the MRAppMaster on a separate thread, and it is the one recognized by the RM. Other nodes try to communicate with it via that random port. With firewall settings on, the MR job fails because the random port is not open. This problem has forced operators to open all OS ephemeral ports just to let MR jobs run. This is related to MAPREDUCE-4079. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6332) Add more required API's to MergeManager interface
[ https://issues.apache.org/jira/browse/MAPREDUCE-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6332: Labels: BB2015-05-TBR (was: ) Add more required API's to MergeManager interface -- Key: MAPREDUCE-6332 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6332 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.5.0, 2.6.0, 2.7.0 Reporter: Rohith Assignee: Rohith Labels: BB2015-05-TBR Attachments: 0001-MAPREDUCE-6332.patch, 0002-MAPREDUCE-6332.patch MR lets the user plug in a custom ShuffleConsumerPlugin via *mapreduce.job.reduce.shuffle.consumer.plugin.class*. A user who plugs in this class may also want to implement their own MergeManager. But currently the user is forced to use the MR-provided MergeManagerImpl instead of a custom MergeManager implementation when using the shuffle consumer plugin class. There should be well-defined APIs in MergeManager that any implementation can use, so that custom implementations take little extra effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5733) Define and use a constant for property textinputformat.record.delimiter
[ https://issues.apache.org/jira/browse/MAPREDUCE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5733: Labels: BB2015-05-TBR (was: ) Define and use a constant for property textinputformat.record.delimiter - Key: MAPREDUCE-5733 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5733 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Gelesh Assignee: Gelesh Priority: Trivial Labels: BB2015-05-TBR Attachments: MAPREDUCE-5733.patch, MAPREDUCE-5733_2.patch Original Estimate: 10m Remaining Estimate: 10m Calling conf.set("textinputformat.record.delimiter", myDelimiter) on a Configuration is prone to typos. Let's have the key as a static String in some class, to minimize such errors. This would also let IDEs like Eclipse suggest the string. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
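What the issue asks for is just a named constant; a sketch (`TextInputFormatKeys` is a hypothetical home for it — the real patch would put the constant on TextInputFormat or a related config class):

```java
// Hypothetical holder for the property key the issue wants named.
public class TextInputFormatKeys {
    // Callers write conf.set(TextInputFormatKeys.RECORD_DELIMITER, myDelimiter)
    // instead of retyping the raw string, so the compiler and IDE catch typos.
    public static final String RECORD_DELIMITER = "textinputformat.record.delimiter";
}
```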
[jira] [Updated] (MAPREDUCE-5203) Make AM of M/R Use NMClient
[ https://issues.apache.org/jira/browse/MAPREDUCE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5203: Labels: BB2015-05-TBR (was: ) Make AM of M/R Use NMClient --- Key: MAPREDUCE-5203 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5203 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: BB2015-05-TBR Attachments: MAPREDUCE-5203.1.patch, MAPREDUCE-5203.2.patch, MAPREDUCE-5203.3.patch, MAPREDUCE-5203.4.patch, MAPREDUCE-5203.5.patch YARN-422 adds NMClient. AM of mapreduce should use it instead of using the raw ContainerManager proxy directly. ContainerLauncherImpl needs to be changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-2632) Avoid calling the partitioner when the numReduceTasks is 1.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-2632: Labels: BB2015-05-TBR (was: ) Avoid calling the partitioner when the numReduceTasks is 1. --- Key: MAPREDUCE-2632 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2632 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tasktracker Affects Versions: 0.23.0 Reporter: Ravi Teja Ch N V Assignee: Ravi Teja Ch N V Labels: BB2015-05-TBR Attachments: MAPREDUCE-2632-1.patch, MAPREDUCE-2632.patch We can avoid the call to the partitioner when the number of reducers is 1. This avoids unnecessary computation by the partitioner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
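The shortcut can be sketched as follows (a minimal stand-in for the MR Partitioner API; the method name `partitionFor` is ours, not the patch's): with a single reducer every record must land in partition 0, so the user partitioner need not run at all.

```java
public class SinglePartitionShortcut {
    // Minimal stand-in for org.apache.hadoop.mapreduce.Partitioner.
    public interface Partitioner<K, V> {
        int getPartition(K key, V value, int numReduceTasks);
    }

    public static <K, V> int partitionFor(Partitioner<K, V> p, K key, V value,
                                          int numReduceTasks) {
        // With one reducer the answer is always 0; skip the (possibly
        // expensive) user-supplied partitioner entirely.
        return numReduceTasks == 1 ? 0 : p.getPartition(key, value, numReduceTasks);
    }
}
```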
[jira] [Updated] (MAPREDUCE-5374) CombineFileRecordReader does not set map.input.* configuration parameters for first file read
[ https://issues.apache.org/jira/browse/MAPREDUCE-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5374: Labels: BB2015-05-TBR (was: ) CombineFileRecordReader does not set map.input.* configuration parameters for first file read --- Key: MAPREDUCE-5374 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5374 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.0 Reporter: Dave Beech Assignee: Dave Beech Labels: BB2015-05-TBR Attachments: MAPREDUCE-5374.patch, MAPREDUCE-5374.patch The CombineFileRecordReader operates on splits consisting of multiple files. Each time a new record reader is initialised for a chunk, certain parameters are supposed to be set on the configuration object (map.input.file, map.input.start and map.input.length) However, the first reader is initialised in a different way to subsequent ones (i.e. initialize is called by the MapTask directly rather than from inside the record reader class). Because of this, these config parameters are not set properly and are returned as null when you access them from inside a mapper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5981) Log levels of certain MR logs can be changed to DEBUG
[ https://issues.apache.org/jira/browse/MAPREDUCE-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5981: Labels: BB2015-05-TBR (was: ) Log levels of certain MR logs can be changed to DEBUG - Key: MAPREDUCE-5981 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5981 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Varun Saxena Assignee: Varun Saxena Labels: BB2015-05-TBR Attachments: MAPREDUCE-5981.patch Following map reduce logs can be changed to DEBUG log level. 1. In org.apache.hadoop.mapreduce.task.reduce.Fetcher#copyFromHost(Fetcher.java : 313), the second log is not required to be at info level. This can be moved to debug as a warn log is anyways printed if verifyReply fails. SecureShuffleUtils.verifyReply(replyHash, encHash, shuffleSecretKey); LOG.info(for url=+msgToEncode+ sent hash and received reply); 2. Thread related info need not be printed in logs at INFO level. Below 2 logs can be moved to DEBUG a) In org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl#getHost(ShuffleSchedulerImpl.java : 381), below log can be changed to DEBUG LOG.info(Assigning + host + with + host.getNumKnownMapOutputs() + to + Thread.currentThread().getName()); b) In org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.getMapsForHost(ShuffleSchedulerImpl.java : 411), below log can be changed to DEBUG LOG.info(assigned + includedMaps + of + totalSize + to + host + to + Thread.currentThread().getName()); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5362) clean up POM dependencies
[ https://issues.apache.org/jira/browse/MAPREDUCE-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5362: Labels: BB2015-05-TBR (was: ) clean up POM dependencies - Key: MAPREDUCE-5362 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5362 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Labels: BB2015-05-TBR Attachments: MAPREDUCE-5362.patch, mr-5362-0.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules as in common, hdfs and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6020) Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job count
[ https://issues.apache.org/jira/browse/MAPREDUCE-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6020: Labels: BB2015-05-TBR (was: ) Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job counter in JobInProgress - Key: MAPREDUCE-6020 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6020 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.23.10 Reporter: zhihai xu Assignee: zhihai xu Labels: BB2015-05-TBR Attachments: MAPREDUCE-6020.branch1.patch Too many threads block on the global JobTracker lock in getJobCounters; optimize getJobCounters to release the global JobTracker lock before accessing the per-job counter in JobInProgress. Many JobClients may call getJobCounters on the JobTracker at the same time, and the current code locks the JobTracker, blocking all those threads while counters are fetched from JobInProgress. It is better to release the JobTracker lock before fetching counters from JobInProgress (job.getCounters(counters)), so that all threads can read their own job's counters in parallel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
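The locking change described above follows a standard lock-narrowing pattern, sketched here with illustrative names (an int array stands in for a job's counters; this is not the branch-1 JobTracker code): hold the global lock only long enough to look the job up, then read the job's state under a per-job lock.

```java
import java.util.HashMap;
import java.util.Map;

public class CounterLockSketch {
    private final Object globalLock = new Object();           // stands in for the JobTracker lock
    private final Map<String, int[]> jobs = new HashMap<>();  // jobId -> per-job counter state

    public void addJob(String id, int counter) {
        synchronized (globalLock) { jobs.put(id, new int[]{counter}); }
    }

    public int getJobCounters(String jobId) {
        int[] job;
        synchronized (globalLock) {   // short critical section: map lookup only
            job = jobs.get(jobId);
        }
        if (job == null) return -1;
        synchronized (job) {          // per-job lock for the (potentially slow) read,
            return job[0];            // taken with the global lock already released
        }
    }
}
```

Because the expensive read happens outside the global lock, concurrent callers asking about different jobs no longer serialize on the JobTracker.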
[jira] [Updated] (MAPREDUCE-5889) Deprecate FileInputFormat.setInputPaths(Job, String) and FileInputFormat.addInputPaths(Job, String)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5889: Labels: BB2015-05-TBR newbie (was: newbie) Deprecate FileInputFormat.setInputPaths(Job, String) and FileInputFormat.addInputPaths(Job, String) --- Key: MAPREDUCE-5889 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5889 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: BB2015-05-TBR, newbie Attachments: MAPREDUCE-5889.3.patch, MAPREDUCE-5889.patch, MAPREDUCE-5889.patch {{FileInputFormat.setInputPaths(Job job, String commaSeparatedPaths)}} and {{FileInputFormat.addInputPaths(Job job, String commaSeparatedPaths)}} fail to parse commaSeparatedPaths if a comma is included in the file path. (e.g. Path: {{/path/file,with,comma}}) We should deprecate these methods and document to use {{setInputPaths(Job job, Path... inputPaths)}} and {{addInputPaths(Job job, Path... inputPaths)}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
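The failure mode is easy to demonstrate: any comma-splitting of the string form (shown here with a plain `String.split`, which is a simplification of Hadoop's actual parsing) breaks a single path that contains commas into several bogus paths, which is why the `Path...` overloads are safer.

```java
public class CommaPathDemo {
    // Simplified model of what the String-based overloads must do:
    // treat the argument as a comma-separated list of paths.
    public static String[] parseCommaSeparated(String commaSeparatedPaths) {
        return commaSeparatedPaths.split(",");
    }
}
```

A single file named `/path/file,with,comma` comes back as three separate "paths", none of which exists; passing `new Path("/path/file,with,comma")` directly avoids the ambiguity.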
[jira] [Updated] (MAPREDUCE-5929) YARNRunner.java, path for jobJarPath not set correctly
[ https://issues.apache.org/jira/browse/MAPREDUCE-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5929: Labels: BB2015-05-TBR newbie patch (was: newbie patch) YARNRunner.java, path for jobJarPath not set correctly -- Key: MAPREDUCE-5929 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5929 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.2.0 Reporter: Chao Tian Assignee: Rahul Palamuttam Labels: BB2015-05-TBR, newbie, patch Attachments: MAPREDUCE-5929.patch In YARNRunner.java, line 357, Path jobJarPath = new Path(jobConf.get(MRJobConfig.JAR)); This leaves the job.jar path without scheme, host and port number on distributed file systems other than HDFS. Compare line 357 with line 344, where job.xml is actually set as Path jobConfPath = new Path(jobSubmitDir, MRJobConfig.JOB_CONF_FILE); It appears jobSubmitDir is missing on line 357, which causes this problem. On HDFS an additional qualify step corrects the path, but not on other generic distributed file systems. The proposed change is to replace line 357 with Path jobJarPath = new Path(jobSubmitDir, jobConf.get(MRJobConfig.JAR)); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6038) A boolean may be set error in the Word Count v2.0 in MapReduce Tutorial
[ https://issues.apache.org/jira/browse/MAPREDUCE-6038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6038: Labels: BB2015-05-TBR (was: ) A boolean may be set error in the Word Count v2.0 in MapReduce Tutorial --- Key: MAPREDUCE-6038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6038 Project: Hadoop Map/Reduce Issue Type: Bug Environment: java version 1.8.0_11 hostspot 64-bit Reporter: Pei Ma Assignee: Tsuyoshi Ozawa Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-6038.1.patch As a beginner learning the basics of MR, I found that I couldn't run WordCount2 using the command bin/hadoop jar wc.jar WordCount2 /user/joe/wordcount/input /user/joe/wordcount/output from the Tutorial. The VM threw a NullPointerException at line 47. On line 45, the default value returned by conf.getBoolean is true. That is to say, even when wordcount.skip.patterns is not set, WordCount2 proceeds to call getCacheFiles, and patternsURIs is assigned null. When the -skip option is absent, wordcount.skip.patterns is never set, and a NullPointerException results. In short, the block after the if-statement on line 45 shouldn't execute when the -skip option is absent from the command. Line 45 should probably read if (conf.getBoolean(wordcount.skip.patterns, false)) { — just change the default boolean. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
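The bug in miniature (a plain Map stands in for Hadoop's Configuration; the helper names are ours): with a default of `true`, the skip-patterns branch runs even though `-skip` never set the flag, which is what leads to the NPE. Defaulting to `false` makes the branch run only when the option was given.

```java
import java.util.Map;

public class SkipPatternsDefault {
    // Mirrors Configuration.getBoolean(key, defaultValue) semantics.
    public static boolean getBoolean(Map<String, String> conf, String key,
                                     boolean defaultValue) {
        String v = conf.get(key);
        return v == null ? defaultValue : Boolean.parseBoolean(v);
    }

    // Fixed version of the line-45 check: default false, so the
    // getCacheFiles path is only taken when -skip actually set the flag.
    public static boolean shouldSkip(Map<String, String> conf) {
        return getBoolean(conf, "wordcount.skip.patterns", false);
    }
}
```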
[jira] [Updated] (MAPREDUCE-5817) mappers get rescheduled on node transition even after all reducers are completed
[ https://issues.apache.org/jira/browse/MAPREDUCE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5817: Labels: BB2015-05-TBR (was: ) mappers get rescheduled on node transition even after all reducers are completed Key: MAPREDUCE-5817 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5817 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.3.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Labels: BB2015-05-TBR Attachments: mapreduce-5817.patch We're seeing a behavior where a job runs long after all reducers were already finished. We found that the job was rescheduling and running a number of mappers beyond the point of reducer completion. In one situation, the job ran for some 9 more hours after all reducers completed! This happens because whenever a node transition (to an unusable state) comes into the app master, it reschedules all mappers that already ran on the node, in all cases. Therefore, any node transition has the potential to extend the job's runtime. Once this window opens, another node transition can prolong it, and in theory this can happen indefinitely. If there is some instability in the pool (unhealthy nodes, etc.) for a duration, any big job is severely vulnerable to this problem. If all reducers have completed, JobImpl.actOnUnusableNode() should not reschedule mapper tasks: the mapper outputs are no longer needed, and rescheduled mappers would produce output that is never consumed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
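The proposed guard can be sketched as follows (a simplified stand-alone function, not the actual JobImpl.actOnUnusableNode code): once every reducer has finished, map outputs on a lost node will never be fetched again, so the reschedule list should be empty.

```java
import java.util.ArrayList;
import java.util.List;

public class UnusableNodeGuard {
    // Decide which completed mappers on a now-unusable node must be rerun.
    public static List<String> tasksToReschedule(List<String> mappersOnNode,
                                                 int completedReducers,
                                                 int totalReducers) {
        if (totalReducers > 0 && completedReducers == totalReducers) {
            // All reduce work is done; map outputs are no longer needed.
            return new ArrayList<>();
        }
        return new ArrayList<>(mappersOnNode);
    }
}
```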
[jira] [Updated] (MAPREDUCE-5490) MapReduce doesn't set the environment variable for children processes
[ https://issues.apache.org/jira/browse/MAPREDUCE-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5490: Labels: BB2015-05-TBR (was: ) MapReduce doesn't set the environment variable for children processes - Key: MAPREDUCE-5490 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5490 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.1 Reporter: Owen O'Malley Assignee: Owen O'Malley Labels: BB2015-05-TBR Attachments: MAPREDUCE-5490.patch, mr-5490.patch, mr-5490.patch Currently, MapReduce uses the command line argument to pass the classpath to the child. This breaks if the process forks a child that needs the same classpath. Such a case happens in Hive when it uses map-side joins. I propose that we make MapReduce in branch-1 use the CLASSPATH environment variable like YARN does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5499) Fix synchronization issues of the setters/getters of *PBImpl which take in/return lists
[ https://issues.apache.org/jira/browse/MAPREDUCE-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5499: Labels: BB2015-05-TBR (was: ) Fix synchronization issues of the setters/getters of *PBImpl which take in/return lists --- Key: MAPREDUCE-5499 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5499 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Zhijie Shen Assignee: Xuan Gong Labels: BB2015-05-TBR Attachments: MAPREDUCE-5499.1.patch, MAPREDUCE-5499.2.patch Similar to YARN-609. There're the following *PBImpls which need to be fixed: 1. GetDiagnosticsResponsePBImpl 2. GetTaskAttemptCompletionEventsResponsePBImpl 3. GetTaskReportsResposnePBImpl 4. CounterGroupPBImpl 5. JobReportPBImpl 6. TaskReportPBImpl -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5392) mapred job -history all command throws IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/MAPREDUCE-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5392: Labels: BB2015-05-TBR (was: ) mapred job -history all command throws IndexOutOfBoundsException -- Key: MAPREDUCE-5392 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5392 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 3.0.0, 2.0.5-alpha, 2.2.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Labels: BB2015-05-TBR Attachments: MAPREDUCE-5392.2.patch, MAPREDUCE-5392.3.patch, MAPREDUCE-5392.4.patch, MAPREDUCE-5392.5.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch When I use the all option of the mapred job -history command, the following exception is thrown and the command does not work. {code} Exception in thread main java.lang.StringIndexOutOfBoundsException: String index out of range: -3 at java.lang.String.substring(String.java:1875) at org.apache.hadoop.mapreduce.util.HostUtil.convertTrackerNameToHostName(HostUtil.java:49) at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.getTaskLogsUrl(HistoryViewer.java:459) at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.printAllTaskAttempts(HistoryViewer.java:235) at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.print(HistoryViewer.java:117) at org.apache.hadoop.mapreduce.tools.CLI.viewHistory(CLI.java:472) at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:313) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1233) {code} This happens because a node name recorded in the history file lacks the tracker_ prefix. The patch modifies the code to read the history file even when a node name does not carry the tracker_ prefix.
In addition, it fixes the URL of the displayed task log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
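The substring crash in miniature: conversion of a tracker name to a host name effectively strips a leading `tracker_` prefix and a trailing port, so a name recorded without the prefix drives the index arithmetic negative. A defensive variant (ours for illustration, not the committed patch, and simpler than the real HostUtil logic):

```java
public class TrackerName {
    public static String toHostName(String trackerName) {
        // Tolerate node names recorded without the "tracker_" prefix
        // instead of blindly doing substring arithmetic on them.
        String name = trackerName.startsWith("tracker_")
                ? trackerName.substring("tracker_".length())
                : trackerName;
        int colon = name.indexOf(':');
        return colon >= 0 ? name.substring(0, colon) : name;
    }
}
```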
[jira] [Updated] (MAPREDUCE-4065) Add .proto files to built tarball
[ https://issues.apache.org/jira/browse/MAPREDUCE-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4065: Labels: BB2015-05-TBR (was: ) Add .proto files to built tarball - Key: MAPREDUCE-4065 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4065 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 0.23.2, 2.4.0 Reporter: Ralph H Castain Assignee: Tsuyoshi Ozawa Labels: BB2015-05-TBR Attachments: MAPREDUCE-4065.1.patch Please add the .proto files to the built tarball so that users can build 3rd party tools that use protocol buffers without having to do an svn checkout of the source code. Sorry I don't know more about Maven, or I would provide a patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6030) In mr-jobhistory-daemon.sh, some env variables are not affected by mapred-env.sh
[ https://issues.apache.org/jira/browse/MAPREDUCE-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6030: Labels: BB2015-05-TBR (was: ) In mr-jobhistory-daemon.sh, some env variables are not affected by mapred-env.sh Key: MAPREDUCE-6030 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6030 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 2.4.1 Reporter: Youngjoon Kim Assignee: Youngjoon Kim Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-6030.patch In mr-jobhistory-daemon.sh, some env variables are exported before sourcing mapred-env.sh, so these variables don't use values defined in mapred-env.sh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6040) distcp should automatically use /.reserved/raw when run by the superuser
[ https://issues.apache.org/jira/browse/MAPREDUCE-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6040: Labels: BB2015-05-TBR (was: ) distcp should automatically use /.reserved/raw when run by the superuser Key: MAPREDUCE-6040 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6040 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Charles Lamb Labels: BB2015-05-TBR Attachments: HDFS-6134-Distcp-cp-UseCasesTable2.pdf, MAPREDUCE-6040.001.patch, MAPREDUCE-6040.002.patch On HDFS-6134, [~sanjay.radia] asked for distcp to automatically prepend /.reserved/raw if the distcp is being performed by the superuser and /.reserved/raw is supported by both the source and destination filesystems. This behavior only occurs if none of the src and target pathnames are /.reserved/raw. The -disablereservedraw flag can be used to disable this option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6304) Specifying node labels when submitting MR jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529508#comment-14529508 ] Naganarasimha G R commented on MAPREDUCE-6304: -- Thanks [~Wangda] for your comments, +1 for {{mention in description that, by default the node-label-expression for job is not set, it will use queue's default-node-label-expression.}}. I am getting it tested in cluster setup, will upload the updated patch today. Specifying node labels when submitting MR jobs -- Key: MAPREDUCE-6304 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6304 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Jian Fang Assignee: Naganarasimha G R Fix For: 2.8.0 Attachments: MAPREDUCE-6304.20150410-1.patch, MAPREDUCE-6304.20150411-1.patch, MAPREDUCE-6304.20150501-1.patch Per the discussion on YARN-796, we need a mechanism in MAPREDUCE to specify node labels when submitting MR jobs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6068) Illegal progress value warnings in map tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6068: Labels: BB2015-05-TBR (was: ) Illegal progress value warnings in map tasks Key: MAPREDUCE-6068 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6068 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, task Affects Versions: 3.0.0 Reporter: Todd Lipcon Assignee: Binglin Chang Labels: BB2015-05-TBR Attachments: MAPREDUCE-6068.002.patch, MAPREDUCE-6068.v1.patch When running a terasort on latest trunk, I see the following in my task logs: {code} 2014-09-02 17:42:28,437 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer 2014-09-02 17:42:42,238 WARN [main] org.apache.hadoop.util.Progress: Illegal progress value found, progress is larger than 1. Progress will be changed to 1 2014-09-02 17:42:42,238 WARN [main] org.apache.hadoop.util.Progress: Illegal progress value found, progress is larger than 1. Progress will be changed to 1 2014-09-02 17:42:42,241 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output {code} We should eliminate these warnings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6315) Implement retrieval of logs for crashed MR-AM via jhist in the staging directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6315: Labels: BB2015-05-TBR (was: ) Implement retrieval of logs for crashed MR-AM via jhist in the staging directory Key: MAPREDUCE-6315 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6315 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client, mr-am Affects Versions: 2.7.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Priority: Critical Labels: BB2015-05-TBR Attachments: MAPREDUCE-6315.001.patch When all AM attempts crash, there is no record of them in JHS. Thus no easy way to get the logs. This JIRA automates the procedure by utilizing the jhist file in the staging directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6246: Labels: BB2015-05-TBR DB2 mapreduce (was: DB2 mapreduce) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2 - Key: MAPREDUCE-6246 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2 Affects Versions: 2.4.1 Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x Platform: xSeries, pSeries Browser: Firefox, IE Security Settings: No Security, Flat file, LDAP, PAM File System: HDFS, GPFS FPO Reporter: ramtin Assignee: ramtin Labels: BB2015-05-TBR, DB2, mapreduce Attachments: MAPREDUCE-6246.002.patch, MAPREDUCE-6246.patch Original Estimate: 24h Remaining Estimate: 24h DBOutputFormat is used for writing the output of MapReduce jobs to a database; when used with DB2 JDBC drivers it fails with the following error: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) The DBOutputFormat class has a constructQuery method that generates an INSERT INTO statement with a semicolon (;) at the end. The semicolon is the ANSI SQL-92 statement terminator character, but this feature is disabled (OFF) by default in IBM DB2, although it can be turned ON with -t (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2). However, some products are already built on top of the default OFF setting, so turning the feature ON would make them error-prone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
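A fixed constructQuery in miniature (the signature is simplified relative to Hadoop's, which also takes table/field arrays from the job configuration): build the parameterized INSERT without a trailing semicolon, since the JDBC driver, not the statement text, terminates the statement.

```java
public class InsertBuilder {
    public static String constructQuery(String table, String[] fieldNames) {
        StringBuilder q = new StringBuilder("INSERT INTO ").append(table).append(" (");
        q.append(String.join(",", fieldNames)).append(") VALUES (");
        for (int i = 0; i < fieldNames.length; i++) {
            q.append(i == 0 ? "?" : ",?");   // one placeholder per column
        }
        q.append(")");                        // note: no ';' terminator
        return q.toString();
    }
}
```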
[jira] [Updated] (MAPREDUCE-6316) Task Attempt List entries should link to the task overview
[ https://issues.apache.org/jira/browse/MAPREDUCE-6316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6316: Labels: BB2015-05-TBR (was: ) Task Attempt List entries should link to the task overview -- Key: MAPREDUCE-6316 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6316 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siqi Li Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: AM attempt page.png, AM task page.png, All Attempts page.png, MAPREDUCE-6316.v1.patch, MAPREDUCE-6316.v2.patch, MAPREDUCE-6316.v3.patch, Task Overview page.png The typical workflow is to click on the list of failed attempts, then look at the counters, or at the list of attempts of just one task in general. If the task-id portion of each task attempt id linked back to the task, we would not have to go through the list of tasks to search for that task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5465: Labels: BB2015-05-TBR (was: ) Container killed before hprof dumps profile.out --- Key: MAPREDUCE-5465 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mr-am, mrv2 Reporter: Radim Kolar Assignee: Ming Ma Labels: BB2015-05-TBR Attachments: MAPREDUCE-5465-2.patch, MAPREDUCE-5465-3.patch, MAPREDUCE-5465-4.patch, MAPREDUCE-5465-5.patch, MAPREDUCE-5465-6.patch, MAPREDUCE-5465-7.patch, MAPREDUCE-5465-8.patch, MAPREDUCE-5465-9.patch, MAPREDUCE-5465.patch If profiling is enabled for a mapper or reducer, hprof dumps profile.out at process exit, after the task has signaled to the AM that its work is finished. The AM then kills the container without waiting for hprof to finish its dump. If hprof is producing larger output (such as with depth=4, while depth=3 works), it cannot finish the dump before being killed, making the entire dump unusable because the CPU and heap stats are missing. There needs to be a longer delay before the container is killed when profiling is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6305) AM/Task log page should be able to link back to the job
[ https://issues.apache.org/jira/browse/MAPREDUCE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6305: Labels: BB2015-05-TBR (was: ) AM/Task log page should be able to link back to the job --- Key: MAPREDUCE-6305 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6305 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Siqi Li Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: MAPREDUCE-6305.v1.patch, MAPREDUCE-6305.v2.patch, MAPREDUCE-6305.v3.patch, MAPREDUCE-6305.v4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6241) Native compilation fails for Checksum.cc due to an incompatibility of assembler register constraint for PowerPC
[ https://issues.apache.org/jira/browse/MAPREDUCE-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6241: Labels: BB2015-05-TBR features (was: features) Native compilation fails for Checksum.cc due to an incompatibility of assembler register constraint for PowerPC Key: MAPREDUCE-6241 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6241 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 3.0.0, 2.6.0 Environment: Debian/Jessie, kernel 3.18.5, ppc64 GNU/Linux gcc (Debian 4.9.1-19) protobuf 2.6.1 OpenJDK Runtime Environment (IcedTea 2.5.3) (7u71-2.5.3-2) OpenJDK Zero VM (build 24.65-b04, interpreted mode) source was cloned (and updated) from Apache-Hadoop's git repository Reporter: Stephan Drescher Assignee: Binglin Chang Priority: Minor Labels: BB2015-05-TBR, features Attachments: MAPREDUCE-6241.001.patch, MAPREDUCE-6241.002.patch Issue when using assembler code for performance optimization on the powerpc platform (compiled for 32bit) mvn compile -Pnative -DskipTests [exec] /usr/bin/c++ -Dnativetask_EXPORTS -m32 -DSIMPLE_MEMCPY -fno-strict-aliasing -Wall -Wno-sign-compare -g -O2 -DNDEBUG -fPIC -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native/javah -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/util -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test 
-I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native -I/home/hadoop/Java/java7/include -I/home/hadoop/Java/java7/include/linux -isystem /home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/gtest/include -o CMakeFiles/nativetask.dir/main/native/src/util/Checksum.cc.o -c /home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/util/Checksum.cc [exec] CMakeFiles/nativetask.dir/build.make:744: recipe for target 'CMakeFiles/nativetask.dir/main/native/src/util/Checksum.cc.o' failed [exec] make[2]: Leaving directory '/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native' [exec] CMakeFiles/Makefile2:95: recipe for target 'CMakeFiles/nativetask.dir/all' failed [exec] make[1]: Leaving directory '/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native' [exec] Makefile:76: recipe for target 'all' failed [exec] /home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/util/Checksum.cc: In function ‘void NativeTask::init_cpu_support_flag()’: /home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/util/Checksum.cc:611:14: error: impossible register constraint in ‘asm’ -- popl %%ebx : =a (eax), [ebx] =r(ebx), =c(ecx), =d(edx) : a (eax_in) : cc); -- -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6336) Enable v2 FileOutputCommitter by default
[ https://issues.apache.org/jira/browse/MAPREDUCE-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6336: Labels: BB2015-05-TBR (was: ) Enable v2 FileOutputCommitter by default Key: MAPREDUCE-6336 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6336 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 2.7.0 Reporter: Gera Shegalov Assignee: Siqi Li Labels: BB2015-05-TBR Attachments: MAPREDUCE-6336.v1.patch This JIRA is to propose making new FileOutputCommitter behavior from MAPREDUCE-4815 enabled by default in trunk, and potentially in branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6269) improve JobConf to add option to not share Credentials between jobs.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6269: Labels: BB2015-05-TBR (was: ) improve JobConf to add option to not share Credentials between jobs. Key: MAPREDUCE-6269 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6269 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Reporter: zhihai xu Assignee: zhihai xu Labels: BB2015-05-TBR Attachments: MAPREDUCE-6269.000.patch Improve JobConf by adding a constructor that avoids sharing Credentials between jobs. By default the Credentials will be shared to keep backward compatibility. We can add a new constructor with a new parameter to decide whether to share Credentials. Some issues reported in Cascading are due to corrupted credentials; see https://github.com/Cascading/cascading/commit/45b33bb864172486ac43782a4d13329312d01c0e If we add this support in JobConf, it will benefit all job clients. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
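A minimal sketch of the idea, with hypothetical names (SketchJobConf, the shareCredentials flag, and the map-of-tokens stand-in are illustrative, not the actual Hadoop API): the existing constructor keeps today's shared-reference behavior, while an opt-out flag takes a defensive copy so one job cannot see or corrupt another job's credentials.

```java
import java.util.HashMap;
import java.util.Map;

public class SketchJobConf {
    private final Map<String, byte[]> credentials;

    // Existing behavior: share the caller's credentials object.
    public SketchJobConf(Map<String, byte[]> creds) {
        this(creds, true);
    }

    // Proposed overload: opt out of sharing by taking a defensive copy.
    public SketchJobConf(Map<String, byte[]> creds, boolean shareCredentials) {
        this.credentials = shareCredentials ? creds : new HashMap<>(creds);
    }

    public Map<String, byte[]> getCredentials() {
        return credentials;
    }

    // Returns true when a later mutation of the original map is visible
    // through this conf, i.e. the credentials are shared.
    public static boolean mutationVisible(boolean share) {
        Map<String, byte[]> creds = new HashMap<>();
        SketchJobConf conf = new SketchJobConf(creds, share);
        creds.put("token", new byte[] {1});
        return conf.getCredentials().containsKey("token");
    }
}
```

Keeping the copy behind an explicit flag preserves backward compatibility for callers that rely on sharing.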
[jira] [Updated] (MAPREDUCE-6298) Job#toString throws an exception when not in state RUNNING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6298: Labels: BB2015-05-TBR (was: ) Job#toString throws an exception when not in state RUNNING -- Key: MAPREDUCE-6298 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6298 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Lars Francke Assignee: Lars Francke Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-6298.1.patch Job#toString calls {{ensureState(JobState.RUNNING);}} as the very first thing. That method throws an exception, which is not nice. One thing this breaks is usage of Job in the Scala (e.g. Spark) REPL, as the REPL calls toString after every invocation and that fails every time. I'll attach a patch that checks the state: if it's RUNNING it prints the original message, otherwise it prints something else. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
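The proposed behavior can be sketched without Hadoop types: branch on the state instead of asserting it, so toString() never throws. The class name, states, and message formats below are illustrative stand-ins, not the real Job class.

```java
public class SketchJob {
    public enum JobState { DEFINE, RUNNING }

    private final JobState state;
    private final String jobId;

    public SketchJob(JobState state, String jobId) {
        this.state = state;
        this.jobId = jobId;
    }

    @Override
    public String toString() {
        // Never call ensureState() here: toString() must not throw,
        // or every REPL echo of a not-yet-submitted job blows up.
        if (state == JobState.RUNNING) {
            return "Job: " + jobId + " (running)";           // detailed message
        }
        return "Job: " + jobId + " (state: " + state + ")";  // safe fallback
    }
}
```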
[jira] [Updated] (MAPREDUCE-6356) Misspelling of threshold in log4j.properties for tests
[ https://issues.apache.org/jira/browse/MAPREDUCE-6356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6356: Labels: BB2015-05-TBR (was: ) Misspelling of threshold in log4j.properties for tests -- Key: MAPREDUCE-6356 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6356 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-6356.patch The log4j.properties file for tests contains the misspelling {{log4j.threshhold}}. We should use the correct {{log4j.threshold}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-2094) org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-2094: Labels: BB2015-05-TBR (was: ) org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour. --- Key: MAPREDUCE-2094 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2094 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Reporter: Niels Basjes Assignee: Niels Basjes Labels: BB2015-05-TBR Attachments: MAPREDUCE-2094-2011-05-19.patch, MAPREDUCE-2094-20140727-svn-fixed-spaces.patch, MAPREDUCE-2094-20140727-svn.patch, MAPREDUCE-2094-20140727.patch, MAPREDUCE-2094-2015-05-05-2328.patch, MAPREDUCE-2094-FileInputFormat-docs-v2.patch When implementing a custom derivative of FileInputFormat we ran into the effect that a large Gzipped input file would be processed several times. A near-1GiB file would be processed around 36 times in its entirety, producing garbage results and taking up a lot more CPU time than needed. It took a while to figure out, and what we found is that the default implementation of the isSplitable method in [org.apache.hadoop.mapreduce.lib.input.FileInputFormat | http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java?view=markup ] is simply {{return true;}}. This is a very unsafe default and contradicts the JavaDoc of the method, which states: "Is the given filename splitable? Usually, true, but if the file is stream compressed, it will not be." The actual implementation effectively behaves as: "Is the given filename splitable? Always true, even if the file is stream compressed using an unsplittable compression codec."
For our situation (where we always have Gzipped input) we took the easy way out and simply implemented an isSplitable in our class that does {{return false;}}. Now there are essentially 3 ways I can think of for fixing this (in order of what I would find preferable): # Implement something that looks at the used compression of the file (i.e. migrate the implementation from TextInputFormat to FileInputFormat). This would make the method do what the JavaDoc describes. # Force developers to think about it and make this method abstract. # Use a safe default (i.e. {{return false;}}) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
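Option 1 above can be sketched without Hadoop dependencies by deciding splittability from the file name. The real implementation would consult Hadoop's codec machinery (CompressionCodecFactory), so the fixed suffix list here is a hypothetical stand-in for that lookup.

```java
public class SplittabilitySketch {
    // Suffixes of stream-compressed formats that cannot be split.
    // Hypothetical stand-in for Hadoop's codec lookup.
    private static final String[] UNSPLITTABLE_SUFFIXES = {".gz", ".snappy"};

    public static boolean isSplitable(String fileName) {
        for (String suffix : UNSPLITTABLE_SUFFIXES) {
            if (fileName.endsWith(suffix)) {
                return false;   // safe: never split a stream-compressed file
            }
        }
        return true;            // plain or splittably-compressed files
    }
}
```

The point of the sketch is the shape of the fix: the default consults the file's compression before answering, instead of blindly returning true.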
[jira] [Updated] (MAPREDUCE-6279) AM should explicitly exit JVM after all services have stopped
[ https://issues.apache.org/jira/browse/MAPREDUCE-6279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6279: Labels: BB2015-05-TBR (was: ) AM should explicitly exit JVM after all services have stopped Key: MAPREDUCE-6279 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6279 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Eric Payne Labels: BB2015-05-TBR Attachments: MAPREDUCE-6279.v1.txt, MAPREDUCE-6279.v2.txt, MAPREDUCE-6279.v3.patch, MAPREDUCE-6279.v4.patch Occasionally the MapReduce AM can get stuck trying to shut down. MAPREDUCE-6049 and MAPREDUCE-5888 were specific instances that have been fixed, but this can also occur with uber jobs if the task code inadvertently leaves non-daemon threads lingering. We should explicitly shut down the JVM after the MapReduce AM has unregistered and all services have been stopped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6174) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6174: Labels: BB2015-05-TBR (was: ) Combine common stream code into parent class for InMemoryMapOutput and OnDiskMapOutput. --- Key: MAPREDUCE-6174 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6174 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 3.0.0, 2.6.0 Reporter: Eric Payne Assignee: Eric Payne Labels: BB2015-05-TBR Attachments: MAPREDUCE-6174.002.patch, MAPREDUCE-6174.003.patch, MAPREDUCE-6174.v1.txt Per MAPREDUCE-6166, both InMemoryMapOutput and OnDiskMapOutput will be doing similar things with regards to IFile streams. In order to make it explicit that InMemoryMapOutput and OnDiskMapOutput are different from 3rd-party implementations, this JIRA will make them subclass a common class (see https://issues.apache.org/jira/browse/MAPREDUCE-6166?focusedCommentId=14223368page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14223368) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5747) Potential null pointer dereference in HsTasksBlock#render()
[ https://issues.apache.org/jira/browse/MAPREDUCE-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5747: Labels: BB2015-05-TBR newbie patch (was: newbie patch) Potential null pointer dereference in HsTasksBlock#render() - Key: MAPREDUCE-5747 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5747 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Ted Yu Priority: Minor Labels: BB2015-05-TBR, newbie, patch Attachments: MAPREDUCE-5747-1.patch At line 140: {code} } else { ta = new TaskAttemptInfo(successful, type, false); {code} There is no null check for {{type}}. The TaskAttemptInfo ctor dereferences {{type}}: {code} public TaskAttemptInfo(TaskAttempt ta, TaskType type, Boolean isRunning) { final TaskAttemptReport report = ta.getReport(); this.type = type.toString(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
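The fix amounts to a null guard before the dereference. The class below is a simplified stand-in (the enum and the UNKNOWN fallback are illustrative assumptions, not the webapp's actual TaskAttemptInfo/TaskType):

```java
public class AttemptInfoSketch {
    public enum TaskType { MAP, REDUCE }

    private final String type;

    public AttemptInfoSketch(TaskType taskType) {
        // Guard before calling toString(): taskType may legitimately be
        // null when the caller did not determine the attempt's type.
        this.type = (taskType != null) ? taskType.toString() : "UNKNOWN";
    }

    public String getType() {
        return type;
    }
}
```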
[jira] [Updated] (MAPREDUCE-6337) add a mode to replay MR job history files to the timeline service
[ https://issues.apache.org/jira/browse/MAPREDUCE-6337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6337: Labels: BB2015-05-TBR (was: ) add a mode to replay MR job history files to the timeline service - Key: MAPREDUCE-6337 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6337 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Sangjin Lee Assignee: Sangjin Lee Labels: BB2015-05-TBR Attachments: MAPREDUCE-6337-YARN-2928.001.patch The subtask covers the work on top of YARN-3437 to add a mode to replay MR job history files to the timeline service storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6079) Renaming JobImpl#username to reporterUserName
[ https://issues.apache.org/jira/browse/MAPREDUCE-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6079: Labels: BB2015-05-TBR (was: ) Renaming JobImpl#username to reporterUserName - Key: MAPREDUCE-6079 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6079 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Labels: BB2015-05-TBR Attachments: MAPREDUCE-6079.1.patch On MAPREDUCE-6033, we found the bug because of confusing field names {{userName}} and {{username}}. We should change the names to distinguish them easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases
[ https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated MAPREDUCE-6251: --- Status: Patch Available (was: Open) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases --- Key: MAPREDUCE-6251 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, mrv2 Affects Versions: 2.6.0 Reporter: Craig Welch Assignee: Craig Welch Attachments: MAPREDUCE-6251.0.patch, MAPREDUCE-6251.1.patch, MAPREDUCE-6251.2.patch, MAPREDUCE-6251.3.patch, MAPREDUCE-6251.4.patch The JobClient is used to get job status information for running and completed jobs. Final state and history for a job is communicated from the application master to the job history server via a distributed file system - where the history is uploaded by the application master to the dfs and then scanned/loaded by the jobhistory server. While HDFS has strong consistency guarantees not all Hadoop DFS's do. When used in conjunction with a distributed file system which does not have this guarantee there will be cases where the history server may not see an uploaded file, resulting in the dreaded no such job and a null value for the RunningJob in the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6251) JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases
[ https://issues.apache.org/jira/browse/MAPREDUCE-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated MAPREDUCE-6251: --- Attachment: MAPREDUCE-6251.4.patch Updated with recommended move to MRJobConfig JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases --- Key: MAPREDUCE-6251 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6251 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, mrv2 Affects Versions: 2.6.0 Reporter: Craig Welch Assignee: Craig Welch Attachments: MAPREDUCE-6251.0.patch, MAPREDUCE-6251.1.patch, MAPREDUCE-6251.2.patch, MAPREDUCE-6251.3.patch, MAPREDUCE-6251.4.patch The JobClient is used to get job status information for running and completed jobs. Final state and history for a job is communicated from the application master to the job history server via a distributed file system - where the history is uploaded by the application master to the dfs and then scanned/loaded by the jobhistory server. While HDFS has strong consistency guarantees not all Hadoop DFS's do. When used in conjunction with a distributed file system which does not have this guarantee there will be cases where the history server may not see an uploaded file, resulting in the dreaded no such job and a null value for the RunningJob in the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
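A higher-level retry of the kind described can be sketched as a generic poll loop: when the history file may not yet be visible on an eventually-consistent DFS, poll a few times before concluding "no such job". The method name, attempt count, and backoff below are illustrative, not the patch's actual configuration.

```java
import java.util.function.Supplier;

public class JobStatusRetry {
    // Polls the fetcher until it yields a non-null result or the attempts
    // are exhausted; sleeps between attempts to let the DFS catch up.
    public static <T> T fetchWithRetries(Supplier<T> fetch, int maxAttempts,
                                         long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T result = fetch.get();
            if (result != null) {
                return result;              // history file became visible
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();  // preserve the flag
                    break;
                }
            }
        }
        return null;                        // genuinely missing job
    }
}
```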
[jira] [Updated] (MAPREDUCE-6320) Configuration of retrieved Job via Cluster is not properly set-up
[ https://issues.apache.org/jira/browse/MAPREDUCE-6320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6320: Labels: BB2015-05-TBR (was: ) Configuration of retrieved Job via Cluster is not properly set-up - Key: MAPREDUCE-6320 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6320 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jens Rabe Assignee: Jens Rabe Labels: BB2015-05-TBR Attachments: MAPREDUCE-6320.001.patch, MAPREDUCE-6320.002.patch, MAPREDUCE-6320.003.patch When getting a Job via the Cluster API, it is not correctly configured. To reproduce this: # Submit a MR job, and set some arbitrary parameter to its configuration {code:java} job.getConfiguration().set(foo, bar); job.setJobName(foo-bug-demo); {code} # Get the job in a client: {code:java} final Cluster c = new Cluster(conf); final JobStatus[] statuses = c.getAllJobStatuses(); final JobStatus s = ... // get the status for the job named foo-bug-demo final Job j = c.getJob(s.getJobId()); final Configuration conf = job.getConfiguration(); {code} # Get its foo entry {code:java} final String s = conf.get(foo); {code} # Expected: s is bar; But: s is null. The reason is that the job's configuration is stored on HDFS (the Configuration has a resource with a *hdfs://* URL) and in the *loadResource* it is changed to a path on the local file system (hdfs://host.domain:port/tmp/hadoop-yarn/... is changed to /tmp/hadoop-yarn/...), which does not exist, and thus the configuration is not populated. The bug happens in the *Cluster* class, where *JobConfs* are created from *status.getJobFile()*. A quick fix would be to copy this job file to a temporary file in the local file system and populate the JobConf from this file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
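The mangling described above can be reproduced with plain JDK classes: keeping only the path component of an hdfs:// URL (as a naive resource loader effectively does) yields a local-filesystem path that does not exist. The URL in the test is a made-up example.

```java
import java.net.URI;

public class ResourcePathDemo {
    // Returns only the path component, dropping scheme and authority.
    // This is the kind of transformation that turns an HDFS URL into a
    // bogus local path.
    public static String strippedPath(String resource) {
        return URI.create(resource).getPath();
    }
}
```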
[jira] [Updated] (MAPREDUCE-6128) Automatic addition of bundled jars to distributed cache
[ https://issues.apache.org/jira/browse/MAPREDUCE-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6128: Labels: BB2015-05-TBR (was: ) Automatic addition of bundled jars to distributed cache Key: MAPREDUCE-6128 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6128 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 2.5.1 Reporter: Gera Shegalov Assignee: Gera Shegalov Labels: BB2015-05-TBR Attachments: MAPREDUCE-6128.v01.patch, MAPREDUCE-6128.v02.patch, MAPREDUCE-6128.v03.patch, MAPREDUCE-6128.v04.patch, MAPREDUCE-6128.v05.patch, MAPREDUCE-6128.v06.patch, MAPREDUCE-6128.v07.patch, MAPREDUCE-6128.v08.patch On the client side, JDK adds Class-Path elements from the job jar manifest on the classpath. In theory there could be many bundled jars in many directories such that adding them manually via libjars or similar means to task classpaths is cumbersome. If this property is enabled, the same jars are added to the task classpaths automatically. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
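The JDK-side mechanism the proposal builds on can be sketched with java.util.jar.Manifest: the job jar's Class-Path attribute lists the bundled jars that would be mirrored into task classpaths. The manifest text in the test is a made-up example; real code would read the attribute from the job jar itself.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class ManifestClassPathSketch {
    // Parses the Class-Path attribute from manifest text; returns an
    // empty array when the attribute is absent.
    public static String[] classPathEntries(String manifestText) {
        try {
            Manifest mf = new Manifest(new ByteArrayInputStream(
                    manifestText.getBytes(StandardCharsets.UTF_8)));
            String cp = mf.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
            return cp == null ? new String[0] : cp.trim().split("\\s+");
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // cannot happen for a byte array
        }
    }
}
```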
[jira] [Updated] (MAPREDUCE-4683) We need to fix our build to create/distribute hadoop-mapreduce-client-core-tests.jar
[ https://issues.apache.org/jira/browse/MAPREDUCE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4683: Labels: BB2015-05-TBR (was: ) We need to fix our build to create/distribute hadoop-mapreduce-client-core-tests.jar Key: MAPREDUCE-4683 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4683 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Reporter: Arun C Murthy Assignee: Akira AJISAKA Priority: Critical Labels: BB2015-05-TBR Attachments: MAPREDUCE-4683.patch We need to fix our build to create/distribute hadoop-mapreduce-client-core-tests.jar, need this before MAPREDUCE-4253 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6310) Add jdiff support to MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6310: Labels: BB2015-05-TBR (was: ) Add jdiff support to MapReduce -- Key: MAPREDUCE-6310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6310 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Li Lu Assignee: Li Lu Priority: Blocker Labels: BB2015-05-TBR Attachments: MAPRED-6310-040615.patch Previously we used jdiff for Hadoop common and HDFS. Now we're extending the support of jdiff to YARN. Probably we'd like to do similar things with MapReduce? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6271) org.apache.hadoop.mapreduce.Cluster GetJob() display warn log
[ https://issues.apache.org/jira/browse/MAPREDUCE-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6271: Labels: BB2015-05-TBR (was: ) org.apache.hadoop.mapreduce.Cluster GetJob() display warn log - Key: MAPREDUCE-6271 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6271 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 2.7.0 Reporter: Peng Zhang Assignee: Peng Zhang Labels: BB2015-05-TBR Attachments: MAPREDUCE-6271.v2.patch, MR-6271.patch When using getJob() with MapReduce 2.7, a warning caused by the configuration being loaded twice is logged every time. And when the job has completed, this function will log a java.io.FileNotFoundException warning. I think this is related to MAPREDUCE-5875: the change in getJob() seems not to be needed, since it was only for tests. {noformat} 15/03/04 13:41:23 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:23 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 15/03/04 13:41:24 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:24 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 
15/03/04 13:41:25 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:25 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 15/03/04 13:41:26 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:26 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 15/03/04 13:41:27 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:27 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 15/03/04 13:41:28 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:28 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 
15/03/04 13:41:29 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:29 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 15/03/04 13:41:29 INFO exec.Task: 2015-03-04 13:41:29,853 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.37 sec 15/03/04 13:41:30 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/04 13:41:30 WARN conf.Configuration: hdfs://example/yarn/example2/staging/test_user/.staging/job_1425388652704_0116/job.xml:an attempt to override final parameter:
[jira] [Updated] (MAPREDUCE-6296) A better way to deal with InterruptedException on waitForCompletion
[ https://issues.apache.org/jira/browse/MAPREDUCE-6296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6296: Labels: BB2015-05-TBR (was: ) A better way to deal with InterruptedException on waitForCompletion --- Key: MAPREDUCE-6296 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6296 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Yang Hao Assignee: Yang Hao Labels: BB2015-05-TBR Attachments: MAPREDUCE-6296.patch Some code in method waitForCompletion of Job class is {code:title=Job.java|borderStyle=solid} public boolean waitForCompletion(boolean verbose ) throws IOException, InterruptedException, ClassNotFoundException { if (state == JobState.DEFINE) { submit(); } if (verbose) { monitorAndPrintJob(); } else { // get the completion poll interval from the client. int completionPollIntervalMillis = Job.getCompletionPollInterval(cluster.getConf()); while (!isComplete()) { try { Thread.sleep(completionPollIntervalMillis); } catch (InterruptedException ie) { } } } return isSuccessful(); } {code} but a better way to deal with InterruptException is {code:title=Job.java|borderStyle=solid} public boolean waitForCompletion(boolean verbose ) throws IOException, InterruptedException, ClassNotFoundException { if (state == JobState.DEFINE) { submit(); } if (verbose) { monitorAndPrintJob(); } else { // get the completion poll interval from the client. int completionPollIntervalMillis = Job.getCompletionPollInterval(cluster.getConf()); while (!isComplete()) { try { Thread.sleep(completionPollIntervalMillis); } catch (InterruptedException ie) { Thread.currentThread().interrupt(); } } } return isSuccessful(); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
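The difference between the two variants quoted above can be demonstrated with plain threads: swallowing the InterruptedException leaves the thread's interrupt status cleared, so callers can no longer tell an interrupt happened, while calling Thread.currentThread().interrupt() in the catch block preserves it. This demo class is illustrative, not part of the patch.

```java
public class InterruptFlagDemo {
    // Interrupts the current thread, sleeps (which throws immediately and
    // clears the flag), and reports whether the interrupt status is still
    // set afterwards. Note Thread.interrupted() reads AND clears the flag.
    public static boolean statusAfterSleep(boolean restoreFlag) {
        Thread.currentThread().interrupt();   // simulate an interrupt
        try {
            Thread.sleep(10);                 // throws at once: flag is set
        } catch (InterruptedException ie) {
            if (restoreFlag) {
                Thread.currentThread().interrupt();  // the proposed fix
            }
        }
        return Thread.interrupted();
    }
}
```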
[jira] [Updated] (MAPREDUCE-3517) map.input.path is null at the first split when using CombineFileInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3517: Labels: BB2015-05-TBR (was: ) map.input.path is null at the first split when using CombineFileInputFormat --- Key: MAPREDUCE-3517 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3517 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Affects Versions: 0.20.203.0 Reporter: wanbin Labels: BB2015-05-TBR Attachments: CombineFileRecordReader.diff, MAPREDUCE-3517.02.patch map.input.path is null at the first split when using CombineFileInputFormat, because in the runNewMapper function mapContext is used instead of taskContext, which is where map.input.path is set; so we need to set map.input.path again on mapContext. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5883) Total megabyte-seconds in job counters is slightly misleading
[ https://issues.apache.org/jira/browse/MAPREDUCE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5883: Labels: BB2015-05-TBR (was: ) Total megabyte-seconds in job counters is slightly misleading --- Key: MAPREDUCE-5883 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5883 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Nathan Roberts Assignee: Nathan Roberts Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-5883.patch The following counters are in milliseconds so megabyte-seconds might be better stated as megabyte-milliseconds MB_MILLIS_MAPS.name= Total megabyte-seconds taken by all map tasks MB_MILLIS_REDUCES.name=Total megabyte-seconds taken by all reduce tasks VCORES_MILLIS_MAPS.name= Total vcore-seconds taken by all map tasks VCORES_MILLIS_REDUCES.name=Total vcore-seconds taken by all reduce tasks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
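The factor at stake is easy to see with made-up numbers: a 1024 MB container running for one minute accumulates 1024 * 60000 = 61,440,000 megabyte-milliseconds, which is only 61,440 megabyte-seconds, so printing the raw counter under a "seconds" label overstates usage 1000-fold.

```java
public class MbMillisSketch {
    // What the counter actually accumulates: container memory in MB
    // multiplied by elapsed milliseconds.
    public static long mbMillis(long containerMb, long elapsedMillis) {
        return containerMb * elapsedMillis;
    }

    // What the current "megabyte-seconds" label implies the value is.
    public static long mbSeconds(long mbMillis) {
        return mbMillis / 1000;
    }
}
```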
[jira] [Updated] (MAPREDUCE-6027) mr jobs with relative paths can fail
[ https://issues.apache.org/jira/browse/MAPREDUCE-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6027: Labels: BB2015-05-TBR (was: ) mr jobs with relative paths can fail Key: MAPREDUCE-6027 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6027 Project: Hadoop Map/Reduce Issue Type: Bug Components: job submission Reporter: Wing Yew Poon Assignee: Wing Yew Poon Labels: BB2015-05-TBR Attachments: MAPREDUCE-6027.patch I built hadoop from branch-2 and tried to run terasort as follows: {noformat} wypoon$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-SNAPSHOT.jar terasort sort-input sort-output 14/08/07 08:57:55 INFO terasort.TeraSort: starting 2014-08-07 08:57:56.229 java[36572:1903] Unable to load realm info from SCDynamicStore 14/08/07 08:57:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/08/07 08:57:57 INFO input.FileInputFormat: Total input paths to process : 2 Spent 156ms computing base-splits. Spent 2ms computing TeraScheduler splits. Computing input splits took 159ms Sampling 2 splits of 2 Making 1 from 10 sampled records Computing parititions took 626ms Spent 789ms computing partitions. 
14/08/07 08:57:57 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032 14/08/07 08:57:58 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/wypoon/.staging/job_1407426900134_0001 java.lang.IllegalArgumentException: Can not create a Path from an empty URI at org.apache.hadoop.fs.Path.checkPathArg(Path.java:140) at org.apache.hadoop.fs.Path.init(Path.java:192) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.checkPermissionOfOther(ClientDistributedCacheManager.java:275) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.ancestorsHaveExecutePermissions(ClientDistributedCacheManager.java:256) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.isPublic(ClientDistributedCacheManager.java:243) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineCacheVisibilities(ClientDistributedCacheManager.java:162) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:58) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303) at 
org.apache.hadoop.examples.terasort.TeraSort.run(TeraSort.java:316) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.examples.terasort.TeraSort.main(TeraSort.java:325) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72) at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145) at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {noformat} If I used absolute paths for the input and out directories, the job runs fine. This breakage is due to HADOOP-10876. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5876) SequenceFileRecordReader NPE if close() is called before initialize()
[ https://issues.apache.org/jira/browse/MAPREDUCE-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5876: Labels: BB2015-05-TBR (was: ) SequenceFileRecordReader NPE if close() is called before initialize() - Key: MAPREDUCE-5876 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5876 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 2.3.0, 2.4.0 Reporter: Reinis Vicups Assignee: Tsuyoshi Ozawa Labels: BB2015-05-TBR Attachments: MAPREDUCE-5876.1.patch org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader extends org.apache.hadoop.mapreduce.RecordReader, which in turn implements java.io.Closeable. According to the Java spec, java.io.Closeable#close() has to be idempotent (http://docs.oracle.com/javase/7/docs/api/java/io/Closeable.html), but here it is not. An NPE is thrown if close() is invoked without previously calling initialize(), because the internal SequenceFile.Reader field in is still null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
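A minimal sketch of what an idempotent, null-safe close() looks like (class and field names are illustrative, not the actual attached patch):

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch: guard against the reader never having been initialized, and make
// repeated close() calls a no-op, as java.io.Closeable requires.
class SketchRecordReader implements Closeable {
    private Closeable in;              // set by initialize(); may still be null

    @Override
    public synchronized void close() throws IOException {
        if (in != null) {              // avoids the NPE when initialize() was never called
            in.close();
            in = null;                 // makes a second close() a no-op
        }
    }
}

public class CloseDemo {
    public static void main(String[] args) throws IOException {
        SketchRecordReader r = new SketchRecordReader();
        r.close();   // no initialize() first: must not throw
        r.close();   // second call: idempotent
        System.out.println("close() is null-safe and idempotent");
    }
}
```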
[jira] [Updated] (MAPREDUCE-6003) Resource Estimator suggests huge map output in some cases
[ https://issues.apache.org/jira/browse/MAPREDUCE-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6003: Labels: BB2015-05-TBR (was: ) Resource Estimator suggests huge map output in some cases - Key: MAPREDUCE-6003 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6003 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 1.2.1 Reporter: Chengbing Liu Assignee: Chengbing Liu Labels: BB2015-05-TBR Attachments: MAPREDUCE-6003-branch-1.2.patch In some cases, ResourceEstimator can return a far too large map output estimation. This happens when the input size is not correctly calculated. A typical case is joining two Hive tables (one in HDFS and the other in HBase). The maps that process the HBase table finish first and report an input length of 0 due to its TableInputFormat. Then, for a map that processes the HDFS table, the estimated output size is very large because of the wrong input size, making it impossible to assign the map task. There are two possible solutions to this problem: (1) Make the input size correct for each case, e.g. HBase, etc. (2) Use another algorithm to estimate the map output, or at least make it closer to reality. I prefer the second way, since the first would require handling every possible input type, which is not easy for some inputs such as URIs. In my opinion, we could make a second estimation which is independent of the input size: estimationB = (completedMapOutputSize / completedMaps) * totalMaps * 10 Here, multiplying by 10 makes the estimation more conservative, so that the task is less likely to be assigned somewhere without enough space. The former estimation goes like this: estimationA = (inputSize * completedMapOutputSize * 2.0) / completedMapInputSize My suggestion is to take the minimum of the two estimations: estimation = min(estimationA, estimationB) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
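The two estimates and the proposed min() can be sketched directly from the formulas in the description (variable names and sample numbers are illustrative; the Math.max guards are added here only to keep the sketch free of division by zero):

```java
// Sketch of the proposed combined map-output estimate.
public class ResourceEstimateSketch {
    static long estimate(long inputSize, long completedMapOutputSize,
                         long completedMapInputSize, int completedMaps, int totalMaps) {
        // Existing estimate: scales completed output by the input-size ratio.
        // It blows up when completedMapInputSize is near zero (the HBase case).
        long estimationA = (long) (inputSize * completedMapOutputSize * 2.0
                                   / Math.max(completedMapInputSize, 1));
        // Proposed input-size-independent estimate, padded 10x to stay conservative.
        long estimationB = (completedMapOutputSize / Math.max(completedMaps, 1))
                           * (long) totalMaps * 10;
        return Math.min(estimationA, estimationB);
    }

    public static void main(String[] args) {
        // HBase-style case: completed maps reported ~0 input, so estimationA
        // explodes and estimationB wins.
        System.out.println(estimate(1_000_000_000L, 50_000_000L, 1L, 10, 100));
        // prints 5000000000
    }
}
```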
[jira] [Updated] (MAPREDUCE-3182) loadgen ignores -m command line when writing random data
[ https://issues.apache.org/jira/browse/MAPREDUCE-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3182: Labels: BB2015-05-TBR (was: ) loadgen ignores -m command line when writing random data Key: MAPREDUCE-3182 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3182 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, test Affects Versions: 0.23.0, 2.3.0 Reporter: Jonathan Eagles Assignee: Chen He Labels: BB2015-05-TBR Attachments: MAPREDUCE-3182.patch If no input directories are specified, loadgen goes into a special mode where random data is generated and written. In that mode, the user-specified number of mappers (the -m command line option) is overridden by a calculation. Instead, it should honor the user-specified number of mappers and fall back to the calculation only when none is given. The documentation should also be updated to match the new behavior in the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
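The requested fallback can be sketched as follows (the method name and the "non-positive means absent" sentinel are assumptions for illustration, not loadgen's actual code):

```java
// Sketch: honor a user-supplied map count when writing random data, and only
// fall back to the computed value when -m was not given.
public class LoadGenMapCount {
    static int chooseNumMaps(int userSpecifiedMaps, int computedMaps) {
        // userSpecifiedMaps <= 0 means the -m option was absent
        return userSpecifiedMaps > 0 ? userSpecifiedMaps : computedMaps;
    }

    public static void main(String[] args) {
        System.out.println(chooseNumMaps(8, 40));  // -m 8 given: use it -> 8
        System.out.println(chooseNumMaps(0, 40));  // no -m: fall back -> 40
    }
}
```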
[jira] [Updated] (MAPREDUCE-1380) Adaptive Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-1380: Labels: BB2015-05-TBR (was: ) Adaptive Scheduler -- Key: MAPREDUCE-1380 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1380 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.4.1 Reporter: Jordà Polo Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-1380-branch-1.2.patch, MAPREDUCE-1380_0.1.patch, MAPREDUCE-1380_1.1.patch, MAPREDUCE-1380_1.1.pdf The Adaptive Scheduler is a pluggable Hadoop scheduler that automatically adjusts the amount of used resources depending on the performance of jobs and on user-defined high-level business goals. Existing Hadoop schedulers are focused on managing large, static clusters in which nodes are added or removed manually. On the other hand, the goal of this scheduler is to improve the integration of Hadoop and the applications that run on top of it with environments that allow a more dynamic provisioning of resources. The current implementation is quite straightforward. Users specify a deadline at job submission time, and the scheduler adjusts the resources to meet that deadline (at the moment, the scheduler can be configured to either minimize or maximize the amount of resources). If multiple jobs are run simultaneously, the scheduler prioritizes them by deadline. Note that the current approach to estimate the completion time of jobs is quite simplistic: it is based on the time it takes to finish each task, so it works well with regular jobs, but there is still room for improvement for unpredictable jobs. The idea is to further integrate it with cloud-like and virtual environments (such as Amazon EC2, Emotive, etc.) so that if, for instance, a job isn't able to meet its deadline, the scheduler automatically requests more resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5845) TestShuffleHandler failing intermittently on windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5845: Labels: BB2015-05-TBR (was: ) TestShuffleHandler failing intermittently on windows Key: MAPREDUCE-5845 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5845 Project: Hadoop Map/Reduce Issue Type: Test Reporter: Varun Vasudev Assignee: Varun Vasudev Labels: BB2015-05-TBR Attachments: apache-mapreduce-5845.0.patch TestShuffleHandler fails intermittently on Windows - specifically, testClientClosesConnection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5225) SplitSampler in mapreduce.lib should use a SPLIT_STEP to jump around splits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5225: Labels: BB2015-05-TBR (was: ) SplitSampler in mapreduce.lib should use a SPLIT_STEP to jump around splits --- Key: MAPREDUCE-5225 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5225 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: BB2015-05-TBR Attachments: MAPREDUCE-5225.1.patch Now, SplitSampler only samples the first maxSplitsSampled splits, a behavior introduced by MAPREDUCE-1820. However, sampling across all splits is generally preferable to sampling only the first N splits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
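A minimal sketch of the SPLIT_STEP idea (names are illustrative): instead of sampling splits 0..N-1, step through the full split list so the sample covers the whole input:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: pick sample splits spread across the input by stepping, rather than
// taking the first maxSplitsSampled splits.
public class SplitStepSketch {
    static List<Integer> pickSplits(int totalSplits, int maxSplitsSampled) {
        int splitsToSample = Math.min(maxSplitsSampled, totalSplits);
        int splitStep = totalSplits / splitsToSample;   // jump size across all splits
        List<Integer> picked = new ArrayList<>();
        for (int i = 0; i < splitsToSample; i++) {
            picked.add(i * splitStep);                  // 0, step, 2*step, ...
        }
        return picked;
    }

    public static void main(String[] args) {
        System.out.println(pickSplits(10, 3));  // prints [0, 3, 6] rather than [0, 1, 2]
    }
}
```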
[jira] [Updated] (MAPREDUCE-4216) Make MultipleOutputs generic to support non-file output formats
[ https://issues.apache.org/jira/browse/MAPREDUCE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4216: Labels: BB2015-05-TBR Output (was: Output) Make MultipleOutputs generic to support non-file output formats --- Key: MAPREDUCE-4216 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4216 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 1.0.2 Reporter: Robbie Strickland Labels: BB2015-05-TBR, Output Attachments: MAPREDUCE-4216.patch The current MultipleOutputs implementation is tied to FileOutputFormat in such a way that it is not extensible to other types of output. It should be made more generic, such as with an interface that can be implemented for different outputs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
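A hypothetical sketch of the kind of abstraction requested — all names here are invented for illustration: MultipleOutputs would write through a small interface instead of assuming FileOutputFormat paths, so non-file sinks can plug in:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface decoupling named outputs from the file system.
interface NamedOutputSink<K, V> {
    void write(String namedOutput, K key, V value) throws IOException;
}

// A non-file sink becomes possible once the contract no longer assumes paths.
class InMemorySink implements NamedOutputSink<String, String> {
    final Map<String, String> out = new HashMap<>();

    @Override
    public void write(String namedOutput, String key, String value) {
        out.put(namedOutput + "/" + key, value);   // keyed by output name + record key
    }
}

public class MultiOutDemo {
    public static void main(String[] args) throws IOException {
        InMemorySink sink = new InMemorySink();
        sink.write("audit", "k1", "v1");
        System.out.println(sink.out.get("audit/k1"));  // prints v1
    }
}
```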
[jira] [Updated] (MAPREDUCE-4840) Delete dead code and deprecate public API related to skipping bad records
[ https://issues.apache.org/jira/browse/MAPREDUCE-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4840: Labels: BB2015-05-TBR (was: ) Delete dead code and deprecate public API related to skipping bad records - Key: MAPREDUCE-4840 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4840 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Mostafa Elhemali Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-4840.patch It looks like the decision was made in MAPREDUCE-1932 to remove support for skipping bad records rather than fix it (it doesn't work right now in trunk). If that's the case, then we should probably delete all the dead code related to it and deprecate the public APIs for it, right? Dead code I'm talking about: 1. Task class: skipping, skipRanges, writeSkipRecs 2. MapTask class: SkippingRecordReader inner class 3. ReduceTask class: SkippingReduceValuesIterator inner class 4. Tests: TestBadRecords Public API: 1. SkipBadRecords class -- This message was sent by Atlassian JIRA (v6.3.4#6332)