[jira] [Commented] (MAPREDUCE-6091) YARNRunner.getJobStatus() fails with ApplicationNotFoundException if the job rolled off the RM view
[ https://issues.apache.org/jira/browse/MAPREDUCE-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135027#comment-14135027 ] Hadoop QA commented on MAPREDUCE-6091: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668967/MAPREDUCE-6091.patch against trunk revision 7e08c0f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapreduce.v2.TestNonExistentJob org.apache.hadoop.mapreduce.lib.input.TestMRCJCFileInputFormat {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4881//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4881//console This message is automatically generated. 
YARNRunner.getJobStatus() fails with ApplicationNotFoundException if the job rolled off the RM view --- Key: MAPREDUCE-6091 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6091 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 2.1.0-beta Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: MAPREDUCE-6091.patch If you query the job status of a job that rolled off the RM view via YARNRunner.getJobStatus(), it fails with an ApplicationNotFoundException. For example, {noformat} 2014-09-15 07:09:51,084 ERROR org.apache.pig.tools.grunt.Grunt: ERROR 6017: JobID: job_1410289045532_90542 Reason: java.io.IOException: org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1410289045532_90542' doesn't exist in RM. at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:288) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:150) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:337) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2058) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2054) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1547) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2052) at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:348) at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:419) at 
org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:559) at org.apache.hadoop.mapreduce.Job$1.run(Job.java:314) at org.apache.hadoop.mapreduce.Job$1.run(Job.java:311) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1547) at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311) at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:599) at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.checkRunningState(ControlledJob.java:257) at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.checkState(ControlledJob.java:282) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[jira] [Updated] (MAPREDUCE-5831) Old MR client is not compatible with new MR application
[ https://issues.apache.org/jira/browse/MAPREDUCE-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated MAPREDUCE-5831:
--
    Assignee: Junping Du (was: Tan, Wangda)

Old MR client is not compatible with new MR application
---
    Key: MAPREDUCE-5831
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-5831
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: client, mr-am
    Affects Versions: 2.2.0, 2.3.0
    Reporter: Zhijie Shen
    Assignee: Junping Du
    Priority: Critical

Recently, we saw the following scenario:
1. The user set up a Hadoop 2.3 cluster, which contains YARN 2.3 and MR 2.3.
2. The user ran the client on a machine where MR 2.2 is installed and on the classpath.
Then, when the user submitted a simple wordcount job, he saw the following message:
{code}
16:00:41,027 INFO main mapreduce.Job:1345 - map 100% reduce 100%
16:00:41,036 INFO main mapreduce.Job:1356 - Job job_1396468045458_0006 completed successfully
16:02:20,535 WARN main mapreduce.JobRunner:212 - Cannot start job [wordcountJob]
java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_REDUCES
	at java.lang.Enum.valueOf(Enum.java:236)
	at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.valueOf(FrameworkCounterGroup.java:148)
	at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.findCounter(FrameworkCounterGroup.java:182)
	at org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:154)
	at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:240)
	at org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:370)
	at org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:511)
	at org.apache.hadoop.mapreduce.Job$7.run(Job.java:756)
	at org.apache.hadoop.mapreduce.Job$7.run(Job.java:753)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
	at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:753)
	at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1361)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1289)
	. . .
{code}
The problem is that the wordcount job was running on one or more nodes of the YARN cluster, where the MR 2.3 libs were installed, so JobCounter.MB_MILLIS_REDUCES is available in the counters. On the other side, due to the classpath setting, the client was likely running with the MR 2.2 libs. After the client retrieved the counters from the MR AM, it tried to construct the Counter object from the received counter name. Unfortunately, the enum constant didn't exist on the client's classpath, so the "No enum constant" exception was thrown.

JobCounter.MB_MILLIS_REDUCES was brought into MR2 via MAPREDUCE-5464 in Hadoop 2.3.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
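The failure mode described above can be reproduced outside Hadoop. The following is a minimal sketch, not the project's fix: `OldClientCounter` is a hypothetical stand-in for the counter enum as an older client would see it, and `tolerantLookup` is an illustrative helper, not a Hadoop API. It shows why a name sent by a newer server makes `Enum.valueOf` throw, and one way a client could tolerate unknown names.

```java
import java.util.Optional;

class EnumCompatSketch {
    // Hypothetical stand-in for the counter enum on the OLD client's classpath.
    // It deliberately lacks MB_MILLIS_REDUCES, which only the newer side knows.
    enum OldClientCounter { TOTAL_LAUNCHED_MAPS, TOTAL_LAUNCHED_REDUCES }

    // Strict lookup: Enum.valueOf throws IllegalArgumentException for unknown names.
    static OldClientCounter strictLookup(String name) {
        return Enum.valueOf(OldClientCounter.class, name);
    }

    // Tolerant lookup (illustrative only): unknown names become an empty Optional.
    static Optional<OldClientCounter> tolerantLookup(String name) {
        try {
            return Optional.of(Enum.valueOf(OldClientCounter.class, name));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String nameFromNewerServer = "MB_MILLIS_REDUCES";
        try {
            strictLookup(nameFromNewerServer);
        } catch (IllegalArgumentException e) {
            System.out.println("strict lookup failed: " + e.getMessage());
        }
        System.out.println("tolerant lookup found it: "
                + tolerantLookup(nameFromNewerServer).isPresent());
        System.out.println("known name found: "
                + tolerantLookup("TOTAL_LAUNCHED_MAPS").isPresent());
    }
}
```

Whether skipping unknown counters is acceptable is a design decision for the client; the sketch only demonstrates the mechanism behind the exception.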
[jira] [Created] (MAPREDUCE-6092) TestJobHistoryParsing#testPartialJob timeouts
Akira AJISAKA created MAPREDUCE-6092:

    Summary: TestJobHistoryParsing#testPartialJob timeouts
    Key: MAPREDUCE-6092
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-6092
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: test
    Reporter: Akira AJISAKA

TestJobHistoryParsing#testPartialJob times out.
{code}
Running org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing
Tests run: 15, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 106.007 sec FAILURE! - in org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing
testPartialJob(org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing) Time elapsed: 0.987 sec ERROR!
java.lang.Exception: test timed out after 1000 milliseconds
	at org.apache.xerces.impl.XMLEntityScanner.scanQName(Unknown Source)
	at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
	at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
	at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
	at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2334)
	at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2322)
	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2393)
	at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2346)
	at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2263)
	at org.apache.hadoop.conf.Configuration.get(Configuration.java:868)
	at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:887)
	at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1288)
	at org.apache.hadoop.security.SecurityUtil.<clinit>(SecurityUtil.java:70)
	at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:247)
	at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:235)
	at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:761)
	at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:746)
	at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:619)
	at org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.testPartialJob(TestJobHistoryParsing.java:829)
{code}
[jira] [Assigned] (MAPREDUCE-6092) TestJobHistoryParsing#testPartialJob timeouts
[ https://issues.apache.org/jira/browse/MAPREDUCE-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA reassigned MAPREDUCE-6092: Assignee: Akira AJISAKA
[jira] [Updated] (MAPREDUCE-6092) TestJobHistoryParsing#testPartialJob timeouts
[ https://issues.apache.org/jira/browse/MAPREDUCE-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated MAPREDUCE-6092: - Attachment: MAPREDUCE-6092.patch Attaching a patch to extend the timeout from 1 second to 10 seconds. In my environments, the test took * 1.273 sec on Xeon E5-2430L @ 2.00 GHz (CentOS 6.3) * 3.097 sec on Core i3 @ 1.3 GHz (Mac OS X 10.9)
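To illustrate why a fixed time limit that is close to the real running time makes a test environment-dependent, here is a minimal sketch using only `java.util.concurrent`. It is not the attached patch, which presumably just raises the limit declared on the test (an assumption; the patch itself is not quoted in this thread). The names `finishesWithin` and `simulatedTest` are hypothetical, and the durations are scaled down so the sketch runs quickly.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimeoutSketch {
    // Runs the task on a worker thread and reports whether it finished within the limit.
    static boolean finishesWithin(Callable<Void> task, long limitMillis) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Void> future = pool.submit(task);
            try {
                future.get(limitMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the worker, as a test runner would
                return false;
            }
        } finally {
            pool.shutdownNow();
        }
    }

    // Simulated test body: the sleep stands in for slow work such as loading configuration.
    static Callable<Void> simulatedTest(long workMillis) {
        return () -> {
            Thread.sleep(workMillis);
            return null;
        };
    }

    public static void main(String[] args) throws Exception {
        // The same 300 ms of work against a tight limit and a 10x larger limit.
        System.out.println("limit 100 ms:  " + finishesWithin(simulatedTest(300), 100));
        System.out.println("limit 1000 ms: " + finishesWithin(simulatedTest(300), 1000));
    }
}
```

A larger limit trades slower detection of genuine hangs for fewer spurious failures on slow machines.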
[jira] [Updated] (MAPREDUCE-6092) TestJobHistoryParsing#testPartialJob timeouts
[ https://issues.apache.org/jira/browse/MAPREDUCE-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated MAPREDUCE-6092: - Description: TestJobHistoryParsing#testPartialJob timeouts in my environments. We should extend the timeout was: TestJobHistoryParsing#testPartialJob timeouts.
[jira] [Updated] (MAPREDUCE-6092) TestJobHistoryParsing#testPartialJob timeouts in some environments
[ https://issues.apache.org/jira/browse/MAPREDUCE-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated MAPREDUCE-6092: - Description: Rebasing the patch in MAPREDUCE-5392, I found TestJobHistoryParsing#testPartialJob timeout in my environments. We should extend the timeout not to fail the test in slow machines. was: TestJobHistoryParsing#testPartialJob timeouts in my environments. We should extend the timeout
[jira] [Updated] (MAPREDUCE-6092) TestJobHistoryParsing#testPartialJob timeouts in some environments
[ https://issues.apache.org/jira/browse/MAPREDUCE-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated MAPREDUCE-6092: - Labels: newbie (was: ) Target Version/s: 2.6.0 Status: Patch Available (was: Open)
[jira] [Commented] (MAPREDUCE-5279) Jobs can deadlock if headroom is limited by cpu instead of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135167#comment-14135167 ]

Peng Zhang commented on MAPREDUCE-5279:
---

Thanks [~vvasudev] for taking this.
After the initial patch, we found one bug and fixed it in our branch. The code context is below:
{code}
   private List<Container> getResources() throws Exception {
-    int headRoom = getAvailableResources() != null
-        ? getAvailableResources().getMemory() : 0;//first time it would be null
+    Resource headRoom =
+        getAvailableResources() == null ? Resources.none()
+            : getAvailableResources(); // will be null the first time
     AllocateResponse response;
{code}
{code}
-    int newHeadRoom = getAvailableResources() != null ? getAvailableResources().getMemory() : 0;
+    Resource newHeadRoom =
+        getAvailableResources() == null ? Resources.none()
+            : getAvailableResources();
     List<Container> newContainers = response.getAllocatedContainers();
     // Setting NMTokens
{code}
{code}
-    if (newContainers.size() + finishedContainers.size() > 0 || headRoom != newHeadRoom) {
+    if (newContainers.size() + finishedContainers.size() > 0
+        || !headRoom.equals(newHeadRoom)) {
       //something changed
       recalculateReduceSchedule = true;
{code}
headRoom and newHeadRoom refer to the same object, so the equals check always returns true except the first time.
I fixed this by adding Resources.clone(getAvailableResources()) before calling makeRemoteRequest();

Jobs can deadlock if headroom is limited by cpu instead of memory
-
    Key: MAPREDUCE-5279
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-5279
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: mrv2, scheduler
    Affects Versions: 2.0.3-alpha
    Reporter: Peng Zhang
    Assignee: Peng Zhang
    Priority: Critical
    Attachments: MAPREDUCE-5279-v2.patch, MAPREDUCE-5279.patch, apache-mapreduce-5279.3.patch, apache-mapreduce-5279.4.patch

YARN-2 introduced CPU as a scheduling dimension, but the MR RMContainerAllocator doesn't take virtual cores into account when scheduling reduce tasks. This may cause more reduce tasks to be scheduled because memory is sufficient. On a small cluster, this ends in deadlock: all running containers are reduce tasks, but the map phase has not finished.
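The aliasing problem described in the comment can be shown without any Hadoop classes. This is a minimal sketch under the assumption, implied by the comment, that the available-resources object is updated in place between the two reads. `Res`, `heartbeat`, and `copy()` are hypothetical stand-ins; `copy()` plays the role that `Resources.clone(...)` plays in the comment.

```java
class HeadroomAliasSketch {
    // Minimal mutable stand-in for a resource record (hypothetical; not the YARN Resource class).
    static final class Res {
        int memory;
        int vcores;

        Res(int memory, int vcores) {
            this.memory = memory;
            this.vcores = vcores;
        }

        Res copy() {
            return new Res(memory, vcores);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Res)) return false;
            Res other = (Res) o;
            return memory == other.memory && vcores == other.vcores;
        }

        @Override
        public int hashCode() {
            return 31 * memory + vcores;
        }
    }

    // Shared object that is updated in place, as assumed above.
    private final Res available = new Res(4096, 4);

    Res getAvailable() {
        return available;
    }

    // Simulates a response that changes the headroom by mutating the shared object.
    void heartbeat(int newMemory, int newVcores) {
        available.memory = newMemory;
        available.vcores = newVcores;
    }

    // Buggy pattern: "before" and "after" alias the same object, so the change is invisible.
    static boolean changedWithoutCopy(HeadroomAliasSketch allocator) {
        Res before = allocator.getAvailable();
        allocator.heartbeat(2048, 1);
        Res after = allocator.getAvailable();
        return !before.equals(after);
    }

    // Fixed pattern: snapshot the value before the call that may mutate it.
    static boolean changedWithCopy(HeadroomAliasSketch allocator) {
        Res before = allocator.getAvailable().copy();
        allocator.heartbeat(2048, 1);
        Res after = allocator.getAvailable();
        return !before.equals(after);
    }

    public static void main(String[] args) {
        System.out.println(changedWithoutCopy(new HeadroomAliasSketch())); // prints false
        System.out.println(changedWithCopy(new HeadroomAliasSketch()));    // prints true
    }
}
```

The earlier `int` comparison did not have this problem because primitives are copied by value; switching to an object type is what introduced the aliasing.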
[jira] [Commented] (MAPREDUCE-6092) TestJobHistoryParsing#testPartialJob timeouts in some environments
[ https://issues.apache.org/jira/browse/MAPREDUCE-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135194#comment-14135194 ] Hadoop QA commented on MAPREDUCE-6092: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669004/MAPREDUCE-6092.patch against trunk revision 7e08c0f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4883//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4883//console This message is automatically generated. TestJobHistoryParsing#testPartialJob timeouts in some environments -- Key: MAPREDUCE-6092 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6092 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: newbie Attachments: MAPREDUCE-6092.patch Rebasing the patch in MAPREDUCE-5392, I found TestJobHistoryParsing#testPartialJob timeout in my environments. 
{code} Running org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing Tests run: 15, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 106.007 sec FAILURE! - in org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing testPartialJob(org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing) Time elapsed: 0.987 sec ERROR! java.lang.Exception: test timed out after 1000 milliseconds at org.apache.xerces.impl.XMLEntityScanner.scanQName(Unknown Source) at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source) at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source) at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) at org.apache.xerces.parsers.DOMParser.parse(Unknown Source) at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source) at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150) at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2334) at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2322) at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2393) at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2346) at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2263) at org.apache.hadoop.conf.Configuration.get(Configuration.java:868) at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:887) at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1288) at org.apache.hadoop.security.SecurityUtil.clinit(SecurityUtil.java:70) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:247) at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:235) at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:761) at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:746) at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:619) at org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.testPartialJob(TestJobHistoryParsing.java:829) {code} We should extend the timeout so that the test does not fail on slow machines. -- This message was sent by
[jira] [Commented] (MAPREDUCE-6090) mapred hsadmin getGroups fails to connect in some cases
[ https://issues.apache.org/jira/browse/MAPREDUCE-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135542#comment-14135542 ] Jason Lowe commented on MAPREDUCE-6090: --- Thanks for the report and patch, Robert! Rather than an explicit static block that replicates the logic in JobConf, it'd be cleaner if HSAdmin simply reused that logic by expecting a JobConf rather than a Configuration. The constructor that takes a conf argument should expect a JobConf, and the main() method could construct the HSAdmin object with a JobConf rather than relying on ToolRunner to pass the conf. mapred hsadmin getGroups fails to connect in some cases --- Key: MAPREDUCE-6090 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6090 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 2.5.1 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: MAPREDUCE-6090.patch If you do {{mapred hsadmin -getGroups}} it works fine (assuming {{mapreduce.jobhistory.admin.address}} is set properly in mapred-site.xml). But if you do {{mapred hsadmin -getGroups foo_user}}, it will keep retrying to connect to localhost: {noformat} INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10033. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5279) Jobs can deadlock if headroom is limited by cpu instead of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated MAPREDUCE-5279: - Status: Open (was: Patch Available) Jobs can deadlock if headroom is limited by cpu instead of memory - Key: MAPREDUCE-5279 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5279 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, scheduler Affects Versions: 2.0.3-alpha Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Attachments: MAPREDUCE-5279-v2.patch, MAPREDUCE-5279.patch, apache-mapreduce-5279.3.patch, apache-mapreduce-5279.4.patch YARN-2 imported cpu dimension scheduling, but MR RMContainerAllocator doesn't take into account virtual cores while scheduling reduce tasks. This may cause more reduce tasks to be scheduled because memory is enough. And on a small cluster, this will end with deadlock, all running containers are reduce tasks but map phase is not finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
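The deadlock described in MAPREDUCE-5279 comes down to counting available reduce slots from memory alone. The following is a minimal sketch of that arithmetic, not the attached patch: ResourceSketch and ReduceHeadroomSketch are hypothetical stand-ins (not the real YARN Resource or RMContainerAllocator classes), and per-task sizes are assumed to be positive.

```java
// Hypothetical stand-in for a resource vector (memory in MB, virtual cores).
// This is NOT the real org.apache.hadoop.yarn.api.records.Resource class.
final class ResourceSketch {
    final int memoryMb;
    final int vcores;

    ResourceSketch(int memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }
}

final class ReduceHeadroomSketch {
    // Memory-only view: how many task containers fit in the headroom
    // when virtual cores are ignored. Assumes perTask.memoryMb > 0.
    static int fitByMemoryOnly(ResourceSketch headroom, ResourceSketch perTask) {
        return headroom.memoryMb / perTask.memoryMb;
    }

    // Memory-and-CPU view: the scarcer dimension decides how many fit.
    // Assumes perTask.memoryMb > 0 and perTask.vcores > 0.
    static int fitByMemoryAndCpu(ResourceSketch headroom, ResourceSketch perTask) {
        int byMemory = headroom.memoryMb / perTask.memoryMb;
        int byCpu = headroom.vcores / perTask.vcores;
        return Math.min(byMemory, byCpu);
    }
}
```

With 8192 MB and 2 vcores of headroom and tasks of 1024 MB and 1 vcore each, the memory-only count is 8 while the combined count is 2. An allocator that trusts the first number can ramp up more reduces than the cluster can actually run alongside the remaining maps.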
[jira] [Updated] (MAPREDUCE-5279) Jobs can deadlock if headroom is limited by cpu instead of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated MAPREDUCE-5279: - Attachment: apache-mapreduce-5279.5.patch Thanks for pointing out the bug [~peng.zhang]! I've uploaded a new patch fixing it. Can you please confirm that the fix is ok? Jobs can deadlock if headroom is limited by cpu instead of memory - Key: MAPREDUCE-5279 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5279 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, scheduler Affects Versions: 2.0.3-alpha Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Attachments: MAPREDUCE-5279-v2.patch, MAPREDUCE-5279.patch, apache-mapreduce-5279.3.patch, apache-mapreduce-5279.4.patch, apache-mapreduce-5279.5.patch YARN-2 imported cpu dimension scheduling, but MR RMContainerAllocator doesn't take into account virtual cores while scheduling reduce tasks. This may cause more reduce tasks to be scheduled because memory is enough. And on a small cluster, this will end with deadlock, all running containers are reduce tasks but map phase is not finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5279) Jobs can deadlock if headroom is limited by cpu instead of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated MAPREDUCE-5279: - Status: Patch Available (was: Open) Jobs can deadlock if headroom is limited by cpu instead of memory - Key: MAPREDUCE-5279 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5279 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, scheduler Affects Versions: 2.0.3-alpha Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Attachments: MAPREDUCE-5279-v2.patch, MAPREDUCE-5279.patch, apache-mapreduce-5279.3.patch, apache-mapreduce-5279.4.patch, apache-mapreduce-5279.5.patch YARN-2 imported cpu dimension scheduling, but MR RMContainerAllocator doesn't take into account virtual cores while scheduling reduce tasks. This may cause more reduce tasks to be scheduled because memory is enough. And on a small cluster, this will end with deadlock, all running containers are reduce tasks but map phase is not finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5279) Jobs can deadlock if headroom is limited by cpu instead of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135605#comment-14135605 ] Hadoop QA commented on MAPREDUCE-5279: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669073/apache-mapreduce-5279.5.patch against trunk revision 0c26412. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4884//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4884//console This message is automatically generated. 
Jobs can deadlock if headroom is limited by cpu instead of memory - Key: MAPREDUCE-5279 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5279 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, scheduler Affects Versions: 2.0.3-alpha Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Attachments: MAPREDUCE-5279-v2.patch, MAPREDUCE-5279.patch, apache-mapreduce-5279.3.patch, apache-mapreduce-5279.4.patch, apache-mapreduce-5279.5.patch YARN-2 imported cpu dimension scheduling, but MR RMContainerAllocator doesn't take into account virtual cores while scheduling reduce tasks. This may cause more reduce tasks to be scheduled because memory is enough. And on a small cluster, this will end with deadlock, all running containers are reduce tasks but map phase is not finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6091) YARNRunner.getJobStatus() fails with ApplicationNotFoundException if the job rolled off the RM view
[ https://issues.apache.org/jira/browse/MAPREDUCE-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated MAPREDUCE-6091: --- Attachment: MAPREDUCE-6091.patch Fixed TestNonExistentJob (restored the old test). The other failures appear unrelated to this patch (it's reproducible on the trunk). YARNRunner.getJobStatus() fails with ApplicationNotFoundException if the job rolled off the RM view --- Key: MAPREDUCE-6091 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6091 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 2.1.0-beta Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: MAPREDUCE-6091.patch, MAPREDUCE-6091.patch If you query the job status of a job that rolled off the RM view via YARNRunner.getJobStatus(), it fails with an ApplicationNotFoundException. For example, {noformat} 2014-09-15 07:09:51,084 ERROR org.apache.pig.tools.grunt.Grunt: ERROR 6017: JobID: job_1410289045532_90542 Reason: java.io.IOException: org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1410289045532_90542' doesn't exist in RM. 
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:288) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:150) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:337) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2058) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2054) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1547) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2052) at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:348) at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:419) at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:559) at org.apache.hadoop.mapreduce.Job$1.run(Job.java:314) at org.apache.hadoop.mapreduce.Job$1.run(Job.java:311) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1547) at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311) at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:599) at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.checkRunningState(ControlledJob.java:257) at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.checkState(ControlledJob.java:282) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.pig.backend.hadoop23.PigJobControl.checkState(PigJobControl.java:120) at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:180) at java.lang.Thread.run(Thread.java:662) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:279) {noformat} Prior to 2.1.0, it used to be able to fall back onto the job history server and get the status. This appears to be introduced by YARN-873. YARN-873 changed ClientRMService to throw an ApplicationNotFoundException on an unknown app id (from returning null). But MR's ClientServiceDelegate was never modified to change its behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6091) YARNRunner.getJobStatus() fails with ApplicationNotFoundException if the job rolled off the RM view
[ https://issues.apache.org/jira/browse/MAPREDUCE-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135672#comment-14135672 ] Sangjin Lee commented on MAPREDUCE-6091: This patch pretty much restores the old behavior of ClientServiceDelegate. If the job does not exist either in the RM or in the job history server, ClientServiceDelegate.getJobStatus() would return null. The job history client returns null if the job does not exist in the job history server. YARNRunner.getJobStatus() fails with ApplicationNotFoundException if the job rolled off the RM view --- Key: MAPREDUCE-6091 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6091 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 2.1.0-beta Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: MAPREDUCE-6091.patch, MAPREDUCE-6091.patch If you query the job status of a job that rolled off the RM view via YARNRunner.getJobStatus(), it fails with an ApplicationNotFoundException. For example, {noformat} 2014-09-15 07:09:51,084 ERROR org.apache.pig.tools.grunt.Grunt: ERROR 6017: JobID: job_1410289045532_90542 Reason: java.io.IOException: org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1410289045532_90542' doesn't exist in RM. 
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:288) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:150) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:337) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2058) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2054) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1547) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2052) at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:348) at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:419) at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:559) at org.apache.hadoop.mapreduce.Job$1.run(Job.java:314) at org.apache.hadoop.mapreduce.Job$1.run(Job.java:311) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1547) at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311) at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:599) at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.checkRunningState(ControlledJob.java:257) at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.checkState(ControlledJob.java:282) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.pig.backend.hadoop23.PigJobControl.checkState(PigJobControl.java:120) at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:180) at java.lang.Thread.run(Thread.java:662) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:279) {noformat} Prior to 2.1.0, it used to be able to fall back onto the job history server and get the status. This appears to be introduced by YARN-873. YARN-873 changed ClientRMService to throw an ApplicationNotFoundException on an unknown app id (from returning null). But MR's ClientServiceDelegate was never modified to change its behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
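The behavior described in the comment above (fall back to the job history server when the RM no longer knows the application, and return null when neither knows the job) can be sketched as follows. This is an illustration under stated assumptions, not the code in MAPREDUCE-6091.patch: AppNotFoundSketchException and JobStatusFallbackSketch are hypothetical, and the two maps stand in for the RM and the job history server.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ApplicationNotFoundException.
class AppNotFoundSketchException extends Exception {
    AppNotFoundSketchException(String appId) {
        super("Application with id '" + appId + "' doesn't exist in RM.");
    }
}

// Hypothetical stand-in for the client-side delegate; status values are plain strings.
class JobStatusFallbackSketch {
    private final Map<String, String> rmView;       // jobs the RM still remembers
    private final Map<String, String> historyView;  // jobs known to the history server

    JobStatusFallbackSketch(Map<String, String> rmView, Map<String, String> historyView) {
        this.rmView = new HashMap<>(rmView);
        this.historyView = new HashMap<>(historyView);
    }

    // Mirrors the post-YARN-873 behavior: an unknown id raises instead of returning null.
    private String queryRm(String jobId) throws AppNotFoundSketchException {
        String status = rmView.get(jobId);
        if (status == null) {
            throw new AppNotFoundSketchException(jobId);
        }
        return status;
    }

    // Tries the RM first; on "not found" falls back to the history view.
    // Returns null if the job is unknown to both.
    String getJobStatus(String jobId) {
        try {
            return queryRm(jobId);
        } catch (AppNotFoundSketchException e) {
            return historyView.get(jobId);
        }
    }
}
```

The point of the sketch is only the control flow: once the lookup signals "not found" by throwing rather than by returning null, the caller has to catch that specific exception for the fallback path to be reached at all.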
[jira] [Created] (MAPREDUCE-6093) minor distcp doc edits
Charles Lamb created MAPREDUCE-6093: --- Summary: minor distcp doc edits Key: MAPREDUCE-6093 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6093 Project: Hadoop Map/Reduce Issue Type: Bug Components: distcp, documentation Affects Versions: 3.0.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Trivial Minor edits to DistCp.md.vm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6093) minor distcp doc edits
[ https://issues.apache.org/jira/browse/MAPREDUCE-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated MAPREDUCE-6093: Attachment: MAPREDUCE-6093.001.patch Minor edit to the DistCp.md.vm doc. No tests are required since it's just a doc change. minor distcp doc edits -- Key: MAPREDUCE-6093 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6093 Project: Hadoop Map/Reduce Issue Type: Bug Components: distcp, documentation Affects Versions: 3.0.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Trivial Attachments: MAPREDUCE-6093.001.patch Minor edits to DistCp.md.vm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (MAPREDUCE-6093) minor distcp doc edits
[ https://issues.apache.org/jira/browse/MAPREDUCE-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on MAPREDUCE-6093 started by Charles Lamb. --- minor distcp doc edits -- Key: MAPREDUCE-6093 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6093 Project: Hadoop Map/Reduce Issue Type: Bug Components: distcp, documentation Affects Versions: 3.0.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Trivial Attachments: MAPREDUCE-6093.001.patch Minor edits to DistCp.md.vm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6040) distcp should automatically use /.reserved/raw when run by the superuser
[ https://issues.apache.org/jira/browse/MAPREDUCE-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated MAPREDUCE-6040: Attachment: MAPREDUCE-6040.002.patch The .002 patch is the same as .001 except that it removes one unrelated doc edit. The doc edit has been moved to MAPREDUCE-6093. distcp should automatically use /.reserved/raw when run by the superuser Key: MAPREDUCE-6040 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6040 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Charles Lamb Attachments: HDFS-6134-Distcp-cp-UseCasesTable2.pdf, MAPREDUCE-6040.001.patch, MAPREDUCE-6040.002.patch On HDFS-6134, [~sanjay.radia] asked for distcp to automatically prepend /.reserved/raw if the distcp is being performed by the superuser and /.reserved/raw is supported by both the source and destination filesystems. This behavior only occurs if none of the src and target pathnames are /.reserved/raw. The -disablereservedraw flag can be used to disable this option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
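The conditions listed in the MAPREDUCE-6040 description can be read as a single predicate. The sketch below is one possible reading, not the patch: ReservedRawSketch and its method names are hypothetical, paths are treated as plain path strings rather than full URIs, and the superuser and filesystem-support checks are passed in as booleans instead of being queried from HDFS.

```java
// Hypothetical helper; not part of DistCp.
final class ReservedRawSketch {
    static final String RAW_PREFIX = "/.reserved/raw";

    // True if the path is /.reserved/raw itself or lies underneath it.
    static boolean isRawPath(String path) {
        return path.equals(RAW_PREFIX) || path.startsWith(RAW_PREFIX + "/");
    }

    // Prepend only when: the caller is the superuser, both filesystems support
    // /.reserved/raw, the disable flag is not set, and no path is already raw.
    static boolean shouldPrependRaw(boolean isSuperuser,
                                    boolean srcSupportsRaw,
                                    boolean dstSupportsRaw,
                                    boolean disableReservedRaw,
                                    String... paths) {
        if (!isSuperuser || !srcSupportsRaw || !dstSupportsRaw || disableReservedRaw) {
            return false;
        }
        for (String p : paths) {
            if (isRawPath(p)) {
                return false;
            }
        }
        return true;
    }

    // Applies the prefix to an absolute path.
    static String prependRaw(String path) {
        return RAW_PREFIX + path;
    }
}
```

For example, a superuser copying /src to /dst between two filesystems that both support the raw namespace would get the prefix, while the same copy with -disablereservedraw, or with a source already under /.reserved/raw, would not.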
[jira] [Commented] (MAPREDUCE-6040) distcp should automatically use /.reserved/raw when run by the superuser
[ https://issues.apache.org/jira/browse/MAPREDUCE-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135783#comment-14135783 ] Hadoop QA commented on MAPREDUCE-6040: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669095/MAPREDUCE-6040.002.patch against trunk revision 0c26412. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-tools/hadoop-distcp. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4886//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4886//console This message is automatically generated. 
distcp should automatically use /.reserved/raw when run by the superuser Key: MAPREDUCE-6040 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6040 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Charles Lamb Attachments: HDFS-6134-Distcp-cp-UseCasesTable2.pdf, MAPREDUCE-6040.001.patch, MAPREDUCE-6040.002.patch On HDFS-6134, [~sanjay.radia] asked for distcp to automatically prepend /.reserved/raw if the distcp is being performed by the superuser and /.reserved/raw is supported by both the source and destination filesystems. This behavior only occurs if none of the src and target pathnames are /.reserved/raw. The -disablereservedraw flag can be used to disable this option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6091) YARNRunner.getJobStatus() fails with ApplicationNotFoundException if the job rolled off the RM view
[ https://issues.apache.org/jira/browse/MAPREDUCE-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135885#comment-14135885 ] Hadoop QA commented on MAPREDUCE-6091: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669087/MAPREDUCE-6091.patch against trunk revision 0c26412. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapreduce.lib.input.TestMRCJCFileInputFormat {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4885//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4885//console This message is automatically generated. 
YARNRunner.getJobStatus() fails with ApplicationNotFoundException if the job rolled off the RM view --- Key: MAPREDUCE-6091 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6091 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 2.1.0-beta Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: MAPREDUCE-6091.patch, MAPREDUCE-6091.patch If you query the job status of a job that rolled off the RM view via YARNRunner.getJobStatus(), it fails with an ApplicationNotFoundException. For example, {noformat} 2014-09-15 07:09:51,084 ERROR org.apache.pig.tools.grunt.Grunt: ERROR 6017: JobID: job_1410289045532_90542 Reason: java.io.IOException: org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1410289045532_90542' doesn't exist in RM. at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:288) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:150) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:337) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2058) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2054) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1547) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2052) at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:348) at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:419) at 
org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:559) at org.apache.hadoop.mapreduce.Job$1.run(Job.java:314) at org.apache.hadoop.mapreduce.Job$1.run(Job.java:311) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1547) at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311) at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:599) at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.checkRunningState(ControlledJob.java:257) at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.checkState(ControlledJob.java:282) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at
[jira] [Updated] (MAPREDUCE-6090) mapred hsadmin getGroups fails to connect in some cases
[ https://issues.apache.org/jira/browse/MAPREDUCE-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6090: - Attachment: MAPREDUCE-6090.patch Thanks for taking a look Jason; that makes sense. The new patch uses a JobConf. mapred hsadmin getGroups fails to connect in some cases --- Key: MAPREDUCE-6090 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6090 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 2.5.1 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: MAPREDUCE-6090.patch, MAPREDUCE-6090.patch If you do {{mapred hsadmin -getGroups}} it works fine (assuming {{mapreduce.jobhistory.admin.address}} is set properly in mapred-site.xml). But if you do {{mapred hsadmin -getGroups foo_user}}, it will keep retrying to connect to localhost: {noformat} INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10033. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6087) MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS config name is wrong
[ https://issues.apache.org/jira/browse/MAPREDUCE-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated MAPREDUCE-6087: --- Status: Open (was: Patch Available) We cannot simply rename the property unfortunately. The config was added by MAPREDUCE-5616 which went into 2.3. We will have to support both the old name and the new name and show deprecation warnings on the old name. MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS config name is wrong Key: MAPREDUCE-6087 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6087 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.3.0 Reporter: Jian He Assignee: Akira AJISAKA Labels: newbie Attachments: MAPREDUCE-6087.patch The config name for MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS now has double prefix as yarn.app.mapreduce. + yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts -- This message was sent by Atlassian JIRA (v6.3.4#6332)
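Supporting both names, as suggested in the comment above, amounts to preferring the new key and falling back to the old one with a warning. The following is a rough sketch of that lookup, not the eventual patch: DeprecatedKeySketch is hypothetical, a plain Map stands in for the Hadoop configuration object, and NEW_KEY is an assumption about the intended single-prefix name inferred from the description.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper; a plain Map stands in for the real configuration object.
final class DeprecatedKeySketch {
    // Name as currently defined, with the prefix applied twice (per the description).
    static final String OLD_KEY =
        "yarn.app.mapreduce.yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts";
    // Assumed intended name with a single prefix.
    static final String NEW_KEY =
        "yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts";

    // Reads the new key first; falls back to the old key and records a
    // deprecation warning; otherwise returns the caller-supplied default.
    static int getMaxRetries(Map<String, String> conf, int defaultValue, List<String> warnings) {
        String value = conf.get(NEW_KEY);
        if (value == null) {
            value = conf.get(OLD_KEY);
            if (value != null) {
                warnings.add(OLD_KEY + " is deprecated; use " + NEW_KEY + " instead");
            }
        }
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }
}
```

Giving the new key precedence means a site that has already migrated is unaffected by a stale old entry, while sites that set only the old name on 2.3 keep their configured value and see a warning.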
[jira] [Commented] (MAPREDUCE-6090) mapred hsadmin getGroups fails to connect in some cases
[ https://issues.apache.org/jira/browse/MAPREDUCE-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136090#comment-14136090 ]

Hadoop QA commented on MAPREDUCE-6090:
---
{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12669152/MAPREDUCE-6090.patch
against trunk revision 8e5d671.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4887//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4887//console

This message is automatically generated.

mapred hsadmin getGroups fails to connect in some cases
---
Key: MAPREDUCE-6090
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6090
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: client
Affects Versions: 2.5.1
Reporter: Robert Kanter
Assignee: Robert Kanter
Attachments: MAPREDUCE-6090.patch, MAPREDUCE-6090.patch

If you run {{mapred hsadmin -getGroups}}, it works fine (assuming {{mapreduce.jobhistory.admin.address}} is set properly in mapred-site.xml).
But if you run {{mapred hsadmin -getGroups foo_user}}, it keeps retrying to connect to localhost:
{noformat}
INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10033. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
...
{noformat}
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5831) Old MR client is not compatible with new MR application
[ https://issues.apache.org/jira/browse/MAPREDUCE-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136315#comment-14136315 ]

Junping Du commented on MAPREDUCE-5831:
---
Taking this over and marking it as a blocker, since we will support rolling upgrades starting with 2.6.

Old MR client is not compatible with new MR application
---
Key: MAPREDUCE-5831
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5831
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: client, mr-am
Affects Versions: 2.2.0, 2.3.0
Reporter: Zhijie Shen
Assignee: Junping Du
Priority: Blocker

Recently, we saw the following scenario:
1. The user set up a Hadoop 2.3 cluster, which contains YARN 2.3 and MR 2.3.
2. The user ran the client on a machine where MR 2.2 is installed and on the classpath.

Then, when the user submitted a simple wordcount job, he saw the following message:
{code}
16:00:41,027 INFO main mapreduce.Job:1345 - map 100% reduce 100%
16:00:41,036 INFO main mapreduce.Job:1356 - Job job_1396468045458_0006 completed successfully
16:02:20,535 WARN main mapreduce.JobRunner:212 - Cannot start job [wordcountJob]
java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_REDUCES
at java.lang.Enum.valueOf(Enum.java:236)
at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.valueOf(FrameworkCounterGroup.java:148)
at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.findCounter(FrameworkCounterGroup.java:182)
at org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:154)
at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:240)
at org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:370)
at org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:511)
at org.apache.hadoop.mapreduce.Job$7.run(Job.java:756)
at org.apache.hadoop.mapreduce.Job$7.run(Job.java:753)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:753)
at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1361)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1289)
. . .
{code}
The problem is that the wordcount job ran on one or more nodes of the YARN cluster where the MR 2.3 libs were installed, so JobCounter.MB_MILLIS_REDUCES is present in the counters. The client, on the other hand, was likely running with the MR 2.2 libs because of its classpath setting. After the client retrieved the counters from the MR AM, it tried to construct the Counter object from the received counter name. That enum constant does not exist in the MR 2.2 classes on the client's classpath, so the "No enum constant" IllegalArgumentException is thrown.

JobCounter.MB_MILLIS_REDUCES was introduced in MR2 by MAPREDUCE-5464, starting with Hadoop 2.3.
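The failure mode described above can be reproduced without Hadoop. The following is a minimal sketch, assuming a hypothetical `OldJobCounter` enum that stands in for the client's older view of JobCounter (the real enum has more constants); `CounterLookup`, `strict`, and `tolerant` are illustrative names, and the tolerant lookup is only one possible way to handle unknown names, not necessarily the fix adopted for this issue.

```java
import java.util.Optional;

class CounterLookup {
    // Hypothetical subset of the counters known to an older client.
    // MB_MILLIS_REDUCES is deliberately absent, as on the MR 2.2 classpath.
    enum OldJobCounter { TOTAL_LAUNCHED_MAPS, TOTAL_LAUNCHED_REDUCES }

    // Strict lookup: the same kind of Enum.valueOf call that appears in the
    // stack trace. Throws IllegalArgumentException for an unknown name.
    static OldJobCounter strict(String name) {
        return Enum.valueOf(OldJobCounter.class, name);
    }

    // Tolerant lookup: returns an empty Optional instead of throwing, so a
    // caller could skip counters that a newer server reports.
    static Optional<OldJobCounter> tolerant(String name) {
        try {
            return Optional.of(strict(name));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tolerant("TOTAL_LAUNCHED_MAPS").isPresent()); // true
        System.out.println(tolerant("MB_MILLIS_REDUCES").isPresent());   // false
        try {
            strict("MB_MILLIS_REDUCES");
        } catch (IllegalArgumentException e) {
            // Same exception type as in the report above.
            System.out.println(e.getMessage());
        }
    }
}
```

The strict path shows why a counter name that is valid on the server can still break an older client; the tolerant path shows that the lookup itself can be made forward compatible at the cost of silently dropping unknown counters.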
[jira] [Updated] (MAPREDUCE-5831) Old MR client is not compatible with new MR application
[ https://issues.apache.org/jira/browse/MAPREDUCE-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated MAPREDUCE-5831:
---
Priority: Blocker (was: Critical)
[jira] [Updated] (MAPREDUCE-6087) MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS config name is wrong
[ https://issues.apache.org/jira/browse/MAPREDUCE-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira AJISAKA updated MAPREDUCE-6087:
---
Status: Patch Available (was: Open)

Thanks, Vinod, for the comment. Updated the patch to deprecate the old property.

MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS config name is wrong
---
Key: MAPREDUCE-6087
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6087
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Jian He
Assignee: Akira AJISAKA
Labels: newbie
Attachments: MAPREDUCE-6087.2.patch, MAPREDUCE-6087.patch

The config name for MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS now has a doubled prefix, because it is built as
yarn.app.mapreduce. + yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts
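Deprecating the old property, as the comment above describes, typically means that a value set under the old name is still honored when the new name is absent. The following is a plain-Java sketch of that fallback under stated assumptions: it does not use Hadoop's Configuration class, and `DeprecatedKeySketch`, `resolve`, the map-based settings, and the default value passed in are hypothetical. Only the two key strings come from the issue description.

```java
import java.util.HashMap;
import java.util.Map;

class DeprecatedKeySketch {
    // Old name with the doubled prefix, as reported in the issue.
    static final String OLD_KEY =
        "yarn.app.mapreduce.yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts";
    // Corrected name with a single prefix.
    static final String NEW_KEY =
        "yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts";

    // Prefer the new key, fall back to the deprecated key, then to the default.
    static int resolve(Map<String, String> settings, int defaultValue) {
        String value = settings.get(NEW_KEY);
        if (value == null) {
            value = settings.get(OLD_KEY);
        }
        return value == null ? defaultValue : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        settings.put(OLD_KEY, "5"); // a site file that still uses the old name
        // The default of 3 is an illustrative value only.
        System.out.println(resolve(settings, 3)); // 5
    }
}
```

With this ordering, existing site files that use the old name keep working, while anything that sets the corrected name takes precedence.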
[jira] [Commented] (MAPREDUCE-6087) MRJobConfig#MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS config name is wrong
[ https://issues.apache.org/jira/browse/MAPREDUCE-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136660#comment-14136660 ]

Hadoop QA commented on MAPREDUCE-6087:
---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12669288/MAPREDUCE-6087.2.patch
against trunk revision b6d3230.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4888//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4888//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-5279) Jobs can deadlock if headroom is limited by cpu instead of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136678#comment-14136678 ]

Peng Zhang commented on MAPREDUCE-5279:
---
LGTM

Jobs can deadlock if headroom is limited by cpu instead of memory
---
Key: MAPREDUCE-5279
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5279
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2, scheduler
Affects Versions: 2.0.3-alpha
Reporter: Peng Zhang
Assignee: Peng Zhang
Priority: Critical
Attachments: MAPREDUCE-5279-v2.patch, MAPREDUCE-5279.patch, apache-mapreduce-5279.3.patch, apache-mapreduce-5279.4.patch, apache-mapreduce-5279.5.patch

YARN-2 introduced CPU as a scheduling dimension, but the MR RMContainerAllocator does not take virtual cores into account when scheduling reduce tasks. As a result, it may schedule more reduce tasks than the CPU headroom allows, because memory alone looks sufficient. On a small cluster this can end in deadlock: all running containers are reduce tasks, but the map phase has not finished.
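The gap between a memory-only view and a view that also counts virtual cores can be illustrated with simple arithmetic. This is a sketch only, not the RMContainerAllocator code: `HeadroomSketch`, `fitByMemory`, `fitByBoth`, and the sample numbers are all hypothetical, and every size is assumed to be positive.

```java
class HeadroomSketch {
    // Containers that appear to fit when only memory is considered.
    static int fitByMemory(int headroomMb, int containerMb) {
        return headroomMb / containerMb;
    }

    // Containers that fit when memory and vcores are both considered:
    // the scarcer of the two resources sets the limit.
    static int fitByBoth(int headroomMb, int headroomVcores,
                         int containerMb, int containerVcores) {
        return Math.min(headroomMb / containerMb,
                        headroomVcores / containerVcores);
    }

    public static void main(String[] args) {
        // Illustrative headroom: plenty of memory, very few vcores.
        int headroomMb = 8192, headroomVcores = 2;
        int containerMb = 1024, containerVcores = 1;
        System.out.println(fitByMemory(headroomMb, containerMb)); // 8
        System.out.println(fitByBoth(headroomMb, headroomVcores,
                                     containerMb, containerVcores)); // 2
    }
}
```

In this example a memory-only calculation suggests room for eight containers while only two can actually run, which is the kind of overestimate that lets reducers crowd out the remaining maps.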
[jira] [Commented] (MAPREDUCE-6061) Fix MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS property in MRJobConfig
[ https://issues.apache.org/jira/browse/MAPREDUCE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136732#comment-14136732 ]

Ray Chiang commented on MAPREDUCE-6061:
---
This is one of four JIRAs I'd like to get in before I do more .xml cleanup. Does anyone else have any comments?

Fix MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS property in MRJobConfig
---
Key: MAPREDUCE-6061
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6061
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Trivial
Labels: newbie
Attachments: MAPREDUCE-6061-01.patch

The property MR_CLIENT_TO_AM_IPC_MAX_RETRIES_ON_TIMEOUTS is defined as:
MR_PREFIX + yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts
which results in the prefix part showing up twice. It should be
MR_PREFIX + client-am.ipc.max-retries-on-timeouts
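The doubled prefix follows directly from the string concatenation. A minimal sketch, assuming MR_PREFIX has the value "yarn.app.mapreduce." (as the MAPREDUCE-6087 description implies); the class name `ConfigNameSketch` and the constants `WRONG` and `FIXED` are hypothetical and do not appear in MRJobConfig.

```java
class ConfigNameSketch {
    // Assumed value of MR_PREFIX, taken from the MAPREDUCE-6087 description.
    static final String MR_PREFIX = "yarn.app.mapreduce.";

    // Current definition: the suffix already contains the prefix.
    static final String WRONG =
        MR_PREFIX + "yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts";

    // Proposed definition: the suffix carries only the property-specific part.
    static final String FIXED =
        MR_PREFIX + "client-am.ipc.max-retries-on-timeouts";

    public static void main(String[] args) {
        System.out.println(WRONG);
        System.out.println(FIXED);
    }
}
```

Printing both constants shows that the current definition resolves to a name containing "yarn.app.mapreduce." twice, so a value set under the single-prefix name would not be picked up by code that reads the constant.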
[jira] [Commented] (MAPREDUCE-6057) Remove obsolete entries from mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136741#comment-14136741 ]

Ray Chiang commented on MAPREDUCE-6057:
---
Any comments on this patch from anyone?

Remove obsolete entries from mapred-default.xml
---
Key: MAPREDUCE-6057
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6057
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor
Labels: newbie
Attachments: MAPREDUCE-6057-01.patch, MAPREDUCE-6057-02.patch

The following properties are defined in mapred-default.xml but no longer exist in MRJobConfig.
map.sort.class
mapred.child.env
mapred.child.java.opts
mapreduce.app-submission.cross-platform
mapreduce.client.completion.pollinterval
mapreduce.client.output.filter
mapreduce.client.progressmonitor.pollinterval
mapreduce.client.submit.file.replication
mapreduce.cluster.acls.enabled
mapreduce.cluster.local.dir
mapreduce.framework.name
mapreduce.ifile.readahead
mapreduce.ifile.readahead.bytes
mapreduce.input.fileinputformat.list-status.num-threads
mapreduce.input.fileinputformat.split.minsize
mapreduce.input.lineinputformat.linespermap
mapreduce.job.counters.limit
mapreduce.job.max.split.locations
mapreduce.job.reduce.shuffle.consumer.plugin.class
mapreduce.jobhistory.address
mapreduce.jobhistory.admin.acl
mapreduce.jobhistory.admin.address
mapreduce.jobhistory.cleaner.enable
mapreduce.jobhistory.cleaner.interval-ms
mapreduce.jobhistory.client.thread-count
mapreduce.jobhistory.datestring.cache.size
mapreduce.jobhistory.done-dir
mapreduce.jobhistory.http.policy
mapreduce.jobhistory.intermediate-done-dir
mapreduce.jobhistory.joblist.cache.size
mapreduce.jobhistory.keytab
mapreduce.jobhistory.loadedjobs.cache.size
mapreduce.jobhistory.max-age-ms
mapreduce.jobhistory.minicluster.fixed.ports
mapreduce.jobhistory.move.interval-ms
mapreduce.jobhistory.move.thread-count
mapreduce.jobhistory.principal
mapreduce.jobhistory.recovery.enable
mapreduce.jobhistory.recovery.store.class
mapreduce.jobhistory.recovery.store.fs.uri
mapreduce.jobhistory.store.class
mapreduce.jobhistory.webapp.address
mapreduce.local.clientfactory.class.name
mapreduce.map.skip.proc.count.autoincr
mapreduce.output.fileoutputformat.compress
mapreduce.output.fileoutputformat.compress.codec
mapreduce.output.fileoutputformat.compress.type
mapreduce.reduce.skip.proc.count.autoincr
mapreduce.shuffle.connection-keep-alive.enable
mapreduce.shuffle.connection-keep-alive.timeout
mapreduce.shuffle.max.connections
mapreduce.shuffle.max.threads
mapreduce.shuffle.port
mapreduce.shuffle.ssl.enabled
mapreduce.shuffle.ssl.file.buffer.size
mapreduce.shuffle.transfer.buffer.size
mapreduce.shuffle.transferTo.allowed
yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts

Submitting bug for comment/feedback about which properties should be kept in mapred-default.xml.
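A list like the one above is, in effect, a set difference: names defined in the XML file minus names referenced from code. How the list in this issue was actually produced is not stated; the sketch below only illustrates the comparison. `ObsoletePropertySketch`, `definedButUnreferenced`, and the sample inputs (including `example.kept.property`) are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class ObsoletePropertySketch {
    // Returns the names present in xmlNames but absent from codeNames,
    // sorted so the result is stable and easy to review.
    static SortedSet<String> definedButUnreferenced(Collection<String> xmlNames,
                                                    Collection<String> codeNames) {
        SortedSet<String> result = new TreeSet<>(xmlNames);
        result.removeAll(codeNames);
        return result;
    }

    public static void main(String[] args) {
        // Illustrative inputs only; a real run would parse mapred-default.xml
        // and collect the constants declared in MRJobConfig.
        List<String> xmlNames = Arrays.asList(
            "mapreduce.framework.name", "example.kept.property", "map.sort.class");
        List<String> codeNames = Arrays.asList("example.kept.property");
        // prints [map.sort.class, mapreduce.framework.name]
        System.out.println(definedButUnreferenced(xmlNames, codeNames));
    }
}
```

A name appearing in the result is only a candidate for removal: a property can be absent from MRJobConfig and still be read elsewhere, which is why the issue asks for feedback on which entries to keep.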