[jira] [Resolved] (MAPREDUCE-3048) Fix test-patch to run tests via mvn clean install test
[ https://issues.apache.org/jira/browse/MAPREDUCE-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli resolved MAPREDUCE-3048.
------------------------------------------------
    Resolution: Fixed
      Assignee: Vinod Kumar Vavilapalli

I don't see any objections to this, and it is really needed and useful. I just committed this to trunk and branch-0.23.

> Fix test-patch to run tests via mvn clean install test
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3048
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3048
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>             Fix For: 0.23.0
>         Attachments: MAPREDUCE-3048.txt
>
> Some tests, like the ones failing at MAPREDUCE-3040, depend on the generated jars. TestMRJobs, for example, won't run if we simply run mvn clean test. I propose that we change test-patch to run tests using mvn clean install test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
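The proposal above hinges on how Maven resolves artifacts: `mvn clean test` never publishes the freshly built jars to the local repository, so tests that need the generated jars (like TestMRJobs) cannot find them, while `mvn clean install test` publishes each module's jar before dependents run. A toy Python model of that difference (the module names and the dependency edge are illustrative, not the real Hadoop module graph):

```python
# Toy model of a multi-module build: "install" publishes each module's jar
# to a local repository; tests resolve dependency jars from that repo.
# Module names below are illustrative, not the actual Hadoop modules.

def run_build(phases, modules, deps):
    """Simulate running the given phases over modules in reactor order."""
    local_repo = set()
    for module in modules:
        if "install" in phases:
            local_repo.add(module)  # jar published before later modules test
        if "test" in phases:
            missing = [d for d in deps.get(module, []) if d not in local_repo]
            if missing:
                return f"{module}: cannot resolve {missing}"
    return "BUILD SUCCESS"

modules = ["hadoop-mapreduce-client-app", "hadoop-mapreduce-client-jobclient"]
deps = {"hadoop-mapreduce-client-jobclient": ["hadoop-mapreduce-client-app"]}

print(run_build(["test"], modules, deps))             # dependency jar missing
print(run_build(["install", "test"], modules, deps))  # BUILD SUCCESS
```

With only the `test` phase, the second module fails to resolve the first module's jar; adding `install` publishes it first, which is the behavior change the patch gives test-patch.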
Hadoop-Mapreduce-trunk-Commit - Build # 954 - Still Failing
See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/954/

###################################################################################
########################## LAST 60 LINES OF THE CONSOLE ###########################
###################################################################################
[...truncated 14024 lines...]
    [junit] Running org.apache.hadoop.mapred.TestMapRed
    [junit] Tests run: 5, Failures: 2, Errors: 3, Time elapsed: 1.319 sec
    [junit] Test org.apache.hadoop.mapred.TestMapRed FAILED
    [junit] Running org.apache.hadoop.mapred.TestMiniMRDFSCaching
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 32.517 sec
    [junit] Running org.apache.hadoop.mapred.TestQueueAclsForCurrentUser
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.576 sec
    [junit] Running org.apache.hadoop.mapred.TestRackAwareTaskPlacement
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.479 sec
    [junit] Running org.apache.hadoop.mapred.TestReduceFetchFromPartialMem
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 31.85 sec
    [junit] Running org.apache.hadoop.mapred.TestReduceTask
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.619 sec
    [junit] Running org.apache.hadoop.mapred.TestSequenceFileAsBinaryInputFormat
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.761 sec
    [junit] Running org.apache.hadoop.mapred.TestSequenceFileAsBinaryOutputFormat
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 1.023 sec
    [junit] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 6.201 sec
    [junit] Running org.apache.hadoop.mapred.TestSeveral
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 42.722 sec
    [junit] Running org.apache.hadoop.mapred.TestSpeculativeExecution
    [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 4.016 sec
    [junit] Running org.apache.hadoop.mapred.TestTaskLimits
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 3.889 sec
    [junit] Running org.apache.hadoop.mapred.TestTaskTrackerBlacklisting
    [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 1.716 sec
    [junit] Running org.apache.hadoop.mapred.TestTextInputFormat
    [junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 56.338 sec
    [junit] Running org.apache.hadoop.mapred.TestTextOutputFormat
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.18 sec
    [junit] Running org.apache.hadoop.mapred.TestTrackerBlacklistAcrossJobs
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 47.014 sec
    [junit] Running org.apache.hadoop.mapreduce.TestCounters
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.35 sec
    [junit] Running org.apache.hadoop.mapreduce.TestMapCollection
    [junit] Tests run: 11, Failures: 0, Errors: 11, Time elapsed: 0.62 sec
    [junit] Test org.apache.hadoop.mapreduce.TestMapCollection FAILED
    [junit] Running org.apache.hadoop.mapreduce.TestMapReduceLocal
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 28.319 sec
    [junit] Running org.apache.hadoop.mapreduce.lib.input.TestFileInputFormat
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.946 sec
    [junit] Running org.apache.hadoop.mapreduce.lib.output.TestFileOutputCommitter
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.585 sec

checkfailure:
    [touch] Creating /home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build/test/testsfailed

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:792: The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:755: The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:816: Tests failed!

Total time: 5 minutes 58 seconds
Build step 'Execute shell' marked build as failure
Recording test results
Updating MAPREDUCE-3048
Email was triggered for: Failure
Sending email for trigger: Failure

###################################################################################
############################## FAILED TESTS (if any) ##############################
###################################################################################
All tests passed
Hadoop-Mapreduce-trunk-Commit - Build # 955 - Still Failing
See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/955/

###################################################################################
########################## LAST 60 LINES OF THE CONSOLE ###########################
###################################################################################
[...truncated 14024 lines...]
    [junit] Running org.apache.hadoop.mapred.TestMapRed
    [junit] Tests run: 5, Failures: 2, Errors: 3, Time elapsed: 1.316 sec
    [junit] Test org.apache.hadoop.mapred.TestMapRed FAILED
    [junit] Running org.apache.hadoop.mapred.TestMiniMRDFSCaching
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 29.841 sec
    [junit] Running org.apache.hadoop.mapred.TestQueueAclsForCurrentUser
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.585 sec
    [junit] Running org.apache.hadoop.mapred.TestRackAwareTaskPlacement
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.488 sec
    [junit] Running org.apache.hadoop.mapred.TestReduceFetchFromPartialMem
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 38.045 sec
    [junit] Running org.apache.hadoop.mapred.TestReduceTask
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.616 sec
    [junit] Running org.apache.hadoop.mapred.TestSequenceFileAsBinaryInputFormat
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.77 sec
    [junit] Running org.apache.hadoop.mapred.TestSequenceFileAsBinaryOutputFormat
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.941 sec
    [junit] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.729 sec
    [junit] Running org.apache.hadoop.mapred.TestSeveral
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 42.979 sec
    [junit] Running org.apache.hadoop.mapred.TestSpeculativeExecution
    [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 4.203 sec
    [junit] Running org.apache.hadoop.mapred.TestTaskLimits
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 3.888 sec
    [junit] Running org.apache.hadoop.mapred.TestTaskTrackerBlacklisting
    [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 1.699 sec
    [junit] Running org.apache.hadoop.mapred.TestTextInputFormat
    [junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 92.938 sec
    [junit] Running org.apache.hadoop.mapred.TestTextOutputFormat
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.178 sec
    [junit] Running org.apache.hadoop.mapred.TestTrackerBlacklistAcrossJobs
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 47.112 sec
    [junit] Running org.apache.hadoop.mapreduce.TestCounters
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.347 sec
    [junit] Running org.apache.hadoop.mapreduce.TestMapCollection
    [junit] Tests run: 11, Failures: 0, Errors: 11, Time elapsed: 0.614 sec
    [junit] Test org.apache.hadoop.mapreduce.TestMapCollection FAILED
    [junit] Running org.apache.hadoop.mapreduce.TestMapReduceLocal
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 28.134 sec
    [junit] Running org.apache.hadoop.mapreduce.lib.input.TestFileInputFormat
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.944 sec
    [junit] Running org.apache.hadoop.mapreduce.lib.output.TestFileOutputCommitter
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.585 sec

checkfailure:
    [touch] Creating /home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build/test/testsfailed

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:792: The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:755: The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:816: Tests failed!

Total time: 6 minutes 34 seconds
Build step 'Execute shell' marked build as failure
Recording test results
Updating HDFS-46
Email was triggered for: Failure
Sending email for trigger: Failure

###################################################################################
############################## FAILED TESTS (if any) ##############################
###################################################################################
All tests passed
[jira] [Created] (MAPREDUCE-3069) running test-patch gives me 10 warnings for missing 'build.plugins.plugin.version' of org.apache.rat:apache-rat-plugin
running test-patch gives me 10 warnings for missing 'build.plugins.plugin.version' of org.apache.rat:apache-rat-plugin
----------------------------------------------------------------------------------------------------------------------

                 Key: MAPREDUCE-3069
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3069
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 0.23.0
            Reporter: Ravi Prakash
            Assignee: Ravi Prakash
             Fix For: 0.23.0

The apache-rat-plugin doesn't have a version specified in the hadoop-mapreduce-project and hadoop-yarn pom.xml files.
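Maven emits this warning whenever a plugin is declared without a `<version>` element, and the fix is to pin one in the affected pom.xml files. A sketch of the kind of entry involved (the version number here is illustrative; pin whichever apache-rat-plugin release the build already resolves):

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.rat</groupId>
      <artifactId>apache-rat-plugin</artifactId>
      <!-- Illustrative version; without this element Maven warns about
           a missing 'build.plugins.plugin.version' -->
      <version>0.7</version>
    </plugin>
  </plugins>
</build>
```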
[jira] [Created] (MAPREDUCE-3071) app master configuration web UI link under the Job menu opens up application menu
app master configuration web UI link under the Job menu opens up application menu
---------------------------------------------------------------------------------

                 Key: MAPREDUCE-3071
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3071
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2
    Affects Versions: 0.23.0
            Reporter: Thomas Graves
             Fix For: 0.23.0

If you go to the app master web UI for a particular job, the Job menu on the left side displays links for overview, counters, configuration, etc. If you click on the configuration one, it closes the Job menu and opens the Application menu on that left side. It shouldn't do this; it should leave the Job menu open.
Hadoop-Mapreduce-trunk-Commit - Build # 956 - Still Failing
See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/956/

###################################################################################
########################## LAST 60 LINES OF THE CONSOLE ###########################
###################################################################################
[...truncated 14024 lines...]
    [junit] Running org.apache.hadoop.mapred.TestMapRed
    [junit] Tests run: 5, Failures: 2, Errors: 3, Time elapsed: 1.408 sec
    [junit] Test org.apache.hadoop.mapred.TestMapRed FAILED
    [junit] Running org.apache.hadoop.mapred.TestMiniMRDFSCaching
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 29.5 sec
    [junit] Running org.apache.hadoop.mapred.TestQueueAclsForCurrentUser
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.585 sec
    [junit] Running org.apache.hadoop.mapred.TestRackAwareTaskPlacement
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.522 sec
    [junit] Running org.apache.hadoop.mapred.TestReduceFetchFromPartialMem
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 32.15 sec
    [junit] Running org.apache.hadoop.mapred.TestReduceTask
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.619 sec
    [junit] Running org.apache.hadoop.mapred.TestSequenceFileAsBinaryInputFormat
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.762 sec
    [junit] Running org.apache.hadoop.mapred.TestSequenceFileAsBinaryOutputFormat
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.999 sec
    [junit] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.114 sec
    [junit] Running org.apache.hadoop.mapred.TestSeveral
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 43.656 sec
    [junit] Running org.apache.hadoop.mapred.TestSpeculativeExecution
    [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 4.17 sec
    [junit] Running org.apache.hadoop.mapred.TestTaskLimits
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 3.885 sec
    [junit] Running org.apache.hadoop.mapred.TestTaskTrackerBlacklisting
    [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 1.714 sec
    [junit] Running org.apache.hadoop.mapred.TestTextInputFormat
    [junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 51.031 sec
    [junit] Running org.apache.hadoop.mapred.TestTextOutputFormat
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.181 sec
    [junit] Running org.apache.hadoop.mapred.TestTrackerBlacklistAcrossJobs
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 47.079 sec
    [junit] Running org.apache.hadoop.mapreduce.TestCounters
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.347 sec
    [junit] Running org.apache.hadoop.mapreduce.TestMapCollection
    [junit] Tests run: 11, Failures: 0, Errors: 11, Time elapsed: 0.621 sec
    [junit] Test org.apache.hadoop.mapreduce.TestMapCollection FAILED
    [junit] Running org.apache.hadoop.mapreduce.TestMapReduceLocal
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 28.316 sec
    [junit] Running org.apache.hadoop.mapreduce.lib.input.TestFileInputFormat
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.943 sec
    [junit] Running org.apache.hadoop.mapreduce.lib.output.TestFileOutputCommitter
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.583 sec

checkfailure:
    [touch] Creating /home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build/test/testsfailed

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:792: The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:755: The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:816: Tests failed!

Total time: 6 minutes 0 seconds
Build step 'Execute shell' marked build as failure
Recording test results
Updating MAPREDUCE-2754
Email was triggered for: Failure
Sending email for trigger: Failure

###################################################################################
############################## FAILED TESTS (if any) ##############################
###################################################################################
All tests passed
[jira] [Resolved] (MAPREDUCE-3065) ApplicationMaster killed by NodeManager due to excessive virtual memory consumption
[ https://issues.apache.org/jira/browse/MAPREDUCE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Riccomini resolved MAPREDUCE-3065.
----------------------------------------
    Resolution: Fixed

Per Vinod, new ticket to track this: https://issues.apache.org/jira/browse/MAPREDUCE-3068

> ApplicationMaster killed by NodeManager due to excessive virtual memory consumption
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3065
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3065
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 0.24.0
>            Reporter: Chris Riccomini
>
> Hey Vinod,
>
> OK, so I have a little more clarity on this. When I bump the resource request for my AM to 4096, it runs. The important line in the NM logs is:
>
> 2011-09-21 13:43:44,366 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(402)) - Memory usage of ProcessTree 25656 for container-id container_1316637655278_0001_01_01 : Virtual 2260938752 bytes, limit : 4294967296 bytes; Physical 120860672 bytes, limit -1 bytes
>
> The thing to note is the virtual memory, which is off the charts even though my physical memory is almost nothing (12 megs). I'm still poking around the code, but I notice that there are two checks in the NM, one for virtual memory and one for physical memory. The virtual memory check appears to be toggle-able, but is presumably on by default. At this point I'm trying to figure out exactly what the VMEM check is for, why YARN thinks my app is taking 2 gigs, and how to fix this.
>
> Cheers,
> Chris
>
> From: Chris Riccomini [criccom...@linkedin.com]
> Sent: Wednesday, September 21, 2011 1:42 PM
> To: mapreduce-dev@hadoop.apache.org
> Subject: Re: ApplicationMaster Memory Usage
>
> For the record, I bumped the memory resource request to 4096, and it works. :(
>
> On 9/21/11 1:32 PM, Chris Riccomini <criccom...@linkedin.com> wrote:
>
> Hey Vinod,
>
> So, I ran my application master directly from the CLI. I commented out the YARN-specific code. It runs fine without leaking memory. I then ran it from YARN, with all YARN-specific code commented out. It again ran fine. I then uncommented JUST my registerWithResourceManager call. It then fails with OOM after a few seconds. I call registerWithResourceManager and then go into a while(true) { println("yeh"); sleep(1000) }. Doing this prints:
>
> yeh
> yeh
> yeh
> yeh
> yeh
>
> At which point it dies, and in the NodeManager I see:
>
> 2011-09-21 13:24:51,036 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:isProcessTreeOverLimit(289)) - Process tree for container: container_1316626117280_0005_01_01 has processes older than 1 iteration running over the configured limit. Limit=2147483648, current usage = 2192773120
> 2011-09-21 13:24:51,037 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(453)) - Container [pid=23852,containerID=container_1316626117280_0005_01_01] is running beyond memory-limits. Current usage : 2192773120 bytes. Limit : 2147483648 bytes. Killing container.
> Dump of the process-tree for container_1316626117280_0005_01_01 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 23852 20570 23852 23852 (bash) 0 0 108638208 303 /bin/bash -c java -Xmx512M -cp './package/*' kafka.yarn.ApplicationMaster /home/criccomi/git/kafka-yarn/dist/kafka-streamer.tgz 5 1 1316626117280 com.linkedin.TODO 1 1>/tmp/logs/application_1316626117280_0005/container_1316626117280_0005_01_000001/stdout 2>/tmp/logs/application_1316626117280_0005/container_1316626117280_0005_01_000001/stderr
> |- 23855 23852 23852 23852 (java) 81 4 2084134912 14772 java -Xmx512M -cp ./package/* kafka.yarn.ApplicationMaster /home/criccomi/git/kafka-yarn/dist/kafka-streamer.tgz 5 1 1316626117280 com.linkedin.TODO 1
> 2011-09-21 13:24:51,037 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - Removed ProcessTree with root 23852
>
> Either something is leaking in YARN, or my registerWithResourceManager code (see below) is doing something funky. I'm trying to avoid going through all the pain of attaching a remote debugger. Presumably things aren't leaking in YARN, which means it's likely that I'm doing something wrong in my registration code.
>
> Incidentally, my NodeManager is running with 1000 megs. My application master memory is set to 2048, and my -Xmx setting is 512M.
>
> Cheers,
> Chris
>
> From: Vinod Kumar Vavilapalli [vino...@hortonworks.com]
> Sent: Wednesday, September 21, 2011 11:52 AM
> To:
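The log lines in this thread show which check kills the container: the monitor compares the process tree's virtual memory, not its heap or physical memory, against the container limit, and treats a limit of -1 as unchecked. A simplified Python model of that decision (an illustration, not the actual ContainersMonitorImpl logic):

```python
# Simplified model of the NodeManager's per-container memory check described
# above (not the real ContainersMonitorImpl code): the monitor compares the
# process tree's *virtual* memory against the container's limit, so a JVM
# with a small heap (-Xmx512M) but a large address space can still be killed.

def should_kill(vmem_bytes, vmem_limit, pmem_bytes, pmem_limit=-1):
    """Return True if the container is over its limits. A limit of -1 means
    'unchecked', matching the 'limit -1 bytes' in the log lines above."""
    if vmem_limit != -1 and vmem_bytes > vmem_limit:
        return True
    if pmem_limit != -1 and pmem_bytes > pmem_limit:
        return True
    return False

# Numbers from the NM logs in this thread: ~2.09 GB virtual against a 2 GB
# limit gets killed, while the 4 GB limit (the 4096 request) survives.
print(should_kill(vmem_bytes=2192773120, vmem_limit=2147483648,
                  pmem_bytes=120860672))  # -> True (container killed)
print(should_kill(vmem_bytes=2260938752, vmem_limit=4294967296,
                  pmem_bytes=120860672))  # -> False (AM keeps running)
```

This also explains why bumping the resource request to 4096 "fixed" it: the virtual memory footprint did not change, only the limit it was compared against.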
[jira] [Created] (MAPREDUCE-3072) NodeManager doesn't recognize kill -9 of AM container
NodeManager doesn't recognize kill -9 of AM container
-----------------------------------------------------

                 Key: MAPREDUCE-3072
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3072
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: nodemanager
    Affects Versions: 0.23.0
         Environment: [criccomi@criccomi-ld trunk]$ svn info
                      Path: .
                      URL: http://svn.apache.org/repos/asf/hadoop/common/trunk
                      Repository Root: http://svn.apache.org/repos/asf
                      Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
                      Revision: 1174189
                      Node Kind: directory
                      Schedule: normal
                      Last Changed Author: szetszwo
                      Last Changed Rev: 1173990
                      Last Changed Date: 2011-09-22 01:25:20 -0700 (Thu, 22 Sep 2011)
            Reporter: Chris Riccomini

If I kill -9 my application master's pid, the NM continues reporting that the container is running. I assume it should instead report back to the RM that the AM has died. Instead, it keeps sending this status:

2011-09-22 09:33:13,352 INFO nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:getNodeStatus(222)) - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1316707951832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: \n, exit_status: -1000,
2011-09-22 09:33:13,682 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(402)) - Memory usage of ProcessTree 27263 for container-id container_1316707951832_0001_01_01 : Virtual 0 bytes, limit : 2147483648 bytes; Physical 0 bytes, limit -1 bytes

This status keeps being sent forever.
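For context, a process killed with kill -9 can be detected cheaply by probing its pid, which is roughly the liveness check one would expect alongside the memory monitoring above. A Python sketch of that probe (an illustration, not the actual NodeManager code):

```python
# Sketch of a container liveness probe: after a kill -9, the root pid of the
# container's process tree no longer exists, and signal 0 detects that.
import errno
import os

def process_alive(pid: int) -> bool:
    """Signal 0 performs existence/permission checks without delivering a
    signal, so it works as a cheap 'is this pid still there?' probe."""
    try:
        os.kill(pid, 0)
    except OSError as e:
        if e.errno == errno.ESRCH:  # no such process
            return False
        return True                 # e.g. EPERM: process exists, just not ours
    return True

print(process_alive(os.getpid()))  # -> True for our own pid
```

A monitor polling this per container could mark the container as exited instead of reporting C_RUNNING forever; the "Virtual 0 bytes, Physical 0 bytes" lines in the report suggest the monitor already sees an empty process tree.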
[jira] [Resolved] (MAPREDUCE-3072) NodeManager doesn't recognize kill -9 of AM container
[ https://issues.apache.org/jira/browse/MAPREDUCE-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli resolved MAPREDUCE-3072.
------------------------------------------------
    Resolution: Duplicate

This is the same as MAPREDUCE-3031.

> NodeManager doesn't recognize kill -9 of AM container
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-3072
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3072
[jira] [Resolved] (MAPREDUCE-2790) [MR-279] Add additional field for storing the AM/job history info on CLI
[ https://issues.apache.org/jira/browse/MAPREDUCE-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash resolved MAPREDUCE-2790.
-------------------------------------
    Resolution: Duplicate

> [MR-279] Add additional field for storing the AM/job history info on CLI
> -------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2790
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2790
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Ramya Sunil
>            Assignee: Ravi Prakash
>            Priority: Critical
>             Fix For: 0.23.0
>         Attachments: MAPREDUCE-2790.v1.txt, MAPREDUCE-2790.v2.txt, MAPREDUCE-2790.v3.txt, MAPREDUCE-2790.v4.txt
>
> bin/mapred job [-list [all]] displays the AM or job history location in the SchedulingInfo field. An additional column has to be added to display the AM/job history information. Currently, the output reads:
> {noformat}
> JobId   State    StartTime   UserName   Queue     Priority   SchedulingInfo
> jobID   FAILED   0           ramya      default   NORMAL     AM information/job history location
> {noformat}
[jira] [Created] (MAPREDUCE-3073) Build failure for MRv1 caused due to changes to MRConstants.
Build failure for MRv1 caused due to changes to MRConstants.
------------------------------------------------------------

                 Key: MAPREDUCE-3073
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3073
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 0.23.0
            Reporter: Mahadev konar
            Assignee: Mahadev konar
            Priority: Blocker
             Fix For: 0.23.0

When running ant -Dresolvers=internal binary, the build seems to be failing with:

    [javac] public class JobTracker implements MRConstants, InterTrackerProtocol,
    [javac]                                    ^
    [javac] /home/y/var/builds/thread2/workspace/Cloud-Yarn-0.23-Secondary/hadoop-mapreduce-project/src/java/org/apache/hadoop/mapred/TaskTracker.java:131: interface expected here
    [javac]     implements MRConstants, TaskUmbilicalProtocol, Runnable, TTConfig {
    [javac]                ^
    [javac] /home/y/var/builds/thread2/workspace/Cloud-Yarn-0.23-Secondary/hadoop-mapreduce-project/src/java/org/apache/hadoop/mapred/TaskTracker.java:552: cannot find symbol
    [javac] symbol  : variable WORKDIR
    [javac] location: class org.apache.hadoop.mapred.MRConstants
    [javac]     return getLocalJobDir(user, jobid) + Path.SEPARATOR + MRConstants.WORKDIR;
    [javac]                                                           ^
[jira] [Created] (MAPREDUCE-3074) add location to web UI so you know where you are - cluster, node, AM, job history
add location to web UI so you know where you are - cluster, node, AM, job history
---------------------------------------------------------------------------------

                 Key: MAPREDUCE-3074
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3074
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: mrv2
    Affects Versions: 0.23.0
            Reporter: Thomas Graves
             Fix For: 0.23.0

Right now, the web UIs for the resource manager, node manager, app master, and job history all look very similar, and it is sometimes hard to tell which page you are on. Adding a title or something else that tells you where you are would be helpful. Or somehow make them more seamless, so one doesn't have to know.
[jira] [Created] (MAPREDUCE-3075) Web UI menu inconsistencies
Web UI menu inconsistencies
---------------------------

                 Key: MAPREDUCE-3075
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3075
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: mrv2
    Affects Versions: 0.23.0
            Reporter: Thomas Graves
             Fix For: 0.23.0

When you go to the various web UIs, the menus on the left are inconsistent and (at least to me) sometimes confusing. For instance, if you go to the application master UI, one of the menus is Cluster. If you click on one of the Cluster links, it takes you back to the RM UI and you lose the app master UI altogether. Maybe it's just me, but that is confusing. I like having a link back to the cluster from the AM, but with the way the UI is set up I would have expected it to open that page in the middle div/frame and leave the AM menus there. Perhaps a different type of link or menu could indicate that this is going to take you away from the AM page. Also, the nodes and job history UIs don't have the Cluster menus at all.
Hadoop-Mapreduce-trunk-Commit - Build # 957 - Still Failing
See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/957/

###################################################################################
########################## LAST 60 LINES OF THE CONSOLE ###########################
###################################################################################
[...truncated 14071 lines...]
    [junit] Running org.apache.hadoop.mapred.TestMapRed
    [junit] Tests run: 5, Failures: 2, Errors: 3, Time elapsed: 1.312 sec
    [junit] Test org.apache.hadoop.mapred.TestMapRed FAILED
    [junit] Running org.apache.hadoop.mapred.TestMiniMRDFSCaching
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 32.756 sec
    [junit] Running org.apache.hadoop.mapred.TestQueueAclsForCurrentUser
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.585 sec
    [junit] Running org.apache.hadoop.mapred.TestRackAwareTaskPlacement
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.52 sec
    [junit] Running org.apache.hadoop.mapred.TestReduceFetchFromPartialMem
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 36.253 sec
    [junit] Running org.apache.hadoop.mapred.TestReduceTask
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.621 sec
    [junit] Running org.apache.hadoop.mapred.TestSequenceFileAsBinaryInputFormat
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.771 sec
    [junit] Running org.apache.hadoop.mapred.TestSequenceFileAsBinaryOutputFormat
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.941 sec
    [junit] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.886 sec
    [junit] Running org.apache.hadoop.mapred.TestSeveral
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 43.063 sec
    [junit] Running org.apache.hadoop.mapred.TestSpeculativeExecution
    [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 4.157 sec
    [junit] Running org.apache.hadoop.mapred.TestTaskLimits
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 3.895 sec
    [junit] Running org.apache.hadoop.mapred.TestTaskTrackerBlacklisting
    [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 1.707 sec
    [junit] Running org.apache.hadoop.mapred.TestTextInputFormat
    [junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 56.258 sec
    [junit] Running org.apache.hadoop.mapred.TestTextOutputFormat
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.178 sec
    [junit] Running org.apache.hadoop.mapred.TestTrackerBlacklistAcrossJobs
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 46.851 sec
    [junit] Running org.apache.hadoop.mapreduce.TestCounters
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.344 sec
    [junit] Running org.apache.hadoop.mapreduce.TestMapCollection
    [junit] Tests run: 11, Failures: 0, Errors: 11, Time elapsed: 0.616 sec
    [junit] Test org.apache.hadoop.mapreduce.TestMapCollection FAILED
    [junit] Running org.apache.hadoop.mapreduce.TestMapReduceLocal
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 28.336 sec
    [junit] Running org.apache.hadoop.mapreduce.lib.input.TestFileInputFormat
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.967 sec
    [junit] Running org.apache.hadoop.mapreduce.lib.output.TestFileOutputCommitter
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.583 sec

checkfailure:
    [touch] Creating /home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build/test/testsfailed

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:792: The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:755: The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:816: Tests failed!

Total time: 6 minutes 11 seconds
Build step 'Execute shell' marked build as failure
Recording test results
Updating MAPREDUCE-3073
Email was triggered for: Failure
Sending email for trigger: Failure

###################################################################################
############################## FAILED TESTS (if any) ##############################
###################################################################################
All tests passed
Matching ResourceRequest to Container
Hey Guys,

I’m sure there’s a way to do this, but I’m missing it. If I have an AllocateRequest with multiple ResourceRequests in it (say 2), and I get two containers back, how do I map which container was sent back for which ResourceRequest?

Thanks!
Chris
Re: Matching ResourceRequest to Container
You are running into MR-2616. There is a stale patch; it should be easy to fix. I can do it, or you are welcome to.

Arun

On Sep 22, 2011, at 1:44 PM, Chris Riccomini wrote:

> Hey Guys,
> I’m sure there’s a way to do this, but I’m missing it. If I have an AllocateRequest with multiple ResourceRequests in it (say 2), and I get two containers back, how do I map which container was sent back for which ResourceRequest?
> Thanks!
> Chris
Re: Matching ResourceRequest to Container
Hey Arun, MR-2616 looks like a gridmix bug. Are you sure this is the right ticket number? Thanks! Chris
Re: Matching ResourceRequest to Container
Oops, sorry - it's https://issues.apache.org/jira/browse/MAPREDUCE-2646 Arun
Re: Matching ResourceRequest to Container
Also, for now you can assume it's the highest outstanding priority or wait for MR-2646.
Re: Matching ResourceRequest to Container
Hey Arun, I think I see. Basically, 2646's patch is using the priority number as an ID for a container (or group of containers) within an AllocateRequest. So, in my scenario, I could set ResourceRequest 1 to priority 1, and ResourceRequest 2 to priority 2, and (with this patch) get the priority back out on the other end. Is this correct? Are priorities cross-application, or just for containers within the application? Cheers, Chris
Re: Matching ResourceRequest to Container
Priorities are within your application. You always get P0 containers before you get P1 containers, and so on. Arun
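The matching scheme discussed in this thread can be sketched as a priority-keyed lookup. The sketch below is illustrative only: `PriorityMatcher`, `Request`, and `Container` are hypothetical stand-ins for YARN's `ResourceRequest` and `Container` records, not the actual API. The technique is the one MAPREDUCE-2646 enables — key outstanding requests by the priority you assigned, then use the priority echoed back on each allocated container to recover which request it satisfies.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical stand-ins for YARN's ResourceRequest and Container records.
public class PriorityMatcher {
    static final class Request {
        final int priority;
        final int memoryMb;
        Request(int priority, int memoryMb) {
            this.priority = priority;
            this.memoryMb = memoryMb;
        }
    }

    static final class Container {
        final String id;
        final int priority; // echoed back by the RM once MAPREDUCE-2646 is in
        Container(String id, int priority) {
            this.id = id;
            this.priority = priority;
        }
    }

    // Outstanding requests, keyed by the priority we assigned them.
    private final Map<Integer, Queue<Request>> outstanding = new HashMap<>();

    void addRequest(Request r) {
        outstanding.computeIfAbsent(r.priority, p -> new ArrayDeque<>()).add(r);
    }

    /** Match an allocated container back to the request that produced it. */
    Request match(Container c) {
        Queue<Request> q = outstanding.get(c.priority);
        return (q == null) ? null : q.poll();
    }
}
```

Because priorities are scoped to a single application, distinct priority numbers are safe to use as per-application request IDs here; the scheduler's only constraint is the ordering Arun notes, that P0 is served before P1.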
Re: Matching ResourceRequest to Container
Good deal, thanks. PS- Haven't forgotten about MALLOC stuff- just lower on priority list at the moment. More to come.
[jira] [Resolved] (MAPREDUCE-2717) Client should be able to know why an AM crashed.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy resolved MAPREDUCE-2717. -- Resolution: Duplicate Assignee: (was: Siddharth Seth) Most of these are fixed; the remaining diagnostics part is a duplicate of MAPREDUCE-3065. Client should be able to know why an AM crashed. Key: MAPREDUCE-2717 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2717 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Reporter: Amol Kekre Priority: Blocker Fix For: 0.23.0 Today if an AM crashes, we have to dig through logs, which is very cumbersome. It would be good to have the client print the reason for an AM crash. Possible reasons for an AM crash: (1) the AM container failed during localization itself; (2) the AM container launched but failed before starting properly, e.g. due to classpath issues; (3) the AM failed after starting properly; (4) the AM expired and was killed by the RM. Potential fixes: - For (1) and (2), the client should obtain the container status, container diagnostics, and exit code. - For (3), the AM should set some kind of failure reason during its heartbeat to the RM, and the client should obtain it from the RM. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-3076) TestSleepJob fails
TestSleepJob fails --- Key: MAPREDUCE-3076 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3076 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 0.20.205.0 Reporter: Arun C Murthy Assignee: Arun C Murthy Priority: Blocker Fix For: 0.20.205.0 Attachments: MAPREDUCE-3076.patch TestSleepJob fails; it was intended to be used in other tests for MAPREDUCE-2981. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-3077) re-enable faulty TaskTracker storage without restarting TT, when appropriate
re-enable faulty TaskTracker storage without restarting TT, when appropriate Key: MAPREDUCE-3077 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3077 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tasktracker Affects Versions: 0.20.205.0 Reporter: Matt Foley In MAPREDUCE-2928, Ravi Gummadi proposed: bq. we can add LocalStorage.checkBadLocalDirs() call to TT.initialize() that can do disk-health-check of bad local dirs and add dirs to the good local dirs list if they become good. and Eli Collins added: bq. Sounds good. Since transient disk failures may cause a file system to become read-only (causing permanent failures) sometimes re-mounting is sufficient to recover in which case it makes sense to re-enable faulty disks w/o TT restart. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
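The re-check Ravi Gummadi proposes for MAPREDUCE-3077 could look roughly like the following. This is only a sketch, not TaskTracker code: `LocalDirRecheck` is a hypothetical class, the good/bad dir lists are plain Java collections standing in for `LocalStorage` state, and the write-probe is one guess at what a disk-health check might do.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative model of the proposed LocalStorage.checkBadLocalDirs():
// re-probe each previously-bad dir and promote it back to the good
// list if it has become writable again (e.g. after a re-mount).
public class LocalDirRecheck {
    final List<File> goodDirs = new ArrayList<>();
    final List<File> badDirs = new ArrayList<>();

    /** Re-probe bad dirs; move any that pass a write check back to goodDirs. */
    void checkBadLocalDirs() {
        for (Iterator<File> it = badDirs.iterator(); it.hasNext(); ) {
            File dir = it.next();
            if (isWritable(dir)) {
                it.remove();
                goodDirs.add(dir);
            }
        }
    }

    // Simple health probe: try to create and delete a file in the dir.
    // A read-only filesystem (the case Eli mentions) fails this check.
    private boolean isWritable(File dir) {
        try {
            File probe = File.createTempFile("health", ".tmp", dir);
            return probe.delete();
        } catch (IOException e) {
            return false; // still bad
        }
    }
}
```

Hooking such a check into TT.initialize(), as proposed, would let a disk recovered by re-mounting rejoin the good-dirs list without a TaskTracker restart.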