[jira] [Created] (MAPREDUCE-5131) Provide better handling of job status related apis during JT restart
Arun C Murthy created MAPREDUCE-5131:

Summary: Provide better handling of job status related apis during JT restart
Key: MAPREDUCE-5131
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5131
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Arun C Murthy
Assignee: Arun C Murthy

I've seen Pig/Hive applications break during JT restart since they get NPEs - this is due to the fact that the jobs have been submitted but not yet initialized.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-5132) Enhance FairScheduler queue selection with Unix-SubGroups
Ted Malaska created MAPREDUCE-5132:

Summary: Enhance FairScheduler queue selection with Unix-SubGroups
Key: MAPREDUCE-5132
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5132
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Ted Malaska
Priority: Minor

I have clients that use the primary group for other purposes and would like to select the FairScheduler queue by subgroup. Currently the FairScheduler can only link a job config property to a queue, and subgroups are not in the job config.

Thankfully, we can get this data from UserGroupInformation.createProxyUser, which is how the TT currently checks whether it can copy data when Kerberos is enabled. Once we have the sub-groups, we simply select the first sub-group that matches a queue.

I'll put up some code that I used to solve the problem in the short term. Note that there is an issue with this code: it doesn't populate the TT Web UI drop-down menu correctly. I think I have a solution for that, but before I implement it I was wondering whether the community has any feedback first.
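The proposal above boils down to: fetch the user's Unix groups and pick the first one that names a configured queue. A minimal sketch of that matching step, in plain Java so it stands alone - in Hadoop the group list would come from UserGroupInformation, and the class/method names here are illustrative, not taken from the attached code:

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: select the FairScheduler queue from the first of a
// user's Unix subgroups that matches a configured queue name. In Hadoop the
// subgroup list would come from UserGroupInformation#getGroupNames(); here
// it is passed in directly so the sketch is self-contained.
public class SubGroupQueueSelector {
    private final Set<String> configuredQueues;
    private final String defaultQueue;

    public SubGroupQueueSelector(Set<String> configuredQueues, String defaultQueue) {
        this.configuredQueues = configuredQueues;
        this.defaultQueue = defaultQueue;
    }

    /** Returns the first subgroup that names a known queue, else the default. */
    public String selectQueue(List<String> subGroups) {
        for (String group : subGroups) {
            if (configuredQueues.contains(group)) {
                return group;
            }
        }
        return defaultQueue;
    }
}
```

Matching in group-list order means the first matching subgroup wins, which is exactly the "connect the first sub-group that matches a queue" behavior the report describes.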
[jira] [Resolved] (MAPREDUCE-4534) Test failures with Container .. is running beyond virtual memory limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrey Klochkov resolved MAPREDUCE-4534.
Resolution: Duplicate

Test failures with Container .. is running beyond virtual memory limits

Key: MAPREDUCE-4534
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4534
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: test
Affects Versions: 0.23.3
Reporter: Ilya Katsov

Tests org.apache.hadoop.tools.TestHadoopArchives.{testRelativePath,testPathWithSpaces} fail with the following message:

{code}
Container [pid=7785,containerID=container_1342495768864_0001_01_01] is running beyond virtual memory limits. Current usage: 143.6mb of 1.5gb physical memory used; 3.4gb of 3.1gb virtual memory used. Killing container.
Dump of the process-tree for container_1342495768864_0001_01_01 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 7797 7785 7785 7785 (java) 573 38 3517018112 36421 /usr/java/jdk1.6.0_33/jre/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/var/lib/jenkins/workspace/Hadoop_gd-branch0.23_integration/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1342495768864_0001/container_1342495768864_0001_01_01 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster
|- 7785 7101 7785 7785 (bash) 1 1 108605440 332 /bin/bash -c /usr/java/jdk1.6.0_33/jre/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/var/lib/jenkins/workspace/Hadoop_gd-branch0.23_integration/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1342495768864_0001/container_1342495768864_0001_01_01 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/var/lib/jenkins/workspace/Hadoop_gd-branch0.23_integration/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1342495768864_0001/container_1342495768864_0001_01_01/stdout 2>/var/lib/jenkins/workspace/Hadoop_gd-branch0.23_integration/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1342495768864_0001/container_1342495768864_0001_01_01/stderr
{code}

This problem is not stably reproducible, but setting MALLOC_ARENA_MAX resolves it.
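The MALLOC_ARENA_MAX fix mentioned in the report works because glibc 2.10+ creates per-thread malloc arenas, which can inflate a JVM's virtual-memory footprint well beyond its heap and trip YARN's vmem check. A hedged sketch of the setting - the value 4 is a commonly used choice, not one stated in this report:

```shell
# Cap the number of glibc malloc arenas so the JVM's virtual-memory
# footprint stays close to its heap size. Typically exported from
# hadoop-env.sh / yarn-env.sh before the daemons start.
# The value 4 is a common convention, not taken from this report.
export MALLOC_ARENA_MAX=4
```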
[jira] [Created] (MAPREDUCE-5133) TestSubmitJob.testSecureJobExecution is flaky due to job dir deletion race
Sandy Ryza created MAPREDUCE-5133:

Summary: TestSubmitJob.testSecureJobExecution is flaky due to job dir deletion race
Key: MAPREDUCE-5133
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5133
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: test
Affects Versions: 1.1.2
Reporter: Sandy Ryza
Assignee: Sandy Ryza

At the end of TestSubmitJob.testSecureJobExecution, the test waits for the job to finish and then asserts that the job submission directory has been deleted. The directory is deleted by an asynchronous cleanup thread, so the test can hit the assertion before the deletion has run.
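The usual fix for this kind of race is to poll for the condition with a timeout instead of asserting immediately. A self-contained sketch of that shape - in the real test the condition would be something like `!fs.exists(jobSubmitDir)`; here a plain BooleanSupplier stands in, and none of these names are taken from the actual patch:

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a deflaking fix: poll until the asynchronous
// cleanup has happened (or a timeout expires) rather than asserting
// right after the job completes.
public class WaitUtil {
    /** Polls the condition every pollMillis until it is true or timeoutMillis elapses. */
    public static boolean waitFor(BooleanSupplier condition,
                                  long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(pollMillis);
        }
        return condition.getAsBoolean();  // one last check at the deadline
    }
}
```

The test would then assert `waitFor(...)` returned true, turning a timing-dependent failure into a bounded wait.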
[jira] [Created] (MAPREDUCE-5134) Default settings cause LocalJobRunner to OOME
Sandy Ryza created MAPREDUCE-5134:

Summary: Default settings cause LocalJobRunner to OOME
Key: MAPREDUCE-5134
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5134
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza

If I run a job using the local job runner with vanilla settings, I get an out of memory error. This seems to be because the default client memory maximum is 128 MB, and the default io.sort.mb is 100 MB.
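The arithmetic above is the whole bug: a 100 MB sort buffer plus the rest of the framework does not fit in a 128 MB heap. Two hedged workaround sketches, not the committed fix - HADOOP_CLIENT_OPTS is the standard hadoop-env.sh hook for client JVM flags, while the jar/class names below are placeholders:

```shell
# Option 1: raise the client JVM heap past the 100 MB sort buffer.
export HADOOP_CLIENT_OPTS="-Xmx1g"

# Option 2: shrink the sort buffer so it fits inside the default 128 MB heap.
# io.sort.mb is the pre-2.x property name used in the report;
# my-job.jar and MyJob are placeholders, not names from this issue.
hadoop jar my-job.jar MyJob -D io.sort.mb=50 input/ output/
```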
[jira] [Resolved] (MAPREDUCE-5131) Provide better handling of job status related apis during JT restart
[ https://issues.apache.org/jira/browse/MAPREDUCE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy resolved MAPREDUCE-5131.
Resolution: Fixed
Fix Version/s: 1.2.0

Thanks for the reviews [~szetszwo] and [~kkambatl]. I just committed this.

Provide better handling of job status related apis during JT restart

Key: MAPREDUCE-5131
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5131
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Fix For: 1.2.0
Attachments: MAPREDUCE-5131.patch, MAPREDUCE-5131.patch, MAPREDUCE-5131.patch, MAPREDUCE-5131.patch

I've seen Pig/Hive applications break during JT restart since they get NPEs - this is due to the fact that the jobs have been submitted but not yet initialized.
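The failure mode described in MAPREDUCE-5131 - status APIs NPE for jobs that are submitted but not yet initialized - has a simple defensive shape: never return null for a submitted job; report a placeholder "PREP"-style status instead. A self-contained illustration of that idea only; none of these names or states are taken from the actual patch:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of the bug class: a status API that returns
// null for a submitted-but-uninitialized job will NPE in clients such as
// Pig/Hive. Defaulting to a PREP-like status is one defensive fix shape.
public class JobStatusRegistry {
    public enum State { PREP, RUNNING, SUCCEEDED }

    private final Map<String, State> jobs = new ConcurrentHashMap<>();

    public void submit(String jobId) {
        jobs.put(jobId, State.PREP);   // submitted, not yet initialized
    }

    public void markInited(String jobId) {
        jobs.put(jobId, State.RUNNING);
    }

    /** Never returns null: jobs not yet recovered/initialized report PREP. */
    public State getStatus(String jobId) {
        return jobs.getOrDefault(jobId, State.PREP);
    }
}
```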