[jira] [Created] (MAPREDUCE-5131) Provide better handling of job status related apis during JT restart

2013-04-05 Thread Arun C Murthy (JIRA)
Arun C Murthy created MAPREDUCE-5131:


 Summary: Provide better handling of job status related apis during 
JT restart
 Key: MAPREDUCE-5131
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5131
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Arun C Murthy
Assignee: Arun C Murthy


I've seen Pig/Hive applications break during a JT restart because they get NPEs. 
This happens because the jobs have been submitted but are not yet initialized.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-5132) Enhance FairScheduler queue selection with Unix-SubGroups

2013-04-05 Thread Ted Malaska (JIRA)
Ted Malaska created MAPREDUCE-5132:

 Summary: Enhance FairScheduler queue selection with Unix-SubGroups
 Key: MAPREDUCE-5132
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5132
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Ted Malaska
Priority: Minor


I have clients that use the primary group for other purposes and would like to 
select the FairScheduler queue by subgroup.

Currently the FairScheduler can only map a job config property to a 
FairScheduler queue, and subgroups are not in the job config.

Fortunately, we can get the subgroups from UserGroupInformation.createProxyUser, 
which is also how the TT currently checks whether it can copy data when 
Kerberos is enabled.  Once we have the subgroups, we select the first 
subgroup that matches a queue.
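
The matching rule described above can be sketched roughly as follows. This is an illustration only, not the code attached to the issue: the class SubGroupQueueSelector, the method selectQueue, and the sample group and queue names are all hypothetical, and the sketch assumes the group list arrives with the primary group first. In a real scheduler the list would presumably come from the UserGroupInformation object rather than a literal.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: pick a scheduler queue from a user's Unix groups. */
final class SubGroupQueueSelector {

    /**
     * Returns the first subgroup (every group after the primary one) whose
     * name matches an existing queue, or defaultQueue if none matches.
     * Assumption: groups.get(0) is the primary group and is skipped.
     */
    static String selectQueue(List<String> groups, Set<String> queues, String defaultQueue) {
        for (int i = 1; i < groups.size(); i++) {
            String group = groups.get(i);
            if (queues.contains(group)) {
                return group;
            }
        }
        return defaultQueue;
    }

    public static void main(String[] args) {
        // Made-up queue and group names, for illustration only.
        Set<String> queues = new HashSet<>(Arrays.asList("etl", "adhoc"));
        List<String> groups = Arrays.asList("staff", "research", "adhoc", "etl");
        System.out.println(selectQueue(groups, queues, "default"));
    }
}
```

With the sample data, "staff" is skipped as the primary group, "research" matches no queue, and "adhoc" is chosen because it is the first subgroup that names a queue.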

I'll put up some code that I used to solve the problem in the short term.  

Note that there is an issue with this code: it doesn't populate the TT Web UI 
drop-down menu correctly.  I think I have a solution for that, but before I 
implement it I would like to hear any feedback from the community.



[jira] [Resolved] (MAPREDUCE-4534) Test failures with Container .. is running beyond virtual memory limits

2013-04-05 Thread Andrey Klochkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Klochkov resolved MAPREDUCE-4534.


Resolution: Duplicate

 Test failures with Container .. is running beyond virtual memory limits

 Key: MAPREDUCE-4534
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4534
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 0.23.3
Reporter: Ilya Katsov

 Tests 
 org.apache.hadoop.tools.TestHadoopArchives.{testRelativePath,testPathWithSpaces}
  fail with the following message:
 {code}
 Container [pid=7785,containerID=container_1342495768864_0001_01_01] is 
 running beyond virtual memory limits. Current usage: 143.6mb of 1.5gb 
 physical memory used; 3.4gb of 3.1gb virtual memory used. Killing container.
 Dump of the process-tree for container_1342495768864_0001_01_01 :
   |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
 SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
   |- 7797 7785 7785 7785 (java) 573 38 3517018112 36421 
 /usr/java/jdk1.6.0_33/jre/bin/java 
 -Dlog4j.configuration=container-log4j.properties 
 -Dyarn.app.mapreduce.container.log.dir=/var/lib/jenkins/workspace/Hadoop_gd-branch0.23_integration/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1342495768864_0001/container_1342495768864_0001_01_01
  -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA 
 -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 
   |- 7785 7101 7785 7785 (bash) 1 1 108605440 332 /bin/bash -c 
 /usr/java/jdk1.6.0_33/jre/bin/java 
 -Dlog4j.configuration=container-log4j.properties 
 -Dyarn.app.mapreduce.container.log.dir=/var/lib/jenkins/workspace/Hadoop_gd-branch0.23_integration/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1342495768864_0001/container_1342495768864_0001_01_01
  -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA 
 -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 
 1>/var/lib/jenkins/workspace/Hadoop_gd-branch0.23_integration/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1342495768864_0001/container_1342495768864_0001_01_01/stdout
 2>/var/lib/jenkins/workspace/Hadoop_gd-branch0.23_integration/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1342495768864_0001/container_1342495768864_0001_01_01/stderr
 {code}
 The failure does not reproduce reliably, but setting the MALLOC_ARENA_MAX 
 environment variable resolves it.
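
The figures in the log can be checked by hand. The sketch below assumes the virtual memory limit is the physical limit multiplied by a ratio of 2.1, which is the default of yarn.nodemanager.vmem-pmem-ratio; whether this test run used the default is an assumption, although it is consistent with the reported 3.1gb limit. The class and method names are hypothetical.

```java
/** Hypothetical re-computation of the virtual memory check from the log. */
final class VmemLimitCheck {

    /** Virtual memory limit in bytes for a given physical limit and ratio. */
    static long vmemLimitBytes(long pmemLimitBytes, double vmemPmemRatio) {
        return (long) (pmemLimitBytes * vmemPmemRatio);
    }

    /** Sums the VMEM_USAGE(BYTES) column over all processes in the tree. */
    static long totalVmemBytes(long... perProcessVmemBytes) {
        long total = 0;
        for (long v : perProcessVmemBytes) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        long pmemLimit = 1536L * 1024 * 1024;          // 1.5gb physical limit from the log
        long limit = vmemLimitBytes(pmemLimit, 2.1);   // assumed ratio, see note above
        long used = totalVmemBytes(3517018112L,        // java process (pid 7797)
                                   108605440L);        // bash process (pid 7785)
        System.out.println("limit=" + limit + " used=" + used
                + " overLimit=" + (used > limit));
    }
}
```

The two processes together use 3625623552 bytes of virtual memory (about 3.4gb), which is above 2.1 times the 1.5gb physical limit, even though physical usage is only 143.6mb.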



[jira] [Created] (MAPREDUCE-5133) TestSubmitJob.testSecureJobExecution is flaky due to job dir deletion race

2013-04-05 Thread Sandy Ryza (JIRA)
Sandy Ryza created MAPREDUCE-5133:

 Summary: TestSubmitJob.testSecureJobExecution is flaky due to job 
dir deletion race
 Key: MAPREDUCE-5133
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5133
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 1.1.2
Reporter: Sandy Ryza
Assignee: Sandy Ryza


At the end of TestSubmitJob.testSecureJobExecution, the test waits for the job 
to be done and then asserts that the job submission directory has been deleted. 
 The directory is deleted by an asynchronous cleanup thread, so the test can 
hit the assert before the deletion is run.
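
One common way to remove this kind of race is to poll for the condition with a timeout instead of asserting immediately. The following is a minimal sketch of that idea and not necessarily the fix that was applied to the test: WaitUtil and waitFor are hypothetical names, and a temporary directory plus a sleeping thread stand in for the job submission directory and the cleanup thread.

```java
import java.io.File;
import java.nio.file.Files;
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper for conditions satisfied asynchronously. */
final class WaitUtil {

    /**
     * Polls the condition every intervalMs until it holds or timeoutMs elapses.
     * Returns true if the condition held, false on timeout or interruption.
     */
    static boolean waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the job submission directory.
        File jobDir = Files.createTempDirectory("job-submit-dir").toFile();
        // Stand-in for the asynchronous cleanup thread.
        Thread cleanup = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                return;
            }
            jobDir.delete();
        });
        cleanup.start();
        // Instead of asserting right away, wait up to 10 seconds for the deletion.
        boolean gone = waitFor(() -> !jobDir.exists(), 50, 10_000);
        System.out.println("job dir deleted: " + gone);
        cleanup.join();
    }
}
```

A test would then assert on the return value of waitFor, so it only fails if the directory still exists after the timeout.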




[jira] [Created] (MAPREDUCE-5134) Default settings cause LocalJobRunner to OOME

2013-04-05 Thread Sandy Ryza (JIRA)
Sandy Ryza created MAPREDUCE-5134:

 Summary: Default settings cause LocalJobRunner to OOME
 Key: MAPREDUCE-5134
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5134
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza


If I run a job using the local job runner with vanilla settings, I get an out 
of memory error.  This seems to be because the default client memory maximum is 
128 MB, and the default io.sort.mb is 100 MB.
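
To make the arithmetic concrete, the sketch below computes what fraction of the heap the sort buffer alone would occupy. The class and method names are hypothetical, and the 128 MB and 100 MB figures are the ones quoted above rather than values read from a live configuration.

```java
/** Hypothetical check of how much of the heap the sort buffer would take. */
final class SortBufferCheck {

    private static final long MB = 1024L * 1024L;

    /** Fraction of the maximum heap consumed by a sort buffer of sortMb megabytes. */
    static double heapFraction(int sortMb, long maxHeapBytes) {
        return (sortMb * MB) / (double) maxHeapBytes;
    }

    public static void main(String[] args) {
        // Figures quoted in the report: 128 MB client heap, io.sort.mb of 100.
        System.out.println("reported defaults: " + heapFraction(100, 128 * MB));
        // Same calculation against the heap of the JVM running this sketch.
        System.out.println("this JVM: " + heapFraction(100, Runtime.getRuntime().maxMemory()));
    }
}
```

With the reported defaults the sort buffer alone accounts for roughly 78% of the heap, which leaves little room for everything else the local job runner allocates.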



[jira] [Resolved] (MAPREDUCE-5131) Provide better handling of job status related apis during JT restart

2013-04-05 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy resolved MAPREDUCE-5131.

   Resolution: Fixed
Fix Version/s: 1.2.0

Thanks for the reviews [~szetszwo] and [~kkambatl]. I just committed this. 

 Provide better handling of job status related apis during JT restart
 

 Key: MAPREDUCE-5131
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5131
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 1.2.0

 Attachments: MAPREDUCE-5131.patch, MAPREDUCE-5131.patch, 
 MAPREDUCE-5131.patch, MAPREDUCE-5131.patch


 I've seen Pig/Hive applications break during a JT restart because they get NPEs. 
 This happens because the jobs have been submitted but are not yet initialized.
