Builder pattern for Configuration and Job, and no static .setInputPaths(job,path)
Hi all,

I have run into some issues with having to use calls like:

    DistributedCache.addFileToClassPath()
    FileInputFormat.setInputPaths()
    FileOutputFormat.setOutputPath()
    Job.getInstance(conf)

What I want to do, for very specific reasons, is something close to this:

    Configuration conf = Configuration
        .with("mapred.job.tracker", "10.10.10.10:8021")
        .with("fs.defaultFS", "hdfs://10.10.10.10:8020")
        .build();

    Job job = Job
        .withConfig(conf)
        .withJarByClass(MyJob.class)
        .withJobName("My MR Job")
        .withMapperClass(MyMapper.class)
        .withReducerClass(MyReducer.class)
        .withMapOutputKeyClass(LongWritable.class)
        .withMapOutputValueClass(Text.class)
        .withOutputKeyClass(LongWritable.class)
        .withOutputValueClass(Text.class)
        .withLibJars(new Path(...))
        .withInputPaths(new Path(...))
        .withOutputPath(new Path(...))
        .build();

    job.waitForCompletion(true);

Is this something that has been considered? At the very least, could the static setInputPaths and setOutputPath be removed and put on Job instead? I saw it that way in demo code in the Javadoc for 2.0.3, but found no implementation. Am I missing something here?

Best,
Bjorn
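The fluent style asked for above can be prototyped outside Hadoop. The following is only a minimal sketch: JobSpec and JobSpec.Builder are hypothetical names, not Hadoop classes, and the paths in the usage note are made up. The sketch collects settings into an immutable value and validates them in build(). A real wrapper would presumably go on to call Job.getInstance(conf), FileInputFormat.setInputPaths(job, ...) and FileOutputFormat.setOutputPath(job, ...) at that point.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical immutable job description; NOT part of the Hadoop API. */
final class JobSpec {
    private final Map<String, String> settings;
    private final String jobName;
    private final List<String> inputPaths;
    private final String outputPath;

    private JobSpec(Builder b) {
        // Defensive copies so later changes to the builder cannot leak in.
        this.settings = Collections.unmodifiableMap(new LinkedHashMap<>(b.settings));
        this.jobName = b.jobName;
        this.inputPaths = Collections.unmodifiableList(new ArrayList<>(b.inputPaths));
        this.outputPath = b.outputPath;
    }

    static Builder builder() { return new Builder(); }

    Map<String, String> settings() { return settings; }
    String jobName() { return jobName; }
    List<String> inputPaths() { return inputPaths; }
    String outputPath() { return outputPath; }

    /** Fluent builder: every setter returns this so calls can be chained. */
    static final class Builder {
        private final Map<String, String> settings = new LinkedHashMap<>();
        private final List<String> inputPaths = new ArrayList<>();
        private String jobName = "unnamed";
        private String outputPath;

        Builder with(String key, String value) { settings.put(key, value); return this; }
        Builder withJobName(String name) { this.jobName = name; return this; }
        Builder withInputPath(String path) { inputPaths.add(path); return this; }
        Builder withOutputPath(String path) { this.outputPath = path; return this; }

        JobSpec build() {
            // Fail early instead of at job submission time.
            if (inputPaths.isEmpty()) {
                throw new IllegalStateException("at least one input path is required");
            }
            if (outputPath == null) {
                throw new IllegalStateException("an output path is required");
            }
            return new JobSpec(this);
        }
    }
}
```

Usage would then read much like the proposal, for example JobSpec.builder().with("fs.defaultFS", "hdfs://10.10.10.10:8020").withJobName("My MR Job").withInputPath("/data/in").withOutputPath("/data/out").build(). One benefit of this shape is that missing inputs or outputs are reported by build() rather than by the cluster.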
[jira] [Created] (MAPREDUCE-5126) Add possibility to set a custom system classloader for mapred child processes, separate from mapred.child.java.opts
Piotr Kołaczkowski created MAPREDUCE-5126:
---------------------------------------------

Summary: Add possibility to set a custom system classloader for mapred child processes, separate from mapred.child.java.opts
Key: MAPREDUCE-5126
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5126
Project: Hadoop Map/Reduce
Issue Type: New Feature
Reporter: Piotr Kołaczkowski
Priority: Minor

Some third-party frameworks and systems based on Hadoop might want to set a custom classloader for loading the classes of their jobs, to better resolve conflicts with their libraries. While it is possible to set a custom classloader using mapred.child.java.opts, this field is often overridden by users in their job configuration. So in order to change, e.g., heap sizes, the user would also need to remember to include the custom classloader property from the framework defaults, or they would otherwise break the framework.

This small patch introduces another parameter, mapred.child.java.class.loader, that allows setting the classloader separately. This gives custom frameworks built on top of Hadoop more flexibility to supply their own classloader, without needing to force users to adjust any settings.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
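To illustrate why a separate property helps, here is a minimal sketch of how child JVM arguments could be assembled from the two settings. It is not the patch itself: the message does not say how the patch wires the classloader in, so the sketch assumes it ends up as the standard JVM system property java.system.class.loader. ChildJvmOpts is a hypothetical helper, and a plain Map stands in for the Hadoop Configuration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; NOT the code from the MAPREDUCE-5126 patch. */
final class ChildJvmOpts {
    // Existing setting that users commonly override per job.
    static final String JAVA_OPTS = "mapred.child.java.opts";
    // Separate setting proposed in MAPREDUCE-5126.
    static final String CLASS_LOADER = "mapred.child.java.class.loader";

    /** Builds the child JVM argument list from a key/value configuration. */
    static List<String> build(Map<String, String> conf) {
        List<String> args = new ArrayList<>();

        // User-controlled options, split on whitespace.
        String opts = conf.getOrDefault(JAVA_OPTS, "").trim();
        if (!opts.isEmpty()) {
            args.addAll(Arrays.asList(opts.split("\\s+")));
        }

        // Framework-controlled classloader, appended independently of the
        // user's options (assumption: passed via java.system.class.loader).
        String loader = conf.get(CLASS_LOADER);
        if (loader != null && !loader.trim().isEmpty()) {
            args.add("-Djava.system.class.loader=" + loader.trim());
        }
        return args;
    }
}
```

With this split, a user who overrides mapred.child.java.opts to change the heap size no longer drops the framework's classloader, because the two values are merged only when the child command line is built.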
Re: MAPREDUCE-5069: adding concrete common implementations of CombineFileInputFormat
Hi MapReduce developers,

I would like to get your review and feedback on this JIRA and the patch. Thanks much!

Regards,
Sangjin

On Thu, Mar 28, 2013 at 12:12 PM, Sangjin Lee sj...@apache.org wrote:

> Hi folks,
>
> I submitted a patch for adding some concrete implementations of CombineFileInputFormat.
>
> https://issues.apache.org/jira/browse/MAPREDUCE-5069
>
> I would very much like your input and review on this patch, and if you guys find it good for commit, please let me know how to proceed on this one. I'm an Apache committer and, I believe, also a contributor for Hadoop.
>
> Thanks in advance!
>
> Regards,
> Sangjin
[jira] [Reopened] (MAPREDUCE-5083) MiniMRCluster should use a random component when creating an actual cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah reopened MAPREDUCE-5083:
------------------------------------

MiniMRCluster should use a random component when creating an actual cluster
---------------------------------------------------------------------------

Key: MAPREDUCE-5083
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5083
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.0.3-alpha
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Fix For: 2.0.5-beta
Attachments: MAPREDUCE-5083-branch2.txt, MAPREDUCE-5083-trunk_2.txt, MAPREDUCE-5083-trunk.txt

Currently all unit tests end up using the same work dir - which can affect anyone trying to run parallel instances.
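The general idea of a random component in the work directory can be sketched as follows. This is not the attached patch, only an illustration under assumptions: TestWorkDirs and uniqueWorkDir are hypothetical names, and the base directory and test name are supplied by the caller.

```java
import java.io.File;
import java.util.UUID;

/** Hypothetical helper; NOT the code from the MAPREDUCE-5083 patch. */
final class TestWorkDirs {
    /**
     * Returns a work directory under base whose name combines the test name
     * with a random UUID, so parallel runs do not share a directory.
     * The directory is not created here.
     */
    static File uniqueWorkDir(File base, String testName) {
        String randomPart = UUID.randomUUID().toString();
        return new File(base, testName + "-" + randomPart);
    }
}
```

Two test processes calling uniqueWorkDir with the same base and test name get different directories, which is the property the issue asks for.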
[jira] [Resolved] (MAPREDUCE-5083) MiniMRCluster should use a random component when creating an actual cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah resolved MAPREDUCE-5083.
------------------------------------

Resolution: Fixed
Fix Version/s: (was: 2.0.5-beta) 2.0.4-alpha
Target Version/s: (was: 2.0.5-beta)
Release Note: Committed to branch-2.0.4. Modified changes.txt in trunk, branch-2 and branch-2.0.4 accordingly.

MiniMRCluster should use a random component when creating an actual cluster
---------------------------------------------------------------------------

Key: MAPREDUCE-5083
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5083
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.0.3-alpha
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Fix For: 2.0.4-alpha
Attachments: MAPREDUCE-5083-branch2.txt, MAPREDUCE-5083-trunk_2.txt, MAPREDUCE-5083-trunk.txt

Currently all unit tests end up using the same work dir - which can affect anyone trying to run parallel instances.
[jira] [Created] (MAPREDUCE-5128) mapred-default.xml is missing a bunch of history server configs
Sandy Ryza created MAPREDUCE-5128:
-------------------------------------

Summary: mapred-default.xml is missing a bunch of history server configs
Key: MAPREDUCE-5128
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5128
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: documentation, jobhistoryserver
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza

mapred-default.xml is missing many configs that work for the job history server. mapreduce.jobhistory.cleaner.enable, mapreduce.jobhistory.done-dir, and mapreduce.jobhistory.datestring.cache.size are a few examples.
[jira] [Created] (MAPREDUCE-5129) Add tag info to JH files
Billie Rinaldi created MAPREDUCE-5129:
-----------------------------------------

Summary: Add tag info to JH files
Key: MAPREDUCE-5129
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5129
Project: Hadoop Map/Reduce
Issue Type: New Feature
Reporter: Billie Rinaldi
Priority: Minor

It will be useful to add tags to the existing workflow info logged by JH. This will allow jobs to be filtered/grouped for analysis more easily.
[jira] [Created] (MAPREDUCE-5130) Add missing job config options to mapred-default.xml
Sandy Ryza created MAPREDUCE-5130:
-------------------------------------

Summary: Add missing job config options to mapred-default.xml
Key: MAPREDUCE-5130
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5130
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Sandy Ryza
Assignee: Sandy Ryza

I noticed that mapreduce.map.child.java.opts and mapreduce.reduce.child.java.opts are missing from mapred-default.xml. I'll do a fuller sweep to see what else is missing before posting a patch.