Builder pattern for Configuration and Job, and no static .setInputPaths(job,path)

2013-04-04 Thread Bjorn Jonsson
Hi all,

I have some issues with having to use static methods such as:
DistributedCache.addFileToClassPath()
FileInputFormat.setInputPaths()
FileOutputFormat.setOutputPath()
Job.getInstance(conf)

What I want to do, for very specific reasons, is something close to this:

Configuration conf = Configuration
.with("mapred.job.tracker", "10.10.10.10:8021")
.with("fs.defaultFS", "hdfs://10.10.10.10:8020")
.build();

Job job = Job
.withConfig(conf)
.withJarByClass(MyJob.class)
.withJobName("My MR Job")
.withMapperClass(MyMapper.class)
.withReducerClass(MyReducer.class)
.withMapOutputKeyClass(LongWritable.class)
.withMapOutputValueClass(Text.class)
.withOutputKeyClass(LongWritable.class)
.withOutputValueClass(Text.class)
.withLibJars(new Path(...))
.withInputPaths(new Path(...))
.withOutputPath(new Path(...))
.build();

job.waitForCompletion(true);

Has this been considered? At the very least, could the static
setInputPaths and setOutputPath methods be dropped and put on Job instead?
I saw it done that way in the demo code in the Javadoc for 2.0.3, but I
found no implementation. Am I missing something here?

Best,
Bjorn
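
For illustration, the fluent shape requested above can be sketched in plain Java without any Hadoop dependency. This is only a minimal sketch under stated assumptions: FluentConf and its Builder are hypothetical names invented for the example, not Hadoop classes, and the real Configuration class offers no with(...) or build() methods.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a builder-style Configuration; not a Hadoop class.
final class FluentConf {
    private final Map<String, String> settings;

    private FluentConf(Map<String, String> settings) {
        // Copy defensively so the built object cannot be changed afterwards.
        this.settings = Collections.unmodifiableMap(new LinkedHashMap<>(settings));
    }

    // Entry point mirroring the Configuration.with(key, value) call in the mail.
    static Builder with(String key, String value) {
        return new Builder().with(key, value);
    }

    // Returns the value for the key, or null when the key was never set.
    String get(String key) {
        return settings.get(key);
    }

    static final class Builder {
        private final Map<String, String> pending = new LinkedHashMap<>();

        // Records one setting and returns the builder so calls can be chained.
        Builder with(String key, String value) {
            pending.put(key, value);
            return this;
        }

        // Freezes the collected settings into an immutable FluentConf.
        FluentConf build() {
            return new FluentConf(pending);
        }
    }
}
```

A Job builder could follow the same approach, with each withXxx call recording a value and build() applying all of them to a newly created job.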


[jira] [Created] (MAPREDUCE-5126) Add possibility to set a custom system classloader for mapred child processes, separate from mapred.child.java.opts

2013-04-04 Thread JIRA
Piotr Kołaczkowski created MAPREDUCE-5126:
-

 Summary: Add possibility to set a custom system classloader for 
mapred child processes, separate from mapred.child.java.opts
 Key: MAPREDUCE-5126
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5126
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Piotr Kołaczkowski
Priority: Minor


Some third party frameworks / systems based on Hadoop might want to set a 
custom classloader for loading classes of their jobs to better resolve 
conflicts with their libraries. 

While it is possible to set a custom classloader using 
mapred.child.java.opts, this field is often overridden by users in their job 
configuration. So in order to change e.g. heap sizes, the user would also need 
to remember to include the custom classloader property from the 
framework defaults, or they would otherwise break the framework.

This small patch introduces another parameter, mapred.child.java.class.loader, 
which allows the classloader to be set separately. This gives custom frameworks 
built on top of Hadoop more flexibility to supply their own classloader 
without forcing users to adjust any settings.
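
The separation described above could look roughly like the following when the child JVM arguments are assembled. This is a sketch under assumptions, not the attached patch: ChildJvmOpts and its build method are hypothetical, the configuration is modelled as a plain Map, and the only fact relied on is that the JVM picks its system classloader from the standard java.system.class.loader property.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the actual patch may assemble the command line differently.
final class ChildJvmOpts {
    static List<String> build(Map<String, String> conf) {
        List<String> args = new ArrayList<>();

        // User-controlled options (heap size and so on) are passed through as-is.
        String userOpts = conf.getOrDefault("mapred.child.java.opts", "").trim();
        if (!userOpts.isEmpty()) {
            for (String opt : userOpts.split("\\s+")) {
                args.add(opt);
            }
        }

        // The framework-supplied classloader lives in its own key, so overriding
        // mapred.child.java.opts does not drop it.
        String loader = conf.get("mapred.child.java.class.loader");
        if (loader != null && !loader.trim().isEmpty()) {
            args.add("-Djava.system.class.loader=" + loader.trim());
        }
        return args;
    }
}
```

With this split, a user who overrides mapred.child.java.opts to change the heap size still gets the framework's classloader flag appended.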

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: MAPREDUCE-5069: adding concrete common implementations of CombineFileInputFormat

2013-04-04 Thread Sangjin Lee
Hi map reduce developers,

I would like to get your review and feedback on this JIRA and the patch.
Thanks much!

Regards,
Sangjin


On Thu, Mar 28, 2013 at 12:12 PM, Sangjin Lee sj...@apache.org wrote:

 Hi folks,

 I submitted a patch for adding some concrete implementations of
 CombineFileInputFormat.

 https://issues.apache.org/jira/browse/MAPREDUCE-5069

 I would very much like your input and review on this patch, and if you
 find it good to commit, please let me know how to proceed on this one.
 I'm an Apache committer and, I believe, also a contributor for Hadoop. Thanks
 in advance!

 Regards,
 Sangjin



[jira] [Reopened] (MAPREDUCE-5083) MiniMRCluster should use a random component when creating an actual cluster

2013-04-04 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah reopened MAPREDUCE-5083:



 MiniMRCluster should use a random component when creating an actual cluster
 ---

 Key: MAPREDUCE-5083
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5083
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.0.3-alpha
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Fix For: 2.0.5-beta

 Attachments: MAPREDUCE-5083-branch2.txt, MAPREDUCE-5083-trunk_2.txt, 
 MAPREDUCE-5083-trunk.txt


 Currently all unit tests end up using the same work dir - which can affect 
 anyone trying to run parallel instances.
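
One way to picture the "random component" is a per-instance suffix on the work directory name. The following is a sketch under assumptions and not the committed fix: TestWorkDirs and uniqueWorkDir are hypothetical names, and a random UUID is just one possible source of uniqueness.

```java
import java.io.File;
import java.util.UUID;

// Hypothetical helper; MiniMRCluster may derive its directories differently.
final class TestWorkDirs {
    // Returns a directory path under base that differs on every call, so
    // parallel test runs do not share (and clobber) the same work dir.
    static File uniqueWorkDir(File base, String testName) {
        return new File(base, testName + "-" + UUID.randomUUID());
    }
}
```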



[jira] [Resolved] (MAPREDUCE-5083) MiniMRCluster should use a random component when creating an actual cluster

2013-04-04 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah resolved MAPREDUCE-5083.


      Resolution: Fixed
   Fix Version/s: 2.0.4-alpha (was: 2.0.5-beta)
Target Version/s: (was: 2.0.5-beta)
    Release Note: Committed to branch-2.0.4. Modified changes.txt in trunk, 
branch-2 and branch-2.0.4 accordingly.

 MiniMRCluster should use a random component when creating an actual cluster
 ---

 Key: MAPREDUCE-5083
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5083
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.0.3-alpha
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Fix For: 2.0.4-alpha

 Attachments: MAPREDUCE-5083-branch2.txt, MAPREDUCE-5083-trunk_2.txt, 
 MAPREDUCE-5083-trunk.txt


 Currently all unit tests end up using the same work dir - which can affect 
 anyone trying to run parallel instances.



[jira] [Created] (MAPREDUCE-5128) mapred-default.xml is missing a bunch of history server configs

2013-04-04 Thread Sandy Ryza (JIRA)
Sandy Ryza created MAPREDUCE-5128:
-

 Summary: mapred-default.xml is missing a bunch of history server 
configs
 Key: MAPREDUCE-5128
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5128
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: documentation, jobhistoryserver
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza


mapred-default.xml is missing many configs that work for the job history 
server.  mapreduce.jobhistory.cleaner.enable, mapreduce.jobhistory.done-dir, 
and mapreduce.jobhistory.datestring.cache.size are a few examples.



[jira] [Created] (MAPREDUCE-5129) Add tag info to JH files

2013-04-04 Thread Billie Rinaldi (JIRA)
Billie Rinaldi created MAPREDUCE-5129:
-

 Summary: Add tag info to JH files
 Key: MAPREDUCE-5129
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5129
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Billie Rinaldi
Priority: Minor


It will be useful to add tags to the existing workflow info logged by JH.  This 
will allow jobs to be filtered/grouped for analysis more easily.



[jira] [Created] (MAPREDUCE-5130) Add missing job config options to mapred-default.xml

2013-04-04 Thread Sandy Ryza (JIRA)
Sandy Ryza created MAPREDUCE-5130:
-

 Summary: Add missing job config options to mapred-default.xml
 Key: MAPREDUCE-5130
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5130
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Sandy Ryza
Assignee: Sandy Ryza


I noticed that mapreduce.map.child.java.opts and 
mapreduce.reduce.child.java.opts are missing from mapred-default.xml.  I'll do a 
fuller sweep to see what else is missing before posting a patch.
