Is it possible to have a script at least use the default configured Hadoop value? Or is there a way to do that already?
It won't be optimal, but it will probably be better than 1.

Also, having too many reducers used to be a big problem performance-wise, but Hadoop is getting a lot less sensitive to that over time, especially after the shuffle refactoring in 0.21 (http://issues.apache.org/jira/browse/MAPREDUCE-318). So, in the future, over-estimating the number of reducers will likely be a better idea than under-estimating it.

On 11/12/09 8:25 AM, "Alan Gates" <[email protected]> wrote:

I agree that it would be very useful to have a dynamic number of reducers. However, I'm not sure how to accomplish it. MapReduce requires that we set the number of reducers up front in JobConf, when we submit the job, but we don't know the number of maps until getSplits is called after job submission. I don't think MR will let us change the number of reducers once the job has started.

Others have suggested that we use the input file size to choose the number of reducers. We cannot always assume the inputs are HDFS files (they could come from HBase or something else). Also, different storage formats (text, sequence files, Zebra) would need different ratios of bytes to reducers, since they store data at different compression rates. Maybe this could still work, in the HDFS-only case, with the assumption that the user understands the compression ratios and can set the bytes-per-reducer accordingly. But I'm not sure this would be simple enough to be useful.

Thoughts?

Alan.

On Nov 12, 2009, at 12:12 AM, Jeff Zhang wrote:

> Hi all,
>
> Often, I will run one script on different data sets: sometimes a small
> data set and sometimes a large one. Different data sizes require
> different numbers of reducers.
> I know that the default reducer count is 1, and that users can change
> it in a script with the PARALLEL keyword.
>
> But I do not want to be bothered to change the reducer count in the
> script each time I run it. So my idea is that Pig could provide an API
> (and a new keyword in Pig Latin) that lets users set the ratio between
> map tasks and reduce tasks.
>
> E.g., if I set the ratio to 2:1 and I have 100 map tasks, the job will
> have 50 reduce tasks accordingly.
>
> I think this would be convenient for Pig users.
>
>
> Jeff Zhang
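For concreteness, here is a rough Java sketch of the file-size heuristic Alan describes, assuming the HDFS-only case; the "bytes.per.reducer" property name and its 1 GB default are illustrative, not an existing Pig or Hadoop setting:

    // Sketch only: estimate the reducer count from total HDFS input size
    // before submitting the job. "bytes.per.reducer" and its 1 GB default
    // are illustrative names, not an existing Pig/Hadoop setting.
    import java.io.IOException;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    public class ReducerEstimator {
      public static void setReducersFromInputSize(JobConf conf, Path input)
          throws IOException {
        // Only meaningful when the input really is a set of HDFS files;
        // an HBase or other non-file loader needs its own size estimate.
        FileSystem fs = input.getFileSystem(conf);
        long totalBytes = fs.getContentSummary(input).getLength();

        // A user who knows the storage format's compression ratio can tune
        // this knob once instead of editing PARALLEL in every script.
        long bytesPerReducer = conf.getLong("bytes.per.reducer", 1L << 30);

        int reducers = (int) Math.max(1L, totalBytes / bytesPerReducer);
        conf.setNumReduceTasks(reducers);
      }
    }

And a sketch of Jeff's ratio idea that works around the getSplits timing problem by computing the splits once on the client before submission, at the cost of an extra round of namenode calls; "maps.per.reducer" is likewise a hypothetical knob:

    // Sketch of the map:reduce ratio idea using the old mapred API: count
    // the splits client-side, then derive the reducer count from the ratio.
    // Splits get computed again at submission, so this is not free.
    import java.io.IOException;

    import org.apache.hadoop.mapred.InputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class RatioEstimator {
      public static void setReducersFromRatio(JobConf conf)
          throws IOException {
        InputFormat<?, ?> inputFormat = conf.getInputFormat();
        int numMaps = inputFormat.getSplits(conf, 1).length;
        int mapsPerReducer = conf.getInt("maps.per.reducer", 2); // 2:1
        conf.setNumReduceTasks(Math.max(1, numMaps / mapsPerReducer));
      }
    }

Either way, the count still has to be fixed in JobConf before the job is submitted, which is exactly the constraint Alan points out.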
