I agree that it would be very useful to have a dynamic number of
reducers. However, I'm not sure how to accomplish it. MapReduce
requires that we set the number of reducers up front in JobConf, when
we submit the job. But we don't know the number of maps until
getSplits is called after job submission. I don't think MR will allow
us to set the number of reducers once the job is started.
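To make the constraint concrete, here is a minimal sketch of the submission flow in the old mapred API (the class and paths below are only illustrative, not Pig code):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SubmitSketch {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SubmitSketch.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    // The reduce count has to be fixed here, before submission...
    conf.setNumReduceTasks(10);

    // ...but the number of maps is only known once the framework calls
    // InputFormat.getSplits() on the already-submitted job, so we cannot
    // derive the reduce count from the map count at this point.
    JobClient.runJob(conf);
  }
}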
Others have suggested that we use the file size to specify the number
of reducers. We cannot always assume the inputs are HDFS files (they
could come from HBase or something else). Also, different storage
formats (text, sequence files, zebra) would need different ratios of
bytes to reducers, since they store data at different compression
rates. Maybe this could still work in the HDFS case, on the
assumption that the user understands the compression ratios and can
set the input size per reducer accordingly. But I'm not sure this
will be simple enough to be useful.
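For illustration, here is a rough sketch of what such a bytes-per-reducer heuristic might look like (the property name and default are made up for this example, and it only handles plain HDFS files):

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class ReducerEstimatorSketch {
  // Hypothetical knob: bytes of (possibly compressed) input per reducer.
  static final long DEFAULT_BYTES_PER_REDUCER = 1024L * 1024L * 1024L; // 1 GB

  static int estimateReducers(JobConf conf, Path input) throws Exception {
    FileSystem fs = input.getFileSystem(conf);
    long totalBytes = 0;
    for (FileStatus status : fs.listStatus(input)) {
      totalBytes += status.getLen();
    }
    long bytesPerReducer =
        conf.getLong("example.bytes.per.reducer", DEFAULT_BYTES_PER_REDUCER);
    // Only works when the input really is a set of HDFS files, and the
    // user has to pick bytesPerReducer with the storage format's
    // compression ratio in mind.
    return (int) Math.max(1L, (totalBytes + bytesPerReducer - 1) / bytesPerReducer);
  }
}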
Thoughts?
Alan.
On Nov 12, 2009, at 12:12 AM, Jeff Zhang wrote:
Hi all,
Often, I will run one script on different data sets, sometimes small
and sometimes large. And different sizes of data sets require
different numbers of reducers.
I know that the default reduce number is 1, and users can change the
reduce number in a script with the PARALLEL keyword.
But I do not want to be bothered to change the reduce number in the
script each time I run it.
So I have an idea: could Pig provide some API that lets users set the
ratio between map tasks and reduce tasks (and some new keyword in Pig
Latin to set the ratio)?
E.g., if I set the ratio to 2:1 and I have 100 map tasks, the job
will have 50 reduce tasks accordingly.
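For example, a rough sketch of what such a ratio setting might compute (just for illustration; no such API exists in Pig today):

public class RatioSketch {
  // Hypothetical helper: mapsPerReducer = 2.0 means a 2:1 map-to-reduce ratio.
  static int reducersFromRatio(int numMapTasks, double mapsPerReducer) {
    // With a 2:1 ratio, 100 map tasks -> 50 reduce tasks.
    return Math.max(1, (int) Math.round(numMapTasks / mapsPerReducer));
  }
}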
I think it would be convenient for Pig users.
Jeff Zhang