Hi all,

I often run the same script on different data sets, sometimes small and
sometimes large, and data sets of different sizes need different numbers of
reducers.
I know that the default number of reducers is 1, and that users can change
it in the script with the PARALLEL keyword.

But I do not want to have to change the number of reducers in the script
every time I run it.
So here is an idea: could Pig provide an API that lets users set the ratio
between map tasks and reduce tasks (and a new keyword in Pig Latin to set
the ratio)?

E.g., if I set the ratio to 2:1 and the job has 100 map tasks, it would get
50 reduce tasks accordingly.
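
To make the arithmetic concrete, here is a rough sketch in Python. It is not
Pig code: the function name reducers_for_ratio, rounding up, and the floor of
one reducer are all my own assumptions about how such a setting might behave.

```python
import math


def reducers_for_ratio(map_tasks, maps_per_reducer=2.0):
    """Hypothetical helper: derive a reducer count from a map:reduce ratio.

    maps_per_reducer=2.0 stands for a 2:1 ratio of map tasks to reduce tasks.
    """
    if map_tasks < 0 or maps_per_reducer <= 0:
        raise ValueError("map_tasks must be >= 0 and maps_per_reducer > 0")
    # Assumption: round up, and never go below one reducer.
    return max(1, math.ceil(map_tasks / maps_per_reducer))


print(reducers_for_ratio(100, 2))  # 100 map tasks at 2:1 -> 50
```

With rounding up, an odd number of map tasks (say 5 at 2:1) would get 3
reducers rather than 2; rounding down would be an equally reasonable choice.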

I think this would be convenient for Pig users.


Jeff Zhang
