On Sat, Jun 23, 2012 at 3:30 AM, Sheng Guo wrote:
> I know it is automatically set. But I have a large data set, I want it
> allocate more mappers during midnight so that more computing resource could
> be used to speed up.
> Any suggestions?
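By default, Pig combines input splits: it packs a set of physical input splits into one logical input split, so a single mapper can read several small blocks. I use the following setting to control the number of mappers in some of my benchmarking scripts:

    -- combine up to this many bytes into a composite input split, i.e., per mapper
    SET pig.maxCombinedSplitSize 250000000;

Lowering this value produces more, smaller combined splits and therefore more mappers. Note that your absolute minimum is constrained by the smallest block size in your input set. Splits are only combined, never broken up, so you cannot get more than about one mapper per physical split.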
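To use a different value only for the midnight run, one option is to make the limit a script parameter. This is a minimal sketch that assumes the same script is launched on a schedule. The parameter name max_split and the script name nightly.pig are hypothetical. %default and -param are standard Pig parameter substitution.

```pig
-- Hypothetical nightly.pig: max_split is an illustrative parameter name.
-- Default: combine up to ~250 MB per mapper (fewer, larger splits).
%default max_split 250000000;
SET pig.maxCombinedSplitSize $max_split;

-- ... rest of the script (LOAD, FOREACH, STORE, ...) unchanged ...
```

For the off-peak run, pass a smaller value, e.g. `pig -param max_split=64000000 nightly.pig`. A plain `pig nightly.pig` keeps the 250 MB default. The floor described above still applies, so values below your block size will not add more mappers.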
