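What is the total size of your input dataset? Is it less than 1 GB? When no parallelism is set explicitly, Pig estimates the reducer count from the input size: roughly one reducer per gigabyte of input (pig.exec.reducers.bytes.per.reducer, 1,000,000,000 bytes by default). An input smaller than that ends up with a single reducer.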
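
If the input really is under that threshold, you can request more reducers explicitly. A minimal sketch, reusing the relation names from the script quoted below; the value 10 is only an illustrative assumption, so pick something that fits your cluster and data:

```pig
-- Option 1: script-wide default applied to every reduce stage
-- (10 is an arbitrary example value, not a recommendation)
set default_parallel 10;

-- Option 2: override for a single operator, here the GROUP from the script
g = group a by (a, b) parallel 10;
```

An explicit PARALLEL clause on an operator takes precedence over default_parallel, and either one takes precedence over the input-size estimate.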
-Prashant Kommireddi

On Mon, Dec 5, 2011 at 11:53 PM, Ayon Sinha <[email protected]> wrote:
> Hi,
> I have this script whose stage 1 has n maps where n = # of input splits
> (# gz files) but has 1 reducer. I need to understand why my script causes
> 1 reducer. When I think about how I'd do it in Java MapReduce, I don't see
> why there would be a single reducer in stage 1.
>
> register /home/ayon/udfs.jar;
>
> a = load '$input' using PigStorage() as (a:chararray, b:chararray, c:int,
>     d:chararray);
>
> g = group a by (a, b);
>
> g = foreach g {
>     x = order $1 by c;
>     generate group.a, group.b, x;
> };
>
> u = foreach g generate myUDF($2) as triplets;
> describe u;
> dump u;
>
> Do you see any reason there should be 1 reducer at any stage? How do I
> debug this? Where are the generated classes and plan?
>
> -Ayon
