What is the total size of your input dataset? If it is less than 1GB, that
would explain the single reducer: by default, Pig estimates the number of
reducers at roughly one per gigabyte of input data.
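
If a small input is the cause, a sketch like the one below may help. It
shows three ways to ask for more reducers, plus `explain` to inspect the
plans Pig generates. The value 10 and the 256MB threshold are illustrative
assumptions, not recommendations; the load statement is copied from your
script.

```pig
-- Option 1: set a script-wide default number of reducers.
-- (10 is an assumed value; pick one that suits your cluster.)
set default_parallel 10;

-- Option 2: lower the bytes-per-reducer threshold Pig uses for its estimate.
-- The default is 1000000000 (about 1GB); 268435456 (256MB) is an assumption.
set pig.exec.reducers.bytes.per.reducer 268435456;

a = load '$input' using PigStorage() as (a:chararray, b:chararray, c:int,
    d:chararray);

-- Option 3: request parallelism on the operator that starts the reduce phase.
g = group a by (a, b) parallel 10;

-- Print the logical, physical, and MapReduce plans for this alias.
explain g;
```

Any one of the three options should be enough on its own; an explicit
`parallel` clause takes precedence over `default_parallel`, which in turn
takes precedence over Pig's size-based estimate.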

-Prashant Kommireddi

On Mon, Dec 5, 2011 at 11:53 PM, Ayon Sinha <[email protected]> wrote:

> Hi,
> I have this script whose stage 1 has n maps where n = # of input splits (#
> gz files) but has 1 reducer. I need to understand why my script causes 1
> reducer. When I think about how I'd do it in Java MapReduce, I don't see why
> there would be a single reducer in stage 1.
>
> register /home/ayon/udfs.jar;
>
> a = load '$input' using PigStorage() as (a:chararray, b:chararray, c:int,
> d:chararray);
>
> g = group a by (a, b);
>
> g = foreach g {
>       x = order $1 by c;
>       generate group.a, group.b, x;
>       };
>
>
> u = foreach g generate myUDF($2) as triplets;
> describe u;
> dump u;
>
> Do you see any reason there should be 1 reducer at any stage? How do I
> debug this? Where are the generated classes and plan?
>
> -Ayon
