I have 1023 gz files of < 64MB each. 
I think I see the reason in the log :(

2011-12-05 23:11:20,315 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler 
- Neither PARALLEL nor default parallelism is set for this job. Setting number 
of reducers to 1
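
Looks like the fix is to set parallelism explicitly, either script-wide or on the 
reduce-side operator. A minimal sketch (10 is just a placeholder value, not tuned 
for my data):

    -- script-wide default number of reducers for every reduce stage
    set default_parallel 10;

    -- or per operator, on the GROUP from the script quoted below
    g = group a by (a, b) PARALLEL 10;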

 
-Ayon
See My Photos on Flickr
Also check out my Blog for answers to commonly asked questions.



________________________________
 From: Prashant Kommireddi <[email protected]>
To: [email protected]; Ayon Sinha <[email protected]> 
Sent: Monday, December 5, 2011 11:56 PM
Subject: Re: How to see Pig MapReduce plan & classes
 

What is the total size of your input dataset? Less than 1GB? Pig spawns 1 
reducer for each gigabyte of input data.
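
If you are on a version that does this size-based estimate, the knobs it uses are 
roughly these (from memory, so please verify against your Pig version's docs):

    -- input bytes handed to each reducer by the estimate (default is ~1GB)
    set pig.exec.reducers.bytes.per.reducer 1000000000;

    -- upper bound on the number of reducers the estimate will pick
    set pig.exec.reducers.max 999;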

-Prashant Kommireddi


On Mon, Dec 5, 2011 at 11:53 PM, Ayon Sinha <[email protected]> wrote:

Hi,
>I have this script whose stage 1 has n maps, where n = # of input splits (# of gz 
>files), but only 1 reducer. I need to understand why my script ends up with a 
>single reducer. When I think about how I'd write it in Java MapReduce, I don't 
>see why stage 1 would need just one reducer.
>
>register /home/ayon/udfs.jar;
>
>a = load '$input' using PigStorage() as (a:chararray, b:chararray, c:int, 
>d:chararray);
>
>g = group a by (a, b);
>
>g = foreach g {
>      x = order $1 by c;
>      generate group.a, group.b, x;
>      };
>
>
>u = foreach g generate myUDF($2) as triplets;
>describe u;
>dump u;
>
>Do you see any reason there should be 1 reducer at any stage? How do I debug 
>this? Where are the generated classes and plan? 
>
>-Ayon
>See My Photos on Flickr
>Also check out my Blog for answers to commonly asked questions.
>
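
Re the last questions in the quoted mail: to see the plan Pig builds (logical, 
physical and MapReduce) without actually running the job, explain on the alias 
should do it. A minimal sketch from the grunt shell (the -out path is only an 
example location):

    -- print the logical, physical, and MapReduce plans for u
    grunt> explain u;

    -- or write the plans to files instead of the console
    grunt> explain -out /tmp/pigplan u;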
