Also, check "pig.exec.reducers.bytes.per.reducer", which should be set to 1000000000 (~1GB) by default, and "pig.exec.reducers.max", which should be set to 999 by default.
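To make the arithmetic behind those two settings concrete, here is a rough sketch of how they would combine into a reducer count. The function name `estimate_reducers` and the exact rounding are assumptions for illustration, based only on the values quoted in this thread; this is not Pig's actual code.

```python
def estimate_reducers(input_bytes,
                      bytes_per_reducer=1_000_000_000,  # pig.exec.reducers.bytes.per.reducer
                      max_reducers=999):                # pig.exec.reducers.max
    """Hypothetical helper: one reducer per ~1GB of input, capped at max_reducers."""
    if bytes_per_reducer <= 0:
        raise ValueError("bytes_per_reducer must be positive")
    # Integer ceiling division, so a partial chunk still gets its own reducer.
    estimate = (input_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    # Always at least one reducer, never more than the configured cap.
    return min(max_reducers, max(1, estimate))


if __name__ == "__main__":
    print(estimate_reducers(60_000_000_000))     # ~60GB of input
    print(estimate_reducers(500_000_000))        # input under 1GB
    print(estimate_reducers(2_000_000_000_000))  # large input, hits the cap
```

Under these assumptions, ~60GB of input would give about 60 reducers, while anything under 1GB gives 1. So if a job ends up with a single reducer, one thing to check is what input size Pig actually saw.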
If those are fine too, maybe you could set "default parallel" or use the PARALLEL keyword to manually set the number of reducers.

Thanks,
Prashant

On Tue, Dec 6, 2011 at 1:07 AM, Prashant Kommireddi <[email protected]> wrote:

> What does the "HDFS_BYTES_READ" on JobTracker for this job say?
>
> -Prashant
>
> On Tue, Dec 6, 2011 at 12:59 AM, Ayon Sinha <[email protected]> wrote:
>
>> The total input path size is ~60GB. That is 1023 files of approx. 64MB
>> each. Total Map output bytes was 160GB. So why was there 1 reducer? Help
>> me understand.
>>
>> -Ayon
>> See My Photos on Flickr
>> Also check out my Blog for answers to commonly asked questions.
>>
>> ________________________________
>> From: Prashant Kommireddi <[email protected]>
>> To: Ayon Sinha <[email protected]>
>> Cc: "[email protected]" <[email protected]>
>> Sent: Tuesday, December 6, 2011 12:26 AM
>> Subject: Re: How to see Pig MapReduce plan & classes
>>
>> Yes, when neither default parallelism nor PARALLEL is used, Pig uses
>> "pig.exec.reducers.bytes.per.reducer" to determine the number of
>> reducers. This is set to ~1GB, which means 1 reducer per ~1GB of input
>> data.
>>
>> You can try hadoop fs -dus <filepath> and you would see the size is less
>> than 1GB.
>>
>> On Mon, Dec 5, 2011 at 11:59 PM, Ayon Sinha <[email protected]> wrote:
>>
>> > I have 1023 gz files of < 64MB each.
>> > I think I see the reason in the log :(
>> >
>> > 2011-12-05 23:11:20,315 [main] INFO
>> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
>> > - Neither PARALLEL nor default parallelism is set for this job. Setting
>> > number of reducers to 1
>> >
>> > -Ayon
>> > See My Photos on Flickr <http://www.flickr.com/photos/ayonsinha/>
>> > Also check out my Blog for answers to commonly asked questions.
>> > <http://dailyadvisor.blogspot.com>
>> >
>> > ------------------------------
>> > *From:* Prashant Kommireddi <[email protected]>
>> > *To:* [email protected]; Ayon Sinha <[email protected]>
>> > *Sent:* Monday, December 5, 2011 11:56 PM
>> > *Subject:* Re: How to see Pig MapReduce plan & classes
>> >
>> > What is the total size of your input dataset? Less than 1GB? Pig spawns
>> > 1 reducer for each gigabyte of input data.
>> >
>> > -Prashant Kommireddi
>> >
>> > On Mon, Dec 5, 2011 at 11:53 PM, Ayon Sinha <[email protected]> wrote:
>> >
>> > Hi,
>> > I have this script whose stage 1 has n maps, where n = # of input
>> > splits (# gz files), but has 1 reducer. I need to understand why my
>> > script causes 1 reducer. When I think about how I'd do it in Java
>> > MapReduce, I don't see why there would be a single reducer in stage 1.
>> >
>> > register /home/ayon/udfs.jar;
>> >
>> > a = load '$input' using PigStorage() as (a:chararray, b:chararray, c:int, d:chararray);
>> >
>> > g = group a by (a, b);
>> >
>> > g = foreach g {
>> >     x = order $1 by c;
>> >     generate group.a, group.b, x;
>> > };
>> >
>> > u = foreach g generate myUDF($2) as triplets;
>> > describe u;
>> > dump u;
>> >
>> > Do you see any reason there should be 1 reducer at any stage? How do I
>> > debug this? Where are the generated classes and plan?
>> >
>> > -Ayon
