What does the "HDFS_BYTES_READ" on JobTracker for this job say?
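
If that counter is in the ~60GB range, then going by the ~1GB-per-reducer
rule quoted below you would expect far more than one reducer. Here is a rough
sketch of that arithmetic in Python. It is not Pig's actual code: the
1,000,000,000-byte and 999-reducer values are my assumptions for
pig.exec.reducers.bytes.per.reducer and pig.exec.reducers.max, and
estimate_reducers is a made-up helper name.

```python
import math

# Assumed defaults; check the values your Pig version actually uses.
BYTES_PER_REDUCER = 1_000_000_000  # pig.exec.reducers.bytes.per.reducer (assumed)
MAX_REDUCERS = 999                 # pig.exec.reducers.max (assumed)


def estimate_reducers(total_input_bytes,
                      bytes_per_reducer=BYTES_PER_REDUCER,
                      max_reducers=MAX_REDUCERS):
    """Hypothetical helper: one reducer per bytes_per_reducer of input,
    clamped to the range [1, max_reducers]."""
    reducers = math.ceil(total_input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, reducers))


print(estimate_reducers(60 * 10**9))   # ~60GB of input -> 60
print(estimate_reducers(500 * 10**6))  # under 1GB of input -> 1
```

If the estimate and the reducer count you actually get disagree, setting
PARALLEL on the group statement (or a default parallelism) sidesteps the
estimate altogether.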

-Prashant

On Tue, Dec 6, 2011 at 12:59 AM, Ayon Sinha <[email protected]> wrote:

> The total input path size is ~60GB. That is 1023 files of approx. 64MB each.
> Total Map output bytes was 160GB. So why was there 1 reducer? Help me
> understand.
>
> -Ayon
> See My Photos on Flickr
> Also check out my Blog for answers to commonly asked questions.
>
>
>
> ________________________________
>  From: Prashant Kommireddi <[email protected]>
> To: Ayon Sinha <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Sent: Tuesday, December 6, 2011 12:26 AM
> Subject: Re: How to see Pig MapReduce plan & classes
>
> Yes, when neither default parallelism nor PARALLEL is used, Pig uses
> "pig.exec.reducers.bytes.per.reducer" to determine the number of reducers.
> This is set to ~1GB, which means 1 reducer per ~1GB of input data.
>
> You can try hadoop fs -dus <filepath> and you would see the size is less
> than 1GB.
>
>
> On Mon, Dec 5, 2011 at 11:59 PM, Ayon Sinha <[email protected]> wrote:
>
> > I have 1023 gz files of < 64MB each.
> > I think I see the reason in the log :(
> >
> > 2011-12-05 23:11:20,315 [main] INFO
> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> > - Neither PARALLEL nor default parallelism is set for this job. Setting
> > number of reducers to 1
> >
> > -Ayon
> > See My Photos on Flickr <http://www.flickr.com/photos/ayonsinha/>
> > Also check out my Blog for answers to commonly asked questions.
> > <http://dailyadvisor.blogspot.com>
> >
> >   ------------------------------
> > *From:* Prashant Kommireddi <[email protected]>
> > *To:* [email protected]; Ayon Sinha <[email protected]>
> > *Sent:* Monday, December 5, 2011 11:56 PM
> > *Subject:* Re: How to see Pig MapReduce plan & classes
> >
> > What is the total size of your input dataset? Less than 1GB? Pig spawns 1
> > reducer for each gigabyte of input data.
> >
> > -Prashant Kommireddi
> >
> > On Mon, Dec 5, 2011 at 11:53 PM, Ayon Sinha <[email protected]> wrote:
> >
> > Hi,
> > I have this script whose stage 1 has n maps, where n = # of input splits
> > (# gz files), but has 1 reducer. I need to understand why my script causes
> > 1 reducer. When I think about how I'd do it in Java MapReduce, I don't see
> > why there would be a single reducer in stage 1.
> >
> > register /home/ayon/udfs.jar;
> >
> > a = load '$input' using PigStorage() as (a:chararray, b:chararray, c:int,
> > d:chararray);
> >
> > g = group a by (a, b);
> >
> > g = foreach g {
> >       x = order $1 by c;
> >       generate group.a, group.b, x;
> >       };
> >
> >
> > u = foreach g generate myUDF($2) as triplets;
> > describe u;
> > dump u;
> >
> > Do you see any reason there should be 1 reducer at any stage? How do I
> > debug this? Where are the generated classes and plan?
> >
> > -Ayon
> >
> >
> >
> >
> >
>
