Re: How to see Pig MapReduce plan & classes

Prashant Kommireddi Tue, 06 Dec 2011 01:13:58 -0800

Also, check "pig.exec.reducers.bytes.per.reducer", which should be set to
1000000000, and "pig.exec.reducers.max", which should be set to 999 by
default.


If those are fine too, maybe you could set "default_parallel" or use the
PARALLEL keyword to manually set the number of reducers.
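
For reference, the size-based estimate that those two properties control can
be sketched as below. This is only an illustration in Python, not Pig's own
code: the helper name estimate_reducers is hypothetical, and it assumes the
estimate is the input size divided by bytes-per-reducer, rounded up, with a
minimum of 1 and a maximum of pig.exec.reducers.max.

```python
import math

# Defaults quoted in this thread for the two Pig properties.
BYTES_PER_REDUCER = 1_000_000_000  # pig.exec.reducers.bytes.per.reducer
MAX_REDUCERS = 999                 # pig.exec.reducers.max


def estimate_reducers(input_bytes,
                      bytes_per_reducer=BYTES_PER_REDUCER,
                      max_reducers=MAX_REDUCERS):
    """Hypothetical helper (an assumption, not Pig's API): one reducer per
    bytes_per_reducer of input, at least 1 and at most max_reducers."""
    wanted = math.ceil(input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, wanted))


# Upper bound for the input described in this thread: 1023 files of 64 MB.
total_bytes = 1023 * 64 * 1024 * 1024
print(estimate_reducers(total_bytes))
```

Under these assumptions, an input of that size would be estimated at far more
than one reducer, so an estimate of 1 would mean the size Pig measured for the
input path was under ~1GB.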

Thanks,
Prashant

On Tue, Dec 6, 2011 at 1:07 AM, Prashant Kommireddi <[email protected]> wrote:

> What does the "HDFS_BYTES_READ" on JobTracker for this job say?
>
> -Prashant
>
>
> On Tue, Dec 6, 2011 at 12:59 AM, Ayon Sinha <[email protected]> wrote:
>
>> The total input path size is ~60GB. That is 1023 files of approx. 64MB
>> each. Total map output was 160GB. So why was there 1 reducer? Help me
>> understand.
>>
>> -Ayon
>> See My Photos on Flickr
>> Also check out my Blog for answers to commonly asked questions.
>>
>>
>>
>> ________________________________
>>  From: Prashant Kommireddi <[email protected]>
>> To: Ayon Sinha <[email protected]>
>> Cc: "[email protected]" <[email protected]>
>> Sent: Tuesday, December 6, 2011 12:26 AM
>> Subject: Re: How to see Pig MapReduce plan & classes
>>
>> Yes, when neither default parallelism nor PARALLEL is used, Pig uses
>> "pig.exec.reducers.bytes.per.reducer" to determine the number of reducers.
>> This is set to ~1GB, which means 1 reducer per ~1GB of input data.
>>
>> You can try hadoop fs -dus <filepath> and you would see the size is less
>> than 1GB.
>>
>>
>> On Mon, Dec 5, 2011 at 11:59 PM, Ayon Sinha <[email protected]> wrote:
>>
>> > I have 1023 gz files of < 64MB each.
>> > I think I see the reason in the log :(
>> >
>> > 2011-12-05 23:11:20,315 [main] INFO
>> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
>> > - Neither PARALLEL nor default parallelism is set for this job. Setting
>> > number of reducers to 1
>> >
>> > -Ayon
>> > See My Photos on Flickr <http://www.flickr.com/photos/ayonsinha/>
>> > Also check out my Blog for answers to commonly asked questions.
>> > <http://dailyadvisor.blogspot.com>
>> >
>> >   ------------------------------
>> > *From:* Prashant Kommireddi <[email protected]>
>> > *To:* [email protected]; Ayon Sinha <[email protected]>
>> > *Sent:* Monday, December 5, 2011 11:56 PM
>> > *Subject:* Re: How to see Pig MapReduce plan & classes
>> >
>> > What is the total size of your input dataset? Less than 1GB? Pig spawns
>> > 1 reducer for each gigabyte of input data.
>> >
>> > -Prashant Kommireddi
>> >
>> > On Mon, Dec 5, 2011 at 11:53 PM, Ayon Sinha <[email protected]> wrote:
>> >
>> > Hi,
>> > I have this script whose stage 1 has n maps, where n = # of input splits
>> > (# of gz files), but has 1 reducer. I need to understand why my script
>> > causes 1 reducer. When I think about how I'd do it in Java MapReduce, I
>> > don't see why there would be a single reducer in stage 1.
>> >
>> > register /home/ayon/udfs.jar;
>> >
>> > a = load '$input' using PigStorage() as (a:chararray, b:chararray,
>> > c:int, d:chararray);
>> >
>> > g = group a by (a, b);
>> >
>> > g = foreach g {
>> >       x = order $1 by c;
>> >       generate group.a, group.b, x;
>> >       };
>> >
>> >
>> > u = foreach g generate myUDF($2) as triplets;
>> > describe u;
>> > dump u;
>> >
>> > Do you see any reason there should be 1 reducer at any stage? How do I
>> > debug this? Where are the generated classes and plan?
>> >
>> > -Ayon
>> > See My Photos on Flickr
>> > Also check out my Blog for answers to commonly asked questions.
>> >
>> >
>> >
>> >
>> >
>>
>
>
