Hi Praveenesh,

You can use EXPLAIN to see what Pig is doing under the hood, including the
MapReduce plan it generates for your script:
http://pig.apache.org/docs/r0.9.1/test.html#explain
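
For example, here is a minimal sketch based on the script from your mail: I
have only swapped the final DUMP for EXPLAIN, so Pig prints the logical,
physical and MapReduce plans for the alias instead of running the job. It
assumes the same 'Data.csv' input and aliases you posted.

```
-- Same pipeline as in the original mail; nothing is executed on the cluster.
Data = LOAD 'Data.csv' USING PigStorage(',');
IDs = FOREACH Data GENERATE $0;
UniqueID = DISTINCT IDs;

-- Print the plans Pig would use to compute UniqueID.
EXPLAIN UniqueID;
```

The MapReduce plan section at the end of the output shows how many MR jobs
the script compiles into and what runs in the map, combine and reduce phases
of each.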

What version of Pig and Hadoop are you using? I have never seen such a huge
difference between Java MR and Pig. At the time you ran Pig, was the
cluster idle or did you have other jobs running at the same time? Did you
make sure the job was not waiting on Map or Reduce slots being made
available?

Thanks,
Prashant

On Sun, Jan 15, 2012 at 9:47 PM, praveenesh kumar <[email protected]> wrote:

> Hey Guys,
>
> Is there any way I can see the M/R jobs that Pig runs
> internally for a given Pig script?
> I wanted to get unique values for a particular column.
>
> For that I wrote the following script:
>
> Data = Load 'Data.csv' using PigStorage(',');
> IDs = FOREACH Data GENERATE $0;
> UniqueID = Distinct IDs;
> Dump UniqueID;
>
> Is it the right/best way to get unique values of a particular column?
>
> The reason why I am asking is, I ran the above script on my cluster, it
> took around 30 minutes to finish.
> However, for the same thing, when I wrote traditional java M/R code, it
> took only 10 minutes.
>
> So I want to see what Pig is doing internally.
> Can anyone tell what could be the reason for such behaviour ? How can I
> decrease Pig Execution time ?
>
> Thanks,
> Praveenesh
>
