Re: PigStats usage

Richard Ding Thu, 16 Dec 2010 17:51:17 -0800

PigStats comes with Pig 0.8 (just released), which is probably why there is 
little material about it :)


You can use PigStats in two ways:

First, in a Java program, you can invoke your Pig script through the new 
PigRunner API, which takes the same arguments as the Main class but returns a 
PigStats object. Some of the interesting stats you get are the input/output 
record counts, the MapReduce job graph, and the aliases, features, and 
counters associated with each job.
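
As a rough illustration of the PigRunner approach described above, a minimal 
sketch might look like the following. It assumes the Pig 0.8 jar is on the 
classpath; the class name PigStatsExample and the script name myscript.pig are 
placeholders, and the accessor names used here (isSuccessful, getReturnCode, 
getRecordWritten, getJobGraph, getJobId, getAlias, getFeature) should be 
checked against the Javadoc of the Pig version you actually run.

```java
import org.apache.pig.PigRunner;
import org.apache.pig.tools.pigstats.JobStats;
import org.apache.pig.tools.pigstats.PigStats;

public class PigStatsExample {
    public static void main(String[] args) {
        // Same arguments you would give the pig command line.
        // "myscript.pig" is a placeholder, not a real script.
        String[] pigArgs = { "-x", "local", "myscript.pig" };

        // The second parameter is an optional progress listener;
        // passing null means no listener is registered.
        PigStats stats = PigRunner.run(pigArgs, null);

        // Script-level summary.
        System.out.println("Successful:      " + stats.isSuccessful());
        System.out.println("Return code:     " + stats.getReturnCode());
        System.out.println("Records written: " + stats.getRecordWritten());

        // Walk the MapReduce job graph and print per-job details.
        for (JobStats job : stats.getJobGraph()) {
            System.out.println(job.getJobId()
                    + " aliases=" + job.getAlias()
                    + " features=" + job.getFeature());
        }
    }
}
```

Passing a listener instead of null should let the caller receive progress 
notifications while the script runs, but the summary above only needs the 
returned PigStats object.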

Second, Pig 0.8 added several interesting properties to the Hadoop Job XML file 
(in particular, a script id, and job parent ids).
So it is now easier for Pig users to correlate their Pig script with Hadoop Job 
tracker files. In addition, piggybank has a new loader (HadoopJobHistoryLoader) 
that is used to load Hadoop Job history files. After the files are loaded, 
users can use the power of Pig Latin to collect and analyze the Pig usage on 
their clusters.

PigStats is a new feature. Please let us know if you have any issues or 
suggestions.

Thanks
-- Richard







On 12/16/10 1:20 PM, "felix gao" <[email protected]> wrote:

Hi all,


My company uses Pig a lot, and I have been looking for some examples of how
to use PigStats, but there seems to be very little material about it. Can
someone point me to some useful references on how to use it, and tell me
what interesting stats can be obtained from it?


Thanks,

Felix
