Thanks, I will explore the stats in MR mode a bit once I'm on 0.8/trunk.

I will also have a look at wrapping some of the standard loaders to get
better stats out of them. Is this of interest to anyone else? Should I
submit back to PiggyBank?

This syntax of counters.PigStatusReporter, is that documented somewhere? Is
it only on 0.8/trunk? What other variables do we have access to in the
"native" Pig script other than "counters"?

Josh


On 17 October 2010 19:44, Dmitriy Ryaboy <[email protected]> wrote:

> No on Filters (though every MR job tells you the number of records
> ingested and the number returned, and as of 0.8 it also tells you which
> relations were being produced in the job -- so you can sort of back into
> that).
> EB (Elephant-Bird) sort of gives you 2): most of the loaders in there give
> you the number of malformed records, though they do not store the bad
> records anywhere.
> I am not sure what you mean by 3) -- you can just increment counters.
>
>   PigStatusReporter.getInstance().getCounter(myEnum).increment(1L);
>
> (watch out for a null reporter when you are still on the client side).
>
> -D
>
>
> On Sat, Oct 16, 2010 at 2:28 PM, Josh Devins <[email protected]> wrote:
>
> > I've seen a few threads about counters, PigStats, Elephant-Bird's stats
> > utility class, etc.
> >
> > http://www.mail-archive.com/[email protected]/msg00900.html
> > http://www.mail-archive.com/user%40pig.apache.org/msg00034.html
> >
> > Has any progress been made on this, or on providing a comprehensive
> > stats/counter mechanism?
> >
> > What I'm looking to do is three-fold:
> >
> > 1) Get stats on the number of records that are filtered out when using
> > the FILTER operation
> > 2) Get stats on the number of records dropped/not loaded in a LOAD
> > function (and actual copies of the records/rows from the file for later
> > evaluation)
> > 3) Output my own stats from a Pig job (without resorting to writing my
> > own UDF and pushing things into PigStats using the Elephant-Bird utility)
> >
> > If any of this is possible, it would be great to see some examples or
> > documentation. I would hate to go to raw Hadoop MR code just to get to
> > counters.
> >
> > Thanks,
> >
> > Josh
> >
>
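
For reference, the null guard Dmitriy mentions could look roughly like the sketch below. It is plain Java so that it compiles without Pig on the classpath: `CounterSink`, `LoadStats`, and `safeIncrement` are hypothetical names made up for illustration, and inside a real UDF or loader the sink's role would be played by the call quoted in the thread, `PigStatusReporter.getInstance().getCounter(myEnum).increment(1L)`. Treat it as one possible shape for the pattern, not as Pig's own API.

```java
import java.util.HashMap;
import java.util.Map;

public class NullSafeCounters {

    /** Hypothetical counter names; any enum can serve as a counter key. */
    public enum LoadStats { GOOD_RECORDS, MALFORMED_RECORDS }

    /**
     * Hypothetical stand-in for the reporter. In a real Pig UDF this would
     * wrap PigStatusReporter.getInstance().getCounter(name).increment(delta).
     */
    public interface CounterSink {
        void increment(Enum<?> name, long delta);
    }

    /**
     * Increments the counter only when a sink is available.
     * Returns false when there is nothing to report to (for example,
     * while the code is still running on the client side).
     */
    public static boolean safeIncrement(CounterSink sink, Enum<?> name, long delta) {
        if (sink == null) {
            return false;
        }
        sink.increment(name, delta);
        return true;
    }

    public static void main(String[] args) {
        // In-memory sink so the sketch runs without a cluster.
        Map<String, Long> totals = new HashMap<>();
        CounterSink inMemory = (name, delta) -> totals.merge(name.name(), delta, Long::sum);

        // Pretend loader: a line is "good" if it has exactly two tab-separated fields.
        String[] lines = {"1\tfoo", "garbage", "2\tbar"};
        for (String line : lines) {
            boolean ok = line.split("\t").length == 2;
            safeIncrement(inMemory, ok ? LoadStats.GOOD_RECORDS : LoadStats.MALFORMED_RECORDS, 1L);
        }

        System.out.println(totals.get("GOOD_RECORDS") + " good, "
                + totals.get("MALFORMED_RECORDS") + " malformed");

        // With no sink available the call is skipped instead of throwing.
        System.out.println(safeIncrement(null, LoadStats.MALFORMED_RECORDS, 1L));
    }
}
```

Wrapping a standard loader, as Josh proposes, would presumably call something like `safeIncrement` once per record from the wrapper's record-reading method; storing the bad records themselves would need a separate side output, which this sketch does not attempt.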
