Ah, sorry, I just saw that this should read PigStatusReporter.getInstance(), and that there is no special "counters" keyword or variable. Still, is it common in Pig to be able to call static methods directly from within a Pig script?
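
For reference, a minimal sketch of the null-safe increment, assuming the call is made from Java code such as a UDF rather than from the Pig script itself. The Reporter and Counter interfaces, the MyCounters enum, and InMemoryReporter are hypothetical stand-ins so that the snippet compiles without Pig on the classpath; in a real job the reporter would presumably come from PigStatusReporter.getInstance(). The point is only the guard for the "null reporter on the client side" case mentioned in the quoted reply below.

```java
import java.util.EnumMap;
import java.util.Map;

class SafeCounterSketch {
    /** Hypothetical counter names; not part of Pig. */
    enum MyCounters { MALFORMED_RECORDS }

    /** Stand-in for a Hadoop counter (assumption, not the real class). */
    interface Counter { void increment(long delta); }

    /** Stand-in for the reporter; a real UDF would use PigStatusReporter.getInstance(). */
    interface Reporter { Counter getCounter(Enum<?> name); }

    /** Increments the counter if one is available; returns false instead of throwing. */
    static boolean incrementIfAvailable(Reporter reporter, Enum<?> name, long delta) {
        if (reporter == null) {
            return false; // e.g. still on the client side, no task context yet
        }
        Counter counter = reporter.getCounter(name);
        if (counter == null) {
            return false; // reporter exists but has no counter to hand out
        }
        counter.increment(delta);
        return true;
    }

    /** In-memory reporter used only to exercise the sketch. */
    static final class InMemoryReporter implements Reporter {
        private final Map<MyCounters, Long> totals = new EnumMap<>(MyCounters.class);

        @Override
        public Counter getCounter(Enum<?> name) {
            MyCounters key = (MyCounters) name;
            return delta -> totals.merge(key, delta, Long::sum);
        }

        long total(MyCounters name) {
            return totals.getOrDefault(name, 0L);
        }
    }

    public static void main(String[] args) {
        InMemoryReporter reporter = new InMemoryReporter();
        incrementIfAvailable(reporter, MyCounters.MALFORMED_RECORDS, 1L);
        incrementIfAvailable(null, MyCounters.MALFORMED_RECORDS, 1L); // silently skipped
        System.out.println("malformed records: " + reporter.total(MyCounters.MALFORMED_RECORDS));
    }
}
```

Returning a boolean rather than throwing keeps the calling code usable both on the client side and inside a running task; whether a skipped increment should be logged is left open here.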
Thanks,
Josh

On 18 October 2010 11:56, Josh Devins <[email protected]> wrote:

> Thanks, I will explore the stats in MR mode a bit once I'm on 0.8/trunk.
>
> I will also have a look at wrapping some of the standard loaders to get
> better stats out of them. Is this of interest to anyone else? Should I
> submit back to PiggyBank?
>
> This syntax of counters.PigStatusReporter, is that documented somewhere? Is
> it only on 0.8/trunk? What other variables do we have access to in the
> "native" Pig script other than "counters"?
>
> Josh
>
> On 17 October 2010 19:44, Dmitriy Ryaboy <[email protected]> wrote:
>
>> No on Filters (though every MR job tells you the number of records
>> ingested, and the number returned, and as of 0.8 it also tells you which
>> relations were being produced in the job -- so you can sort of back into
>> that).
>> EB sort of gives you 2), most of the loaders in there give you number of
>> malformed records, though they do not store the bad records anywhere.
>> I am not sure what you mean by 3) -- you can just increment counters.
>> PigStatusReporter.getInstance().getCounter(myEnum).increment(1L);
>>
>> (watch out for a null reporter when you are still in the client-side).
>>
>> -D
>>
>> On Sat, Oct 16, 2010 at 2:28 PM, Josh Devins <[email protected]> wrote:
>>
>> > I've seen a few threads about counters, PigStats, Elephant-Bird's stats
>> > utility class, etc.
>> >
>> > http://www.mail-archive.com/[email protected]/msg00900.html
>> > http://www.mail-archive.com/user%40pig.apache.org/msg00034.html
>> >
>> > Has any progress been made on this or to provide a comprehensive
>> > stats/counter mechanism?
>> >
>> > What I'm looking to do is three-fold:
>> >
>> > 1) Get stats on the number of records that are filtered out when using
>> > the FILTER operation
>> > 2) Get stats on the number of records dropped/not loaded in a LOAD
>> > function (and actual copies of the records/rows from the file for later
>> > evaluation)
>> > 3) Output my own stats from a Pig job (without resorting to writing my
>> > own UDF and pushing things into PigStats using the Elephant-Bird utility)
>> >
>> > If any of this is possible, it would be great to see some examples or
>> > documentation. I would hate to go to raw Hadoop MR code just to get to
>> > counters.
>> >
>> > Thanks,
>> >
>> > Josh
