Thanks, I will explore the stats in MR mode a bit once I'm on 0.8/trunk. I will also have a look at wrapping some of the standard loaders to get better stats out of them. Is this of interest to anyone else? Should I submit the wrappers back to PiggyBank?
This syntax of counters.PigStatusReporter, is that documented somewhere? Is it only on 0.8/trunk? What other variables do we have access to in the "native" Pig script other than "counters"?

Josh

On 17 October 2010 19:44, Dmitriy Ryaboy <[email protected]> wrote:

> No on Filters (though every MR job tells you the number of records ingested,
> and the number returned, and as of 0.8 it also tells you which relations
> were being produced in the job -- so you can sort of back into that).
> EB sort of gives you 2), most of the loaders in there give you number of
> malformed records, though they do not store the bad records anywhere.
> I am not sure what you mean by 3) -- you can just increment counters.
> PigStatusReporter.getInstance().getCounter(myEnum).increment(1L);
>
> (watch out for a null reporter when you are still in the client-side).
>
> -D
>
>
> On Sat, Oct 16, 2010 at 2:28 PM, Josh Devins <[email protected]> wrote:
>
> > I've seen a few threads about counters, PigStats, Elephant-Bird's stats
> > utility class, etc.
> >
> > http://www.mail-archive.com/[email protected]/msg00900.html
> > http://www.mail-archive.com/user%40pig.apache.org/msg00034.html
> >
> > Has any progress been made on this or to provide a comprehensive
> > stats/counter mechanism?
> >
> > What I'm looking to do is three-fold:
> >
> > 1) Get stats on the number of records that are filtered out when using the
> > FILTER operation
> > 2) Get stats on the number of records dropped/not loaded in a LOAD function
> > (and actual copies of the records/rows from the file for later evaluation)
> > 3) Output my own stats from a Pig job (without resorting to writing my own
> > UDF and pushing things into PigStats using the Elephant-Bird utility)
> >
> > If any of this is possible, it would be great to see some examples or
> > documentation. I would hate to go to raw Hadoop MR code just to get to
> > counters.
> >
> > Thanks,
> >
> > Josh
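The null-reporter caveat in the quoted reply can be sketched as a small guard around the increment. This is only an illustration under stated assumptions: `SafeCounters`, its `Reporter` interface and the `LoadStats` enum are hypothetical names, and `Reporter` is a stand-in so the sketch compiles without Pig on the classpath. It is not Pig's own API.

```java
import java.util.function.Supplier;

public class SafeCounters {
    /** Example counter names; hypothetical, not taken from the thread. */
    public enum LoadStats { GOOD_RECORDS, MALFORMED_RECORDS }

    /** Hypothetical stand-in for the reporter, so the sketch runs without Pig. */
    public interface Reporter {
        void increment(Enum<?> counter, long delta);
    }

    // Looked up on every call, because the reporter may only exist
    // once the code is running inside a task.
    private final Supplier<Reporter> reporterSource;

    public SafeCounters(Supplier<Reporter> reporterSource) {
        this.reporterSource = reporterSource;
    }

    /**
     * Increments the counter if a reporter is available.
     * Returns false when the reporter is null (e.g. still on the client side).
     */
    public boolean increment(Enum<?> counter, long delta) {
        Reporter reporter = reporterSource.get();
        if (reporter == null) {
            return false; // nothing to report to yet; skip instead of throwing
        }
        reporter.increment(counter, delta);
        return true;
    }
}
```

In a real UDF or loader wrapper, the supplier would presumably delegate to the call from the reply, `PigStatusReporter.getInstance().getCounter(myEnum).increment(1L)`, and return null whenever `getInstance()` does. That wiring is an assumption and is not shown here.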
