No on 1), filters (though every MR job tells you the number of records ingested and the number returned, and as of 0.8 it also tells you which relations were produced in the job, so you can sort of back into that).

EB (Elephant-Bird) sort of gives you 2): most of the loaders in there report the number of malformed records, though they do not store the bad records anywhere.

I am not sure what you mean by 3); you can just increment counters:

PigStatusReporter.getInstance().getCounter(myEnum).increment(1L);
(watch out for a null reporter while you are still on the client side).

-D

On Sat, Oct 16, 2010 at 2:28 PM, Josh Devins <[email protected]> wrote:
> I've seen a few threads about counters, PigStats, Elephant-Bird's stats
> utility class, etc.
>
> http://www.mail-archive.com/[email protected]/msg00900.html
> http://www.mail-archive.com/user%40pig.apache.org/msg00034.html
>
> Has any progress been made on this or to provide a comprehensive
> stats/counter mechanism?
>
> What I'm looking to do is three-fold:
>
> 1) Get stats on the number of records that are filtered out when using the
> FILTER operation
> 2) Get stats on the number of records dropped/not loaded in a LOAD function
> (and actual copies of the records/rows from the file for later evaluation)
> 3) Output my own stats from a Pig job (without resorting to writing my own
> UDF and pushing things into PigStats using the Elephant-Bird utility)
>
> If any of this is possible, it would be great to see some examples or
> documentation. I would hate to go to raw Hadoop MR code just to get to
> counters.
>
> Thanks,
>
> Josh
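
On the null-reporter caveat in the reply: a rough sketch of a guarded increment, written so it compiles without Pig on the classpath. Reporter, Counter, RecordCounters and safeIncrement are hypothetical stand-ins invented for this sketch, not Pig classes. In a real UDF the reporter would presumably come from PigStatusReporter.getInstance(), and the same null check would apply before calling getCounter(...).increment(1L).

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-ins so the sketch compiles without Pig on the classpath.
// In a real UDF the reporter would come from PigStatusReporter.getInstance().
final class CounterSketch {

    // Hypothetical counter names; an enum serves as the counter key.
    enum RecordCounters { MALFORMED, FILTERED_OUT }

    // Stand-in for a Hadoop-style counter.
    static final class Counter {
        private long value;
        void increment(long delta) { value += delta; }
        long getValue() { return value; }
    }

    // Stand-in for the reporter: hands out one counter per enum constant.
    static final class Reporter {
        private final Map<RecordCounters, Counter> counters =
                new EnumMap<>(RecordCounters.class);

        Counter getCounter(RecordCounters name) {
            return counters.computeIfAbsent(name, k -> new Counter());
        }
    }

    // Increment a counter only if a reporter is available.
    // Returns true when the increment was applied, false when it was skipped.
    static boolean safeIncrement(Reporter reporter, RecordCounters name, long delta) {
        if (reporter == null) {
            // Assumed situation: still on the client side, no task running yet.
            return false;
        }
        reporter.getCounter(name).increment(delta);
        return true;
    }
}
```

Calling CounterSketch.safeIncrement(reporter, CounterSketch.RecordCounters.MALFORMED, 1L) once per bad record would then skip quietly whenever the reporter is null, instead of throwing a NullPointerException.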
