I've seen a few threads about counters, PigStats, Elephant-Bird's stats utility class, etc.
http://www.mail-archive.com/[email protected]/msg00900.html
http://www.mail-archive.com/user%40pig.apache.org/msg00034.html

Has any progress been made on this, or toward a comprehensive stats/counter mechanism?

What I'm looking to do is three-fold:

1) Get stats on the number of records filtered out by a FILTER operation.
2) Get stats on the number of records dropped (not loaded) by a LOAD function, plus actual copies of those records/rows from the file for later evaluation.
3) Output my own stats from a Pig job, without resorting to writing my own UDF and pushing things into PigStats using the Elephant-Bird utility.

If any of this is possible, it would be great to see some examples or documentation. I would hate to drop down to raw Hadoop MapReduce code just to get at counters.

Thanks,
Josh
